Rulers2023: An Annotated Dataset of Synthetic and Real Images for Ruler Detection Using Deep Learning
Abstract
1. Introduction
1.1. Ruler Detection in Images
1.2. Synthetic Image Generation for Deep Learning
- Data availability and scalability. Traditional methods for collecting real-world images are both time-consuming and expensive. Moreover, the acquisition of high-quality labeled data often requires additional human labor. Synthetic images, on the other hand, can be generated in large quantities with automated labeling, offering a scalable solution for data collection [54].
- Controlled experimental conditions. Among the challenges in using real-world data are the inherent variability and noise, which can introduce confounding factors into experiments. Synthetic images can be generated under controlled conditions, allowing for greater experimental rigor [55].
- Ethical and privacy concerns. Real-world images, especially those involving human subjects, often come with ethical and privacy concerns, including the need for informed consent. Synthetic images, being artificial constructs, bypass these issues, allowing for broader applicability in research [56,57].
1.3. The Novelty and Contributions of This Work
- Proposing a method to generate a dataset consisting of synthetic ruler image samples of high fidelity;
- Publicly providing the synthetic ruler image training dataset generated according to the presented method;
- Publicly providing the real ruler image training and test datasets;
- Presenting ruler detection benchmark results using six different CNN architectures;
- Experimentally showing that the created synthetic ruler image dataset is effective, sufficient, and superior to the real image dataset for the development of ruler detectors.
2. Materials and Methods
2.1. Generation of Synthetic Ruler Image Dataset
2.1.1. Requirements for Synthetic Ruler Images
- Variability and diversity: The dataset should include rulers of various lengths, colors, materials (e.g., plastic, metal, wood), and units of measurement (e.g., inches, centimeters). It should also cover a wide range of backgrounds, lighting conditions, and occlusions to mimic real-world scenarios:
- Ruler appearance: Generating rulers of different shapes, colors, and materials. Ruler appearance can be simulated by randomly selecting width, height, color, transparency, and gradation marking type and placement from predefined ranges of parameter values;
- Perspective and orientation: The dataset should represent rulers from multiple angles and perspectives, including top-down views and side views with varying degrees of perspective distortion. This can be achieved by randomly applying geometric transformations. Transformations (rotation, scaling, perspective distortions) can be introduced during dataset construction and at the time of training as data augmentation procedures;
- Contextual variety: Rulers should be placed in a variety of relevant contexts, such as on different surfaces, alongside common objects of measurement, and within complex scenes, to train the model for practical applications. Contexts can be simulated by using natural images as backgrounds;
- Occlusions: Including images where the ruler is partially obscured or interacting with other objects to simulate realistic use cases where the ruler is not always fully visible. Occlusions can be simulated during the training phase as a data augmentation procedure.
- Balanced image quality and resolution: High-resolution images are necessary to ensure that the measurement markings on the rulers are clear and discernible. On the other hand, unnecessarily large images would waste memory; moreover, image preprocessing for passing to neural networks would consume more time. The chosen input size of the neural network should be considered. Therefore, the resolution of images should be tuned while keeping in mind the maximum input size of the neural network and by leaving enough freedom for various augmentation transformations that enlarge the image regions.
- Realism: While synthetic images need to be diverse and comprehensive, they also need to be realistic enough not to introduce a domain gap when the model is applied to real-world images. The visual evaluation and performance of a ruler detector that is trained on a synthetic dataset provide feedback on realism.
- Consistency in synthetic generation parameters: Ensuring consistent application of synthetic image generation parameters such as lighting, textures, and noise models across the dataset to avoid biases in the training process. Controlled parameter sampling allows for the implementation of this requirement.
- Dataset size and balance: The size of the dataset must be large enough to train deep learning models effectively and should be balanced in terms of the variety of ruler types and scenarios presented.
- Legal considerations: The dataset creation process must adhere to intellectual property laws. It is especially relevant when selecting real background images for the dataset generation process; e.g., the COCO dataset [82] is licensed under a Creative Commons Attribution 4.0 license.
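The variability and consistency requirements above reduce to drawing every appearance parameter from a predefined range using a controlled random source. A minimal Python sketch follows; the length/width fractions match the generation method in Section 2.1.2, while the hue, transparency, and sidedness ranges are illustrative assumptions, not the paper's exact values:

```python
import random

def sample_ruler_params(rng: random.Random) -> dict:
    """Draw one set of ruler-appearance parameters from predefined ranges.
    Length/width fractions follow Section 2.1.2; the remaining ranges
    are illustrative assumptions."""
    return {
        "length_frac": rng.uniform(0.70, 0.95),  # fraction of background width
        "width_frac": rng.uniform(0.04, 0.20),   # fraction of background height
        "hue": rng.uniform(0.0, 1.0),            # HSV hue (assumed full range)
        "face_alpha": rng.uniform(0.5, 1.0),     # transparency (assumed range)
        "double_sided": rng.random() < 0.5,      # single- vs double-sided scale
    }

# A fixed seed keeps parameter sampling consistent and reproducible
# across the whole dataset, as the consistency requirement demands.
rng = random.Random(42)
params = sample_ruler_params(rng)
```

Seeding the generator once per dataset build is what makes the "controlled parameter sampling" requirement auditable: rerunning the build reproduces the same parameter draws.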
2.1.2. Method for Generating Synthetic Ruler Image Dataset
- Background preparation. A real image from the COCO dataset was randomly cropped and resized to a fixed target image size. The crop box was defined by randomly shifting the corner coordinates of the original image by a maximum of 20% of the image width/height.
- Definition of the ruler area. The ruler location was designated within the background area and centered. The length of the ruler (the longer dimension) was randomly chosen to be between 70% and 95% of the background width, while the width of the ruler (the shorter dimension) was randomly selected to be between 4% and 20% of the background height.
- Setup of the appearance of the ruler. A set of random parameters determining the visual attributes of the ruler was generated. The color of the ruler was defined in the HSV color space, where a triplet of values was drawn randomly from predefined ranges. The transparency level was established by randomly selecting a “face alpha” value, and the curvature at the corners of the ruler was set by randomly choosing a value for the curvature parameter of the rectangle. The areas for measurement markings (gradations) were specified to be between 85% and 100% of the ruler’s length, with the width set at 30% to 45% for double-sided rulers or 40% to 60% for single-sided rulers. The decision to use a single-sided or double-sided ruler was made at random. The positions of the gradation areas were also randomly shifted from the edge of the ruler. The length of the ruler scale, in measurement units, was chosen randomly. Subdivisions of the ruler scale, indicating tick positions, were selected from predefined sets. For rulers with three-level subdivisions, the tick lengths were set to predefined fractions of the gradation-area width; for those with two-level subdivisions, tick lengths were likewise drawn from predefined fractions of the gradation-area width.
- Adding geometric distortions. The perspective transformation was simulated by randomly selecting ruler corner shifts and fitting the geometric transformation to those corner coordinate shifts. Each corner coordinate of the ruler was randomly shifted up/down by up to 20% of the ruler width and right/left by up to 10% of the ruler length. In addition, one end of the ruler was widened by a randomly chosen amount of up to the ruler’s initial width.
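The background-cropping and corner-jitter steps above can be sketched numerically. The NumPy-only illustration below fits the perspective matrix with a direct linear transform in place of a library call; the function names and the full-range hue of randomness within the stated 10%/20% limits are assumptions:

```python
import numpy as np

def random_crop_box(w, h, rng, max_shift=0.2):
    """Crop box obtained by shifting each corner of the full image inward
    by up to max_shift of the width/height (background preparation step)."""
    x0 = int(rng.uniform(0, max_shift * w))
    y0 = int(rng.uniform(0, max_shift * h))
    x1 = w - int(rng.uniform(0, max_shift * w))
    y1 = h - int(rng.uniform(0, max_shift * h))
    return x0, y0, x1, y1

def jitter_corners(corners, ruler_len, ruler_wid, rng):
    """Shift each ruler corner right/left by up to 10% of the ruler length
    and up/down by up to 20% of the ruler width (geometric distortion step)."""
    out = corners.astype(float).copy()
    out[:, 0] += rng.uniform(-0.10 * ruler_len, 0.10 * ruler_len, size=4)
    out[:, 1] += rng.uniform(-0.20 * ruler_wid, 0.20 * ruler_wid, size=4)
    return out

def fit_homography(src, dst):
    """Direct linear transform: fit the 3x3 perspective matrix mapping the
    4 source corners onto the 4 shifted corners."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    p = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(p, 1.0).reshape(3, 3)
```

Fitting the transform to the jittered corners, rather than sampling matrix entries directly, keeps the distortion bounded in image space, which is what makes the 10%/20% limits meaningful.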
2.2. Synthetic Data Evaluation
- Training MobileNetV2 for 200 epochs using transfer learning or “from scratch”;
- Training MobileNetV2 for 200 epochs using transfer learning or “from scratch” with 5-fold cross-validation.
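The 5-fold cross-validation in the second scenario partitions the training samples so that every sample is validated exactly once. A stdlib-only sketch of that splitting logic (the function name and seed are assumptions):

```python
import random

def kfold_indices(n_samples: int, k: int = 5, seed: int = 0):
    """Yield (train, val) index lists for k-fold cross-validation:
    indices are shuffled once, split into k folds, and each fold in
    turn serves as the validation set."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val

# Each of the 5 (train, val) splits would drive one MobileNetV2
# training run, whether via transfer learning or from scratch.
splits = list(kfold_indices(5000, k=5))
```

Shuffling once before folding, rather than per fold, guarantees the folds are disjoint and jointly cover the dataset.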
2.3. Data Collection
2.4. Software Used
- MATLAB programming and numeric computing platform (version R2022a; The Mathworks Inc., Natick, MA, USA) for the synthetic dataset creation workflow;
- Python (version 3.10.12) (https://www.python.org (accessed on 21 August 2023)) [88], an interpreted, high-level, general-purpose programming language. Used for machine learning applications and for data analysis and visualization;
- TensorFlow with Keras (version 2.10.1) (https://www.tensorflow.org (accessed on 21 August 2023)) [89], an open-source platform for machine learning. Used for the online data augmentation stage of the synthetic dataset creation workflow and for the training/testing of deep learning models;
- Albumentations (version 1.3.1) (https://albumentations.ai (accessed on 21 August 2023)) [90], a Python library for fast and flexible image augmentation. Used for the image augmentations during deep learning model training;
- OpenCV (version 4.8.0) (https://opencv.org/ (accessed on 21 August 2023)) [91], an open-source computer vision library. Used for image input/output and manipulations.
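As a plain-NumPy illustration of the kind of occlusion augmentation the Albumentations stage performs during training (the rectangle-erasing scheme, its size limits, and the function name are assumptions, not the paper's exact transform list):

```python
import numpy as np

def random_occlusion(img, rng, max_frac=0.3, fill=0):
    """Simulate partial occlusion of the ruler by pasting a randomly
    placed, randomly sized rectangle over a copy of the image.
    max_frac caps the occluder size relative to the image and is an
    assumed value."""
    h, w = img.shape[:2]
    oh = int(rng.uniform(0.05, max_frac) * h)   # occluder height
    ow = int(rng.uniform(0.05, max_frac) * w)   # occluder width
    y = rng.integers(0, h - oh + 1)             # top-left corner (row)
    x = rng.integers(0, w - ow + 1)             # top-left corner (col)
    out = img.copy()
    out[y:y + oh, x:x + ow] = fill
    return out
```

Applying this on the fly, per epoch, gives each training sample a different occlusion pattern at no storage cost, which is why the paper defers occlusions to the augmentation stage rather than baking them into the dataset.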
3. Results and Discussion
4. Conclusions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| ML | Machine learning |
| DL | Deep learning |
| AI | Artificial intelligence |
| ANN | Artificial neural network |
| CNN | Convolutional neural network |
| DNN | Deep neural network |
| GAN | Generative adversarial network |
| DM | Diffusion model |
| VAE | Variational autoencoder |
| LR | Learning rate |
| MSE | Mean squared error |
| BBox | Bounding box |
Appendix A
References
- Deng, J.; Xuan, X.; Wang, W.; Li, Z.; Yao, H.; Wang, Z. A review of research on object detection based on deep learning. J. Phys. Conf. Ser. 2020, 1684, 012028.
- Diwan, T.; Anirudh, G.; Tembhurne, J.V. Object detection using YOLO: Challenges, architectural successors, datasets and applications. Multimed. Tools Appl. 2023, 82, 9243–9275.
- Tamulionis, M.; Sledevič, T.; Abromavičius, V.; Kurpytė-Lipnickė, D.; Navakauskas, D.; Serackis, A.; Matuzevičius, D. Finding the Least Motion-Blurred Image by Reusing Early Features of Object Detection Network. Appl. Sci. 2023, 13, 1264.
- Pathak, A.R.; Pandey, M.; Rautaray, S. Application of deep learning for object detection. Procedia Comput. Sci. 2018, 132, 1706–1717.
- Plonis, D.; Katkevičius, A.; Mališauskas, V.; Serackis, A.; Matuzevičius, D. Investigation of New Algorithms for Estimation of Losses in Microwave Devices Based on a Waveguide or a Meander Line. Acta Phys. Pol. A 2016, 129, 414–424.
- Žuraulis, V.; Matuzevičius, D.; Serackis, A. A method for automatic image rectification and stitching for vehicle yaw marks trajectory estimation. Promet-Traffic Transp. 2016, 28, 23–30.
- ** constraint. J. Digit. Imaging 2015, 28, 474–480.
- Gooßen, A.; Schlüter, M.; Hensel, M.; Pralow, T.; Grigat, R.R. Ruler-based automatic stitching of spatially overlapping radiographs. In Proceedings of the Bildverarbeitung für die Medizin 2008: Algorithmen—Systeme—Anwendungen, Berlin, Germany, 6–8 April 2008; Springer: Berlin/Heidelberg, Germany, 2008; pp. 192–196.
- Jaworski, N.; Farmaha, I.; Marikutsa, U.; Farmaha, T.; Savchyn, V. Implementation features of wounds visual comparison subsystem. In Proceedings of the 2018 XIV-th International Conference on Perspective Technologies and Methods in MEMS Design (MEMSTECH), Lviv, Ukraine, 18–22 April 2018; pp. 114–117.
- Gertsovich, I.; Nilsson, M.; Bartunek, J.; Claesson, I. Automatic estimation of a scale resolution in forensic images. Forensic Sci. Int. 2018, 283, 58–71.
- Bhalerao, A.; Reynolds, G. Ruler detection for autoscaling forensic images. Int. J. Digit. Crime Forensics (IJDCF) 2014, 6, 9–27.
- Tian, F.; Zhao, Y.; Che, X.; Zhao, Y.; **: A new image synthesis for construction sign detection in autonomous vehicles. Sensors 2022, 22, 3494.
- Lin, T.; Maire, M.; Belongie, S.J.; Bourdev, L.D.; Girshick, R.B.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. arXiv 2014, arXiv:1405.0312.
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861.
- Zoph, B.; Vasudevan, V.; Shlens, J.; Le, Q.V. Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8697–8710.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Tan, M.; Le, Q. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning (ICML), PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114.
- Van Rossum, G.; Drake, F.L. Python 3 Reference Manual; CreateSpace: Scotts Valley, CA, USA, 2009.
- Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G.S.; Davis, A.; Dean, J.; Devin, M.; et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. 2015. Available online: tensorflow.org (accessed on 21 August 2023).
- Buslaev, A.; Iglovikov, V.I.; Khvedchenya, E.; Parinov, A.; Druzhinin, M.; Kalinin, A.A. Albumentations: Fast and Flexible Image Augmentations. Information 2020, 11, 125.
- Bradski, G. The OpenCV Library. Dr. Dobb’s J. Softw. Tools 2000, 120, 122–125.
- Bento, N.; Rebelo, J.; Barandas, M.; Carreiro, A.V.; Campagner, A.; Cabitza, F.; Gamboa, H. Comparing handcrafted features and deep neural representations for domain generalization in human activity recognition. Sensors 2022, 22, 7324.
- El-Gayar, M.; Soliman, H.; Meky, N. A comparative study of image low level feature extraction algorithms. Egypt. Inform. J. 2013, 14, 175–181.
- Alshazly, H.; Linse, C.; Barth, E.; Martinetz, T. Handcrafted versus CNN features for ear recognition. Symmetry 2019, 11, 1493.
- Tsalera, E.; Papadakis, A.; Samarakou, M.; Voyiatzis, I. Feature extraction with handcrafted methods and convolutional neural networks for facial emotion recognition. Appl. Sci. 2022, 12, 8455.
- Hamdi, M.; Senan, E.M.; Jadhav, M.E.; Olayah, F.; Awaji, B.; Alalayah, K.M. Hybrid Models Based on Fusion Features of a CNN and Handcrafted Features for Accurate Histopathological Image Analysis for Diagnosing Malignant Lymphomas. Diagnostics 2023, 13, 2258.
| Backbone | Parameters 1 | Depth 2 | Size (MB) 3 | Time (ms) CPU 4 | Time (ms) GPU 4 |
|---|---|---|---|---|---|
| MobileNetV2 (0.5) 5 | 706,224 | 52 | 3.06 | 233.6 | 15.5 |
| MobileNetV2 (0.75) 5 | 1,382,064 | 52 | 5.64 | 363.2 | 19.2 |
| MobileNet (0.5) 5 | 829,536 | 27 | 3.36 | 178.1 | 12.1 |
| NASNetMobile | 4,269,716 | 196 | 17.9 | 585.4 | 44.1 |
| ResNet50 | 23,587,712 | 53 | 90.3 | 1171 | 44.3 |
| EfficientNetB0 | 4,049,571 | 81 | 15.9 | 729.6 | 32.5 |
| Dataset | No. of Samples | Resolution(s) |
|---|---|---|
| Synthetic-train | 5000 | |
| Real-train | 1901 | |
| Real-test | 810 | |
| Experimental Scenarios | Synthetic-Train Mean | Synthetic-Train 95% CI | Real-Train Mean | Real-Train 95% CI | Mixed (All) Mean | Mixed (All) 95% CI |
|---|---|---|---|---|---|---|
| MobileNetV2 (0.5), TL, E100 | 0.77 | 0.76–0.79 | 0.66 | 0.62–0.70 | 0.82 | 0.81–0.84 |
| MobileNetV2 (0.75), TL, E100 | 0.77 | 0.76–0.78 | 0.74 | 0.70–0.78 | 0.84 | 0.83–0.86 |
| MobileNet, TL, E100 | 0.83 | 0.81–0.84 | 0.68 | 0.67–0.70 | 0.86 | 0.84–0.87 |
| NASNetMobile, TL, E100 | 0.82 | 0.80–0.83 | 0.67 | 0.63–0.72 | 0.78 | 0.76–0.80 |
| ResNet50, TL, E100 | 0.81 | 0.80–0.82 | 0.61 | 0.57–0.65 | 0.84 | 0.81–0.86 |
| EfficientNetB0, TL, E100 | 0.83 | 0.81–0.85 | 0.73 | 0.71–0.75 | 0.85 | 0.84–0.87 |
| MobileNetV2 (0.5), TL, E200 | 0.84 | 0.83–0.84 | 0.72 | 0.68–0.75 | 0.86 | 0.85–0.88 |
| MobileNetV2 (0.5), FS, E200 | 0.82 | 0.81–0.83 | 0.64 | 0.62–0.66 | 0.80 | 0.79–0.82 |
© 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Matuzevičius, D. Rulers2023: An Annotated Dataset of Synthetic and Real Images for Ruler Detection Using Deep Learning. Electronics 2023, 12, 4924. https://doi.org/10.3390/electronics12244924