Accurate Recognition of Jujube Tree Trunks Based on Contrast Limited Adaptive Histogram Equalization Image Enhancement and Improved YOLOv8
Abstract
1. Introduction
2.2. Design of Automatic Key Frame Extraction Algorithm
- Initial environment configuration: build the algorithm development environment on the PyTorch 11.0 framework; then import toolkits such as numpy, tqdm, and supervision to provide data analysis, image processing, and data visualization. The supervision toolkit is used to identify the key frame in which a jujube tree reaches the centerline of the field of view and to implement the cross-line extraction function.
- Install the object detection model: the initial object detection model is trained in the traditional way, i.e., a certain number of jujube tree trunk images are collected and labeled manually, and an object detection model is trained on them to obtain initial trunk detection capability.
- Frame-by-frame detection: specify the path of the video to be detected, and use the pre-trained object detection model to detect trunks. The frame-by-frame detection process is encapsulated in the “Process_frame” function, which outputs the visualized, annotated image.
- Cross-line counting: run the frame-by-frame detection function on the video, use supervision to parse the prediction results, traverse all objects in the frame, and draw the object detection visualization. Combined with the detection line in the center of the frame, determine whether each object has crossed the line, and count the crossing objects for display in the visualization.
- Key frame extraction: first, when a trunk object is detected in the image data, use the bounding box to obtain the position of the trunk and calculate the coordinates of the center point of the trunk region. Second, set the trigger condition: when the trunk center is detected to have passed the vertical centerline of the frame, capture the front and side views of the jujube tree. Each key frame is labeled with the corresponding count number and cropped to a 1:1 aspect ratio, so that the corresponding jujube tree is retained completely while excessive interfering information is excluded, as shown in Figure 4.
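The cross-line trigger at the heart of the steps above can be sketched in pure Python. Detection itself (the YOLOv8 model and the supervision toolkit) is mocked out here: each frame supplies at most one trunk bounding box as an (x1, y1, x2, y2) tuple, and all function names are illustrative, not the authors' code.

```python
def crossed_centerline(prev_cx, cx, center_x):
    """True when the trunk center passes the vertical centerline between two frames."""
    if prev_cx is None:
        return False
    # Strict crossing in either direction avoids double-triggering
    # when a center sits exactly on the line for consecutive frames.
    return prev_cx < center_x <= cx or prev_cx > center_x >= cx

def count_key_frames(frames, frame_width):
    """Return indices of frames in which the trunk center crosses the centerline.

    `frames` holds at most one trunk box (x1, y1, x2, y2) per frame;
    a real pipeline would obtain these boxes from the trained YOLOv8 detector.
    """
    center_x = frame_width // 2
    key_frames, prev_cx = [], None
    for idx, box in enumerate(frames):
        if box is None:
            continue  # no trunk detected in this frame
        x1, _, x2, _ = box
        cx = (x1 + x2) / 2.0  # horizontal center of the trunk bounding box
        if crossed_centerline(prev_cx, cx, center_x):
            key_frames.append(idx)  # this frame would be saved and cropped 1:1
        prev_cx = cx
    return key_frames
```

In the actual pipeline the boxes would come from parsing YOLOv8 predictions with supervision, and each triggered frame would be cropped to a 1:1 square around the trunk and labeled with the running count.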
2.3. Dataset Image Enhancement
- Image division: first divide the image into small, non-overlapping rectangular regions; the size of these sub-regions is usually 8 × 8, 16 × 16, etc. The larger the number of pixels per region, the more obvious the enhancement effect, but the more detail information of the image is lost. In OpenCV, the default tile size is 8 × 8.
- Local histogram equalization: convert the image from the RGB color space to the HSV space, which is more suitable for brightness and contrast processing. For each small block, calculate its grayscale histogram, derive the mapping function from this histogram, and apply that function to the region; the cumulative distribution function (CDF) of the histogram is further calculated.
- Contrast limitation: to prevent over-enhancement (which amplifies noise) caused by certain pixel values occurring too frequently, the bins of the original block histogram (Figure 5a) that exceed a predetermined threshold T (the contrast limiting parameter) are “truncated”, and the truncated portion is evenly redistributed among the other gray levels to obtain the modified histogram shown in Figure 5b, where A denotes the pixels evenly added to each gray level and M denotes the gray value. The overall process is shown in Figure 5.
- Pixel mapping: using the mapping relationship between the image pixels and the gray-level transformation functions of the partitioned regions, an interpolation operation is applied to solve for the gray value of each pixel according to its neighboring points, in order to eliminate “blocky” artifacts; since each pixel has four neighboring transformation functions, bilinear interpolation is carried out between the partitioned sub-regions.
- Interpolation smoothing: since the image is divided into multiple small sub-regions for processing, directly applying histogram equalization may produce significant boundary effects between adjacent sub-regions [32]. To solve this problem, we use CLAHE with bilinear interpolation to smooth the transition between neighboring sub-regions, ensuring the continuity and smoothness of the image.
- Merging results: all processed sub-regions are recombined into a complete image, which is converted back to the RGB color space to complete the image data enhancement process; the effect of the CLAHE enhancement is shown in Figure 6.
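The contrast-limitation step above can be illustrated with a minimal, pure-Python clip-and-redistribute function; the function name and the toy histogram below are ours, not the authors' implementation.

```python
def clip_histogram(hist, threshold):
    """Clip histogram bins at `threshold` (the parameter T above) and
    redistribute the truncated excess evenly over all gray levels,
    so the total pixel count is preserved."""
    # Total number of pixels cut off above the threshold.
    excess = sum(max(0, h - threshold) for h in hist)
    clipped = [min(h, threshold) for h in hist]
    bonus = excess // len(hist)      # the amount A added to every gray level
    remainder = excess % len(hist)   # leftover pixels spread over the first bins
    return [c + bonus + (1 if i < remainder else 0)
            for i, c in enumerate(clipped)]
```

In practice, the whole pipeline above reduces to OpenCV's built-in `clahe = cv2.createCLAHE(clipLimit=T, tileGridSize=(8, 8))` followed by `clahe.apply(...)` on the V channel of the HSV-converted image, with the tile division, clipping, and bilinear interpolation handled internally.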
3. Methods
3.1. YOLOv8 Algorithm Structure
3.2. YOLOv8 Improvement of Backbone Network GhostNetv2
3.3. YOLOv8 Improvement of the CA_H Attention Mechanism
4. Experiment Results with Relevant Analysis
4.1. Experimental Settings
4.2. Qualitative Evaluation
4.3. Data Enhancement Comparison Test
Discussion
- Improving contrast: due to the high light intensity in the Xinjiang region, images show strong contrast between the bright and dark sides of the trunk; CLAHE enhancement evens out this contrast, helping the model to process the information efficiently and thus obtain performance gains without losing too much computational performance.
4.5. Comparative Experiments with Classical Algorithms
- When the YOLOv8s base model is compared with the classical object detection algorithms, only its recall is slightly lower, by 0.4%, than that of the YOLOv5s model; on all other metrics it comprehensively outperforms both YOLOv5s and Faster R-CNN.
- Compared with the YOLOv8s base model, the YOLOv8s-GhostNetv2-CA_H model proposed in this paper reduces the model size by 19.5% (21.5 M to 17.3 M) and improves precision by 2.4% to 92.3%, recall by 1.4% to 89.9%, [email protected] by 1.8% to 91.8%, and FPS by 17.1% to 179.8 (relative changes against the baseline).
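As a quick sanity check, the percentages quoted above are relative changes against the YOLOv8s baseline; recomputing them from the comparison table values reproduces each figure (the helper name is ours):

```python
def rel_change(new, old):
    """Relative change in percent, rounded to one decimal place."""
    return round((new - old) / old * 100, 1)

# Values taken from the YOLOv8s vs. YOLOv8s-GhostNetv2-CA_H rows of the table.
base = {"P": 90.1, "R": 88.7, "FPS": 153.5, "mAP50": 90.2, "size_M": 21.5}
ours = {"P": 92.3, "R": 89.9, "FPS": 179.8, "mAP50": 91.8, "size_M": 17.3}

gains = {k: rel_change(ours[k], base[k]) for k in base}
# gains: P +2.4, R +1.4, FPS +17.1, mAP50 +1.8, size_M -19.5
```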
Discussion
- Faster R-CNN first uses a region proposal network (RPN) to generate candidate object regions and then performs classification and bounding box regression for each region. In contrast, the YOLO series predicts bounding boxes and category probabilities directly with a single neural network; this one-step approach is more effective in real-world application scenarios with large amounts of jujube garden data because it reduces the steps in the inference process and the computational complexity.
- In addition, YOLO employs more advanced feature fusion mechanisms, such as cross-scale feature fusion, which help the model better capture trunk targets of different sizes. Although Faster R-CNN can also handle multi-scale inputs, its feature fusion ability is weaker, and its recognition performance degrades when facing heavy interference from tree branches.
5. Conclusions and Outlook
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Liu, J.; Xiang, J.; Jin, Y.; Liu, R.; Yan, J.; Wang, L. Boost precision agriculture with unmanned aerial vehicle remote sensing and edge intelligence: A survey. Remote Sens. 2021, 13, 4387. [Google Scholar] [CrossRef]
- Nie, J.; Jiang, J.; Li, Y.; Wang, H.; Ercisli, S.; Lv, L. Data and domain knowledge dual-driven artificial intelligence: Survey, applications, and challenges. Expert Syst. 2023, e13425. [Google Scholar] [CrossRef]
- Cheng, Z.; Cheng, Y.; Li, M.; Dong, X.; Gong, S.; Min, X. Detection of cherry tree crown based on improved LA-dpv3+ algorithm. Forests 2023, 14, 2404. [Google Scholar] [CrossRef]
- Nie, J.; Wang, Y.; Li, Y.; Chao, X. Artificial intelligence and digital twins in sustainable agriculture and forestry: A survey. Turk. J. Agric. For. 2022, 46, 642–661. [Google Scholar] [CrossRef]
- Donmez, C.; Villi, O.; Berberoglu, S.; Cilek, A. Computer vision-based citrus tree detection in a cultivated environment using UAV imagery. Comput. Electron. Agric. 2021, 187, 106273. [Google Scholar] [CrossRef]
- Zhang, R.; Li, P.; Zhong, S.; Wei, H. An integrated accounting system of quantity, quality and value for assessing cultivated land resource assets: A case study in Xinjiang, China. Glob. Ecol. Conserv. 2022, 36, e02115. [Google Scholar] [CrossRef]
- Li, Y.; Ercisli, S. Data-efficient crop pest detection based on KNN distance entropy. Sustain. Comput. Inform. Syst. 2023, 38, 100860. [Google Scholar]
- Yang, Y.; Li, Y.; Yang, J.; Wen, J. Dissimilarity-based active learning for embedded weed identification. Turk. J. Agric. For. 2022, 46, 390–401. [Google Scholar] [CrossRef]
- Ye, G.; Liu, M.; Wu, M. Double image encryption algorithm based on compressive sensing and elliptic curve. Alex. Eng. J. 2022, 61, 6785–6795. [Google Scholar] [CrossRef]
- Li, Y.; Yang, J.; Zhang, Z.; Wen, J.; Kumar, P. Healthcare data quality assessment for cybersecurity intelligence. IEEE Trans. Ind. Inform. 2022, 19, 841–848. [Google Scholar] [CrossRef]
- Xu, S.; Pan, B.; Zhang, J.; Zhang, X. Accurate and Serialized Dense Point Cloud Reconstruction for Aerial Video Sequences. Remote Sens. 2023, 15, 1625. [Google Scholar] [CrossRef]
- Ahmed, M.; Ramzan, M.; Khan, H.U.; Iqbal, S.; Khan, M.A.; Choi, J.-I.; Nam, Y.; Kadry, S. Real-Time Violent Action Recognition Using Key Frames Extraction and Deep Learning; Tech Science Press: Henderson, NV, USA, 2021. [Google Scholar]
- Wang, X.; Wang, A.; Yi, J.; Song, Y.; Chehri, A. Small Object Detection Based on Deep Learning for Remote Sensing: A Comprehensive Review. Remote Sens. 2023, 15, 3265. [Google Scholar] [CrossRef]
- Ciarfuglia, A.T.; Motoi, M.I.; Saraceni, L.; Fawakherji, M.; Sanfeliu, A.; Nardi, D. Weakly and semi-supervised detection, segmentation and tracking of table grapes with limited and noisy data. Comput. Electron. Agric. 2023, 205, 107624. [Google Scholar] [CrossRef]
- Ouhami, M.; Hafiane, A.; Es-Saady, Y.; El Hajji, M.; Canals, R. Computer vision, IoT and data fusion for crop disease detection using machine learning: A survey and ongoing research. Remote Sens. 2021, 13, 2486. [Google Scholar] [CrossRef]
- Ling, S.; Wang, N.; Li, J.; Ding, L. Optimization of VAE-CGAN structure for missing time-series data complementation of UAV jujube garden aerial surveys. Turk. J. Agric. For. 2023, 47, 746–760. [Google Scholar] [CrossRef]
- Chao, X.; Li, Y. Semisupervised few-shot remote sensing image classification based on KNN distance entropy. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 8798–8805. [Google Scholar] [CrossRef]
- Maity, M.; Banerjee, S.; Chaudhuri, S.S. Faster r-cnn and yolo based vehicle detection: A survey. In Proceedings of the 2021 5th International Conference on Computing Methodologies and Communication (ICCMC), Erode, India, 8–10 April 2021; pp. 1442–1447. [Google Scholar]
- Jiang, P.; Ergu, D.; Liu, F.; Cai, Y.; Ma, B. A Review of Yolo algorithm developments. Procedia Comput. Sci. 2022, 199, 1066–1073. [Google Scholar] [CrossRef]
- Junos, M.H.; Khairuddin, A.S.M.; Dahari, M. Automated object detection on aerial images for limited capacity embedded device using a lightweight CNN model. Alex. Eng. J. 2022, 61, 6023–6041. [Google Scholar] [CrossRef]
- Li, Y.; Chao, X.; Ercisli, S. Disturbed-entropy: A simple data quality assessment approach. ICT Express 2022, 8, 309–312. [Google Scholar] [CrossRef]
- Osco, P.L.; de Arruda, S.D.M.; Gonçalves, N.D.; Dias, A.; Batistoti, J.; de Souza, M.; Gomes, F.D.G.; Ramos, A.P.M.; de Castro Jorge, L.A.; Liesenberg, W.; et al. A CNN approach to simultaneously count plants and detect plantation-rows from UAV imagery. ISPRS J. Photogramm. Remote Sens. 2021, 174, 1–17. [Google Scholar] [CrossRef]
- Li, Y.; Ercisli, S. Explainable human-in-the-loop healthcare image information quality assessment and selection. CAAI Trans. Intell. Technol. 2023. [Google Scholar] [CrossRef]
- Zhang, Y.; Yuan, B.; Zhang, J.; Li, Z.; Pang, C.; Dong, C. Lightweight PM-YOLO Network Model for Moving Object detection on the Distribution Network Side. In Proceedings of the 2022 2nd Asia-Pacific Conference on Communications Technology and Computer Science (ACCTCS), Shenyang, China, 25–27 February 2022; pp. 508–516. [Google Scholar]
- Li, Y.; Chao, X. Distance-entropy: An effective indicator for selecting informative data. Front. Plant Sci. 2022, 12, 818895. [Google Scholar] [CrossRef] [PubMed]
- Yang, K.; Chang, S.; Tian, Z.; Gao, C.; Du, Y.; Zhang, X.; Liu, K.; Meng, J.; Xue, L. Automatic polyp detection and segmentation using shuffle efficient channel attention network. Alex. Eng. J. 2022, 61, 917–926. [Google Scholar] [CrossRef]
- Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13713–13722. [Google Scholar]
- Conroy, L.T.; Moore, B.J. Resolution invariant surfaces for panoramic vision systems. In Proceedings of the Seventh IEEE International Conference on Computer Vision, Kerkyra, Greece, 20–27 September 1999; Volume 1, pp. 392–397. [Google Scholar]
- Wan, S.; Ding, S.; Chen, C. Edge computing enabled video segmentation for real-time traffic monitoring in internet of vehicles. Pattern Recognit. 2022, 121, 108146. [Google Scholar] [CrossRef]
- Liu, W.; Ren, G.; Yu, R.; Guo, S.; Zhu, J.; Zhang, L. Image-adaptive YOLO for object detection in adverse weather conditions. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 22 February–1 March 2022; Volume 36, pp. 1792–1800. [Google Scholar]
- Reza, M.A. Realization of the contrast limited adaptive histogram equalization (CLAHE) for real-time image enhancement. J. VLSI Signal Process. Syst. Signal Image Video Technol. 2004, 38, 35–44. [Google Scholar] [CrossRef]
- Ravikumar, M.; Rachana, G.P.; Shivaprasad, J.B.; Guru, S.D. Enhancement of mammogram images using CLAHE and bilateral filter approaches. In Cybernetics, Cognition and Machine Learning Applications: Proceedings of ICCCMLA; Springer: Singapore, 2021; pp. 261–271. [Google Scholar]
- Terven, J.; Cordova-Esparza, D. A comprehensive review of YOLO: From YOLOv1 to YOLOv8 and beyond. arXiv 2023, arXiv:2304.00501. [Google Scholar]
- Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More features from cheap operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1580–1589. [Google Scholar]
- Tang, Y.; Han, K.; Guo, J.; Xu, C.; Xu, C.; Wang, Y. GhostNetv2: Enhance cheap operation with long-range attention. Adv. Neural Inf. Process. Syst. 2022, 35, 9969–9982. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, N.A.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Real, E.; Aggarwal, A.; Huang, Y.; Le, V.Q. Regularized evolution for image classifier architecture search. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 4780–4789. [Google Scholar]
- Gu, R.; Wang, G.; Song, T.; Huang, R.; Aertsen, M.; Deprest, J.; Ourselin, S.; Vercauteren, T.; Zhang, S. CA-Net: Comprehensive attention convolutional neural networks for explainable medical image segmentation. IEEE Trans. Med. Imaging 2020, 40, 699–711. [Google Scholar] [CrossRef] [PubMed]
- Zimmerman, B.J.; Pizer, M.S.; Staab, V.E.; Perry, R.J.; McCartney, W.; Brenton, C.B. An evaluation of the effectiveness of adaptive histogram equalization for contrast enhancement. IEEE Trans. Med. Imaging 1988, 7, 304–312. [Google Scholar] [CrossRef]
- Ling, S.; Li, J.; Ding, L.; Wang, N. Multi-View Jujube Tree Trunks Stereo Reconstruction Based on UAV Remote Sensing Imaging Acquisition System. Appl. Sci. 2024, 14, 1364. [Google Scholar] [CrossRef]
Dataset | Precision (%) Dark Side | Precision (%) Bright Side | Precision (%) Average | [email protected] (%) Dark Side | [email protected] (%) Bright Side | [email protected] (%) Average
---|---|---|---|---|---|---
Original | 83.9 | 78.5 | 81.2 | 83.9 | 79.1 | 81.5
Enhanced | 91.3 | 88.9 | 90.1 | 91.3 | 89.1 | 90.2
Model | P (%) | R (%) | FPS | [email protected] (%) | Model Size (M) |
---|---|---|---|---|---|
YOLOv8s | 90.1 | 88.7 | 153.5 | 90.2 | 21.5 |
YOLOv8s + GhostNetv2 | 87.6 | 85.8 | 186.3 | 87.9 | 16.9 |
YOLOv8s + GhostNetv2 + CA_H | 92.3 | 89.9 | 179.8 | 91.8 | 17.3 |
Model | P (%) | R (%) | FPS | [email protected] (%) | Model Size (M) |
---|---|---|---|---|---|
Faster R-CNN | 81.9 | 85.1 | 8 | 80.7 | 121.4 |
YOLOv5s | 89.3 | 89.1 | 137.7 | 88.9 | 14.5 |
YOLOv8s | 90.1 | 88.7 | 153.5 | 90.2 | 21.5 |
YOLOv8s-GhostNetv2 | 87.6 | 85.8 | 186.3 | 87.9 | 16.9 |
YOLOv8s-GhostNetv2-CA_H | 92.3 | 89.9 | 179.8 | 91.8 | 17.3 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Ling, S.; Wang, N.; Li, J.; Ding, L. Accurate Recognition of Jujube Tree Trunks Based on Contrast Limited Adaptive Histogram Equalization Image Enhancement and Improved YOLOv8. Forests 2024, 15, 625. https://doi.org/10.3390/f15040625