1. Introduction
In recent years, the rampant spread of pathogenic microorganisms has posed significant challenges to public health [1]. According to data released by the World Health Organization (WHO), deaths from infectious diseases account for 19% of total global mortality, and approximately 13 million children die of infectious diseases each year [2,3]. The experience of large medical institutions in managing infectious diseases and major epidemics shows that early detection and diagnosis are key to effectively treating infections and controlling epidemics. Detecting pathogenic bacteria rapidly and accurately at an early stage is therefore crucial [4,5].
However, current bacterial detection techniques suffer from long culture periods and inadequate throughput, which makes it difficult to meet the clinical demand for rapid detection [6]. This is especially true for sterile body fluids with low bacterial content (such as blood, pleural fluid, and cerebrospinal fluid). These samples require bacterial enrichment by blood culture, subculture of positive cultures onto blood agar plates, and isolation and identification, which together result in low overall testing efficiency [7,8]. In the most time-consuming step, bacterial culture (usually 1–2 days), a positive result is usually reported when the bacterial concentration exceeds 10⁵ CFU/mL (colony-forming units per milliliter) after blood-bottle culture, whereas cases with concentrations below 10³ CFU/mL are usually presumed negative and require no further testing. In clinical testing, we have observed that if the infection status (positive or negative) could be determined accurately and directly from the samples, before or in the early stages of blood culture, the overall testing time could be reduced substantially. Positive samples could then enter antimicrobial susceptibility testing as early as possible, while negative samples would need no further cultivation, saving culture plates and other medical consumables. In particular, the emergency department has a large demand for urine testing, where the rapid issuance of bacterial test reports is of great significance [9,10]. In other words, we aim to detect directly whether a test sample contains bacteria: if bacteria are detected, the sample is positive; otherwise, it is negative. This requires eliminating interference from impurities and other substances in the urine that could be mistaken for bacterial targets.
Therefore, we aim to introduce a new technology for determining the bacterial infection status of directly smeared urine samples, in order to accelerate the detection process and reduce costs. Hyperspectral imaging (HSI) developed from multispectral imaging; it uses imaging spectrometers to image target objects continuously in dozens or hundreds of spectral bands from the ultraviolet to the near-infrared (200–2500 nm) [11,12]. HSI has been widely applied in remote sensing and related fields, such as terrain classification [13], agricultural monitoring [14], and food safety [15]. Micro-hyperspectral imaging, which has emerged in recent years, combines spectral analysis with microscopic imaging. By finely subdividing the spectral bands, it produces higher-resolution, continuous, narrow-band micro-hyperspectral images, enabling comprehensive qualitative, quantitative, and localization analysis of microscopic tissue.
In medical spectral research, the spectral resolution of current micro-hyperspectral imaging systems can reach 3 nm, with a spatial resolution better than 0.5 μm [16]. With continuing improvements in hardware, it has become possible to monitor pathophysiological characteristics and to classify bacterial genera and species [17]. Bacteria are composed mainly of proteins, nucleic acids, lipids, carbohydrates, and coenzymes, and each component has its own characteristic wavelength selectivity, absorbing strongly at certain specific wavelengths. Differences in the content of these substances produce different degrees of absorption, reflection, and scattering of light, which ultimately appear as distinctive spectral features between bacterial genera; this provides the theoretical foundation for hyperspectral research on bacteria [18,19].
At present, hyperspectral research on bacteria mostly focuses on classifying and identifying a few specific types of bacteria. For example, Matthew employed micro-hyperspectral imaging to detect Salmonella in chicken rinsate [20]. Hyperspectral data of Salmonella colonies were acquired at 100× magnification over 450–800 nm, and a classification accuracy of 98.5% and a specificity of 0.963 were achieved using Quadratic Discriminant Analysis (QDA). Combining micro-hyperspectral imaging with machine learning, Liu achieved a classification accuracy of 98.06% for two Bacillus species, B. megaterium and B. cereus, based on subtle differences in absorption peaks [21]. Kang used frameworks such as Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks and made significant progress in classifying foodborne bacteria [22,23]. The proposed Fusion-net stacked single-analysis frameworks to process multiple features simultaneously, improving classification accuracy to 98.4%. However, most of this research is confined to a few specific types of foodborne bacteria. A narrow wavelength range and low spectral resolution do not capture enough biological information for the more complex multi-class detection of infectious bacteria [24,25]. Moreover, small-scale datasets can hardly reflect the actual clinical distribution of bacteria [26]. Tao combined a micro-hyperspectral imaging system with an end-to-end deep learning network to extract species-specific features at the bacterial level as bacterial differentiation fingerprints [27]. A classification model for common bacteria was established on a large-scale dataset, and transfer learning achieved a classification accuracy of 92% for uncommon bacteria.
The above research indicates that micro-hyperspectral bacterial detection can encode the biological characteristics of bacteria into datacubes through spectral and morphological representation at the microscopic scale. With appropriate preprocessing methods and deep learning models, more detailed deep spectral features can be extracted. Compared with existing bacterial detection methods, micro-hyperspectral technology is simple to operate and saves a significant amount of culture time. Moreover, it does not rely on traditional morphological observation, which reduces the influence of human factors. Compared with similar studies on bacterial hyperspectral analysis, this study advances the research object, the research methods, and the application of results: the scope extends to a broader range of clinically infectious bacteria, the multi-scale buffered convolutional neural network has strong multi-dimensional feature-extraction capability, and the approach offers better scalability and higher application efficiency. Micro-hyperspectral technology is therefore expected to become a reliable means of addressing rapid bacterial detection. However, we have not found any published research on the rapid determination of bacterial infection status in directly smeared samples, which is the problem addressed in this study.
Therefore, this study took a distinctive perspective, focusing on the bacterial infection status of directly smeared samples. Using urine samples, which are common in clinical practice, it examined the characteristics of bacterial hyperspectral data under direct-smear conditions, the differences in spectral features between infection-positive and infection-negative samples, deep learning models suited to the multi-scale features and rapid analysis of directly smeared data, and the temporal correlation between positive–negative determination and short-term cultivation. The study established a standard spectral database of common microorganisms (Escherichia, Enterococcus, Staphylococcus, Candida, etc.) and impurities (crystals, casts, etc.) in urine samples, which is used to eliminate interference from impurities and to perform spectral matching against single-bacterium targets. Based on the characteristics of hyperspectral data from directly smeared samples, a multi-scale buffered convolutional neural network, Multi-BufferNet (MBNet), was established, which includes three convolutional combination units to extract spectral features of directly smeared data from different dimensions. Finally, database matching and MBNet were combined into a joint determination model, which achieved rapid and accurate prediction of urinary bacterial infection. To bring this technology into clinical outpatient practice, the study also integrated a front-end method for rapidly preparing directly smeared urine samples with back-end automated analysis and reporting software, exploring a more efficient and feasible end-to-end determination workflow. Together with the genus identification step [27], this study forms a complete and rapid bacterial determination process.
2. Materials and Methods
2.1. Micro-Hyperspectral Imaging System
The data used in this study were all acquired with the micro-hyperspectral imaging system MICROspecim. MICROspecim consists of a spectral imaging system, a control system, and a data processing system, as shown in Figure 1. The spectral imaging system includes a front imaging mirror group, a spectral acquisition component, an imaging lens group, and an area-array detector. The control system includes a camera control unit and a motor control unit. The data processing system includes a data acquisition unit, a data analysis unit, and a database unit. A halogen lamp (400–2500 nm, 50 W) provides active illumination. The glass-slide sample on the microscope stage is imaged onto the area-array detector through the spectral imaging system, which acquires two dimensions of information per frame, while the control system drives the motor to scan the remaining spatial dimension. The control and data processing systems are integrated into computer software responsible for datacube acquisition and post-processing. The resulting hyperspectral datacube of a directly smeared bacterial sample has dimensions 226 (λ) × 800 (x) × 800 (y), where 800 × 800 is the image size (a physical area of 0.12 mm × 0.12 mm) and 226 is the number of spectral channels from 400 nm to 1000 nm. MICROspecim has been applied in clinical pathology-assisted diagnosis and in research on rapid bacterial analysis.
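As a rough illustration of the push-broom acquisition described above, the sketch below stacks line frames into a (λ, x, y) datacube and derives the pixel pitch from the stated field of view (120 µm over 800 pixels, i.e., 0.15 µm per pixel). The per-frame layout (bands × x) and the function name `assemble_datacube` are assumptions for illustration, not MICROspecim's software interface.

```python
import numpy as np

# Sizes from the text; the per-frame layout (bands, x) is an assumption.
N_BANDS, N_X = 226, 800
FIELD_UM = 120.0  # 0.12 mm side length of the 800 x 800 field of view

def assemble_datacube(frames):
    """Stack push-broom frames into a (λ, x, y) datacube (hypothetical helper).

    Each frame is assumed to be one detector readout of shape (N_BANDS, N_X):
    a single spatial line dispersed over all spectral channels. Each motor
    step along y contributes the next frame.
    """
    for f in frames:
        if f.shape != (N_BANDS, N_X):
            raise ValueError(f"unexpected frame shape {f.shape}")
    return np.stack(frames, axis=-1)  # scan direction y becomes the last axis

# Pixel pitch: 120 µm imaged onto 800 pixels.
pixel_pitch_um = FIELD_UM / N_X

# Synthetic 4-line scan (random values, not real data); a full scan has 800 lines.
rng = np.random.default_rng(0)
frames = [rng.random((N_BANDS, N_X)) for _ in range(4)]
cube = assemble_datacube(frames)
print(cube.shape, pixel_pitch_um)  # (226, 800, 4) 0.15
```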
2.2. Experimental Samples Preparation
In this study, urine-smeared slides were used as experimental samples. After each patient's urine sample was obtained, one portion was taken as the experimental group for directly smeared sample preparation. Another portion served as the control group and was labeled as positive or negative for bacterial infection using the traditional culture testing process; positive samples were additionally labeled with the bacterial species they contained. This information served as the ground truth for the training samples. The preparation process of directly smeared urine samples in the experimental group is shown in Figure 2:
1. Take a clean glass slide, disinfect it with alcohol, and rinse it with distilled water. Then heat it over an alcohol lamp to remove wax, and let it cool for later use.
2. Record detailed information about the urine sample and assign it a unique identifier. Pour the urine into an anticoagulant tube and balance the tubes so that each contains approximately the same volume of fluid.
3. Centrifuge the urine sample at 3000 r/min for 10 min.
4. Take out the centrifuged urine and aspirate the supernatant with a clean sterile pipette, leaving the urine sediment at the bottom. Then use a new pipette to draw up the sediment and mix it thoroughly. Smear the sediment on a slide and spread it quickly and evenly with a sterile loop.
5. Place the prepared slide in a biosafety cabinet until it is completely dry. Then perform Gram staining in the following order: stain with crystal violet, cover with iodine, decolorize with 95% ethanol, and counterstain with safranin. Finally, rinse the slide with water and air-dry it for later use. Gram staining is necessary for two reasons. First, it is an inherent part of the current testing process: it highlights the morphological information of bacterial targets and helps doctors with observation and determination, and it is beneficial for our technology to follow the existing bacterial testing process as closely as possible. Second, without Gram staining, the bacterial outlines and details of the sample are not clear enough, which makes it difficult for doctors to label specific bacteria or impurities.
6. Place the slide on the microscope stage and locate the field of view under a 10× objective. Switch to a 100× objective and look for a field of view suspected to contain bacteria. Then perform a push-broom scan to capture hyperspectral images of the directly smeared urine sample.
For the control-group urine samples, the true bacterial infection status is determined after traditional culture, staining, biochemical and molecular diagnosis, and mass spectrometry, and it is assigned to the corresponding experimental-group samples through their sample identifiers.
2.3. Experimental Dataset
The urine samples in this study were all obtained from the Clinical Laboratory of Tangdu Hospital. The experimental dataset was collected with MICROspecim and includes 8124 sets of urine sample data, as shown in Table 1. Of these, 2864 are negative (sterile) samples and 5260 are positive samples. Among the positive samples, the largest group is E. coli (1442 cases), followed by E. faecalis (720), C. tropicalis (594), K. pneumoniae (510), C. albicans (460), P. mirabilis (365), S. epidermidis (322), P. aeruginosa (315), S. aureus (296), and A. baumannii (236). Each set of urine sample data includes a 0 h sample (no cultivation) and a 3 h sample (short-term cultivation). The data size of each raw urine sample is 226 × 800 × 800. The first 26 and last 40 spectral bands are removed because of their high noise level, retaining 160 bands of visible and near-infrared data from 450 to 900 nm. The spatial size of the raw data (800 × 800) is relatively large, which hinders model construction and training. Therefore, in this study, 400 × 400 windows were extracted from the raw data with a spatial stride of 200, giving experimental samples of 160 × 400 × 400, as shown in Figure 3. With three window positions along each of the horizontal and vertical axes, the amount of experimental data was expanded to 9 × 8124 = 73,116. Finally, each sample was reviewed, problematic data were removed, and the ground truth (positive or negative) of the final infection status was determined.
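The band trimming and window extraction in this subsection can be sketched with plain NumPy slicing. The function names are hypothetical and the cube is a synthetic, memory-free stand-in; the sketch only reproduces the stated sizes (226 → 160 bands; 400 × 400 windows at stride 200, giving 3 × 3 = 9 crops per sample).

```python
import numpy as np

def trim_bands(cube, drop_first=26, drop_last=40):
    """Drop noisy bands from a (λ, x, y) cube: 226 - 26 - 40 = 160 bands kept."""
    return cube[drop_first:cube.shape[0] - drop_last]

def sliding_crops(cube, window=400, stride=200):
    """Return (x0, y0, crop) for every window x window spatial crop (views, no copy)."""
    _, nx, ny = cube.shape
    return [
        (x0, y0, cube[:, x0:x0 + window, y0:y0 + window])
        for x0 in range(0, nx - window + 1, stride)
        for y0 in range(0, ny - window + 1, stride)
    ]

# Stand-in for one raw 226 x 800 x 800 cube: every pixel in band k holds k.
raw = np.broadcast_to(np.arange(226)[:, None, None], (226, 800, 800))
trimmed = trim_bands(raw)
crops = sliding_crops(trimmed)
print(trimmed.shape, len(crops), crops[0][2].shape)  # (160, 800, 800) 9 (160, 400, 400)
print(len(crops) * 8124)  # 73116 experimental samples from 8124 sample sets
```

Because slicing returns views, the nine crops share memory with the source cube; a real pipeline would copy them only when writing the training set.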
2.4. Database Standardization
The raw hyperspectral data of directly smeared bacteria are susceptible to factors such as the system light source, optical components, and experimental environment, which introduce random or systematic errors in the spectral and spatial dimensions. Therefore, when raw sample data are acquired, database standardization preprocessing is required to eliminate the influence of the system and the external environment [28]. The main steps are as follows:
1. Keep the light source intensity, focal length, and magnification constant, and collect a hyperspectral image B1 of the blank sample from a blank area on the slide.
2. Calculate the correction coefficient of the spectral dimension. Each pixel of the blank sample's spectral image is indexed by two spatial coordinates and one spectral coordinate; N is the number of spectral bands, and B is the average hyperspectral data of the blank sample over all spectral bands. A spectral correction coefficient is obtained for every pixel position.
3. Calculate the correction coefficient of the spatial dimension. P and Q are the numbers of horizontal and vertical pixels in the spectral image of the blank sample, respectively, and a spatial correction coefficient is obtained for every pixel position.
4. Apply the joint spatial and spectral correction to the raw hyperspectral data to obtain the standardized hyperspectral data.
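A minimal sketch of a flat-field-style standardization consistent with the four steps above, assuming that the spectral coefficient rescales each blank pixel's spectrum to its band average and that the spatial coefficient rescales each pixel's band average to the mean over the P × Q field. These formulas and the names `alpha`, `beta`, and `eps` are assumptions, not the study's equations; under them, correcting the blank itself yields a constant cube.

```python
import numpy as np

def correction_coefficients(blank, eps=1e-12):
    """Hypothetical spectral (alpha) and spatial (beta) coefficients.

    blank: (N, P, Q) blank-slide cube B1, axis 0 spectral, axes 1-2 spatial.
    """
    band_mean = blank.mean(axis=0)                 # (P, Q): average over the N bands
    alpha = band_mean[None, :, :] / (blank + eps)  # (N, P, Q): flattens each pixel's spectrum
    beta = band_mean.mean() / (band_mean + eps)    # (P, Q): flattens illumination over the field
    return alpha, beta

def standardize(raw, alpha, beta):
    """Joint correction: multiply by both coefficients (beta broadcast over bands)."""
    return raw * alpha * beta[None, :, :]

rng = np.random.default_rng(1)
blank = rng.uniform(0.5, 1.5, size=(8, 5, 6))  # synthetic blank with uneven response
alpha, beta = correction_coefficients(blank)
flat = standardize(blank, alpha, beta)          # the blank maps to one constant value
half = standardize(0.5 * blank, alpha, beta)    # a uniform 50% absorber
print(np.ptp(flat), half.mean() / flat.mean())
```

The multiplicative form keeps relative absorption intact: a sample that transmits half the blank's light everywhere comes out at half the corrected blank level.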
2.5. Spectral Angle Matching
Spectral Angle Matching (SAM) is a method for analyzing and comparing spectral data, commonly used for classification, identification, and change detection [29,30]. SAM matches and identifies samples by comparing the spectral angle between a target sample and known samples. Although SAM is a classic, traditional algorithm, it is reliable and flexible, and the model is easy to update because new reference spectra can simply be added to the database. Furthermore, it is insensitive to changes in brightness and illumination, because the spectral angle does not change when a spectrum is scaled by a positive constant. Therefore, in this study, SAM was used to match targets in directly smeared urine samples against known samples in the database to determine whether bacterial targets are present. The intuitive results of SAM also make the positive–negative determination models easier to understand and interpret. SAM is calculated as follows:
$$\alpha = \cos^{-1}\left(\frac{\sum_{i=1}^{nb} t_i r_i}{\sqrt{\sum_{i=1}^{nb} t_i^{2}}\,\sqrt{\sum_{i=1}^{nb} r_i^{2}}}\right)$$
where nb is the number of spectral bands, and t and r are the test spectrum and the reference spectrum, respectively. The spectral cosine serves as the similarity measure, quantifying the similarity between the test and reference spectral vectors through the angle between them. A smaller angle indicates higher spectral similarity, whereas a larger angle indicates greater dissimilarity.
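The spectral-angle comparison and database matching can be sketched as follows. The reference library, its labels, and the acceptance threshold `max_angle` are hypothetical placeholders, not the study's database or decision rule. The example also demonstrates the brightness invariance noted above: scaling a spectrum leaves its angle unchanged.

```python
import numpy as np

def spectral_angle(t, r):
    """Spectral angle (radians) between test spectrum t and reference spectrum r."""
    t = np.asarray(t, dtype=float)
    r = np.asarray(r, dtype=float)
    cos = np.dot(t, r) / (np.linalg.norm(t) * np.linalg.norm(r))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))  # clip guards against rounding

def match_spectrum(t, library, max_angle=0.1):
    """Return (label, angle) of the closest reference, or (None, angle) if too far.

    library: dict mapping label -> reference spectrum.
    max_angle: hypothetical acceptance threshold, not a value from the study.
    """
    best_label, best_angle = None, np.inf
    for label, r in library.items():
        a = spectral_angle(t, r)
        if a < best_angle:
            best_label, best_angle = label, a
    if best_angle > max_angle:
        return None, best_angle
    return best_label, best_angle

# Toy 4-band library with made-up spectra.
library = {"bacterium": [1, 2, 3, 4], "crystal": [4, 3, 2, 1]}
print(match_spectrum([2, 4, 6, 8], library))  # twice as bright, same shape -> "bacterium"
print(match_spectrum([1, 0, 0, 0], library))  # no close reference -> None
```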
2.6. MBNet
In medical imaging, most end-to-end models based on three-dimensional (3D) convolutional networks were proposed for volumetric imaging modalities such as Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) [31]. These modalities have relatively simple semantic features and fixed organ structures, which differ from the high variability and complexity of micro-hyperspectral images. In micro-hyperspectral analysis, 3D convolution slides the convolution kernel along three directions: two spatial directions and one spectral direction [32,33]. The definition of 3D convolution is shown in Equation (6):
In this equation, the output is a 3D feature map, the input is 3D data, and the kernel is the 3D convolution kernel. The output coordinates index the three directions: the spatial row, the spatial column, and the spectral position. The kernel sizes along these three directions together determine the receptive field of the layer. 3D convolution is better suited to extracting features from datacubes, because it extracts spectral features as well as spatial features.
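To make the sliding-kernel description concrete, here is a naive 3D convolution over a (λ, row, column) cube in "valid" mode. Like most CNN frameworks, it computes cross-correlation (the kernel is not flipped) and omits bias and activation; it is an illustrative sketch, not the study's implementation.

```python
import numpy as np

def conv3d_valid(u, w):
    """Naive 3D convolution of cube u (λ, row, col) with kernel w, 'valid' mode.

    Computed as cross-correlation (no kernel flip), as in CNN layers;
    bias and activation are omitted.
    """
    L, H, W = u.shape
    kl, kh, kw = w.shape
    out = np.zeros((L - kl + 1, H - kh + 1, W - kw + 1))
    for z in range(out.shape[0]):          # spectral position
        for x in range(out.shape[1]):      # spatial row
            for y in range(out.shape[2]):  # spatial column
                # Weighted sum over the kl x kh x kw neighbourhood starting at (z, x, y).
                out[z, x, y] = np.sum(w * u[z:z + kl, x:x + kh, y:y + kw])
    return out

u = np.arange(4 * 5 * 6, dtype=float).reshape(4, 5, 6)  # toy (λ, row, col) cube
out = conv3d_valid(u, np.ones((3, 3, 3)))                # 3 x 3 x 3 box kernel
print(out.shape)  # (2, 3, 4): each axis shrinks by kernel size - 1
```

The kernel extent along each axis sets how many spectral bands and pixels contribute to one output value, which is the receptive-field effect described above.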
Consequently, a determination model based on 3D convolution is not simply a linear combination of 1D and 2D convolutional networks. Because medical samples are inherently difficult to obtain, deeper models are prone to overfitting when applied directly. Deeper networks have more parameters and higher complexity and are harder to train, which conflicts with the limited number of directly smeared urine samples in this study. A CNN with relatively few layers and parameters may therefore be more suitable. Accordingly, this study proposed a multi-scale buffered convolutional neural network, Multi-BufferNet (MBNet), whose architecture is shown in Figure 4.
The term “multi-scale” refers to processing the same feature map with convolution kernels of different sizes in parallel and merging the results. Different kernel sizes introduce receptive fields of different sizes and thus extract features at various scales. To handle the rich spectral information and varying target sizes in directly smeared urine data, a convolutional combination unit consisting of three sets of convolution kernels was designed. The unit stacks 3 × 1 × 1, 3 × 3 × 3, and 3 × 5 × 5 kernels (spectral dimension λ × spatial row × spatial column). The 3 × 1 × 1 kernel extracts spectral information, while the 3 × 3 × 3 and 3 × 5 × 5 kernels capture spatial texture at different scales. The feature maps from the three sets of kernels are concatenated in the feature concatenation layer to produce the output feature maps. Next, a buffer unit combining a buffer layer and a downsampling layer was designed, with a downsampling stride of 2. The stride of the buffer layer is fixed at 1 to enhance representational power without reducing the resolution of the feature maps. All convolution kernels in the buffer unit are 3 × 3 × 3 to extract detailed information from the datacube. Four buffer units are connected in sequence after the feature concatenation layer, followed by a fully connected layer and a SoftMax layer that output the determination results.
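The following pure-Python sketch traces feature-map sizes through an MBNet-like layout for one 160 × 400 × 400 sample. The branch kernel shapes, the strides, and the four buffer units follow the text; "same" zero padding, 16 channels per branch, a constant channel count, and a strided 3 × 3 × 3 convolution that downsamples all three axes are assumptions, not details confirmed by the study.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Output length along one axis for a standard strided convolution."""
    return (size + 2 * pad - kernel) // stride + 1

def trace_mbnet_shapes(in_shape=(160, 400, 400), branch_channels=16, n_buffer_units=4):
    """Trace (name, channels, (λ, row, col)) through a hypothetical MBNet-like layout."""
    branch_kernels = [(3, 1, 1), (3, 3, 3), (3, 5, 5)]  # λ x row x col, from the text
    branch_shapes = [
        tuple(conv_out(s, k, 1, k // 2) for s, k in zip(in_shape, kernel))  # 'same' padding
        for kernel in branch_kernels
    ]
    # Parallel branches must produce identical sizes to be concatenated.
    assert len(set(branch_shapes)) == 1
    channels = branch_channels * len(branch_kernels)
    shape = branch_shapes[0]
    trace = [("concat", channels, shape)]
    for i in range(1, n_buffer_units + 1):
        shape = tuple(conv_out(s, 3, 1, 1) for s in shape)  # buffer layer, stride 1
        shape = tuple(conv_out(s, 3, 2, 1) for s in shape)  # downsampling layer, stride 2
        trace.append((f"buffer_unit_{i}", channels, shape))
    return trace

for name, ch, shp in trace_mbnet_shapes():
    print(name, ch, shp)
```

Under these assumptions, the fully connected layer would receive 48 × 10 × 25 × 25 = 300,000 features; different padding or downsampling choices would change this number and hence the parameter count of that layer.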
2.7. Evaluation Metrics
This study used Accuracy (ACC), Positive Predictive Value (PPV), and Negative Predictive Value (NPV) as evaluation metrics. ACC is commonly used to measure classification performance in hyperspectral image analysis, while PPV and NPV are widely used in medicine. PPV is the proportion of true positives among all results determined as positive during diagnosis or testing, and NPV is the proportion of true negatives among all results determined as negative. The metrics are calculated as follows:
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN},\qquad \mathrm{PPV} = \frac{TP}{TP + FP},\qquad \mathrm{NPV} = \frac{TN}{TN + FN}$$
where TP, FP, TN, and FN are the numbers of true positives (correctly predicted positive), false positives (incorrectly predicted positive), true negatives (correctly predicted negative), and false negatives (incorrectly predicted negative), respectively. The relationships between these metrics are shown in Figure 5.
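The three metrics can be computed from confusion-matrix counts as in the sketch below. The function name is hypothetical, and the counts in the example are made up for illustration; they are not results from this study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return (ACC, PPV, NPV) from confusion-matrix counts.

    PPV or NPV is NaN when no sample was predicted positive or negative.
    """
    total = tp + fp + tn + fn
    acc = (tp + tn) / total
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return acc, ppv, npv

# Illustrative counts only: 100 predicted positive, 100 predicted negative.
acc, ppv, npv = diagnostic_metrics(tp=90, fp=10, tn=80, fn=20)
print(acc, ppv, npv)  # 0.85 0.9 0.8
```

Unlike ACC, PPV and NPV depend on how many samples are predicted positive or negative, so they are sensitive to the positive/negative ratio of the test population.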