An Explainable Artificial Intelligence Model Proposed for the Prediction of Myalgic Encephalomyelitis/Chronic Fatigue Syndrome and the Identification of Distinctive Metabolites
Abstract
1. Introduction
2. Materials and Methods
2.1. ME/CFS Metabolomics Dataset
2.2. Experimental Setup and Proposed Framework
- The first step is to obtain the metabolomics data used in the experiments. The data come from a study of 26 healthy controls and 26 ME/CFS patients, aged 22 to 72 years and with similar BMI.
- In the second step, random forest (RF) feature selection is applied to identify candidate biomarker metabolites and to reduce the high dimensionality of the omics data. Because metabolomics data contain a large number of features, prediction models trained on all of them may achieve lower performance scores. Therefore, the twenty most important metabolites, those contributing most to improved performance in ME/CFS prediction, were identified.
- In the third step, an 80–20% split, 5-fold cross-validation (CV), and a 1000-replicate bootstrap were used to validate the prediction models built from the selected candidate biomarker metabolites, and the results were compared.
- In the fourth step, Bayesian hyper-parameter optimization was used to determine the optimal parameters.
- In the fifth step, predictive models were built to diagnose ME/CFS patients. For this purpose, Gaussian naive Bayes (GNB), gradient boosting classifier (GBC), logistic regression (LR), and random forest classifier (RFC) models were constructed. Model performance was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC), the Brier score, accuracy, precision, recall, and the F1 score. While the primary purpose of the methodology is biomarker discovery and diagnosis of ME/CFS, an important secondary purpose is to provide users with indicative probability scores. Therefore, we evaluated the quality of the predicted probabilities with a calibration curve and the Brier score.
- Finally, the XAI approaches SHAP and TreeMap were applied to the proposed model to make it transparent and interpretable and to explain its decisions intuitively. With SHAP and TreeMap, the rationale and process behind a particular decision of the proposed model can be understood.
2.3. Feature Selection
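The RF-based selection step described in Section 2.2 can be sketched as follows. This is a minimal illustration, assuming scikit-learn and impurity-based importances; the synthetic matrix from `make_classification`, its 200 columns, and `n_estimators=500` are placeholders, not the study's data or settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the metabolomics matrix: 52 samples and
# 200 hypothetical "metabolite" columns. This is NOT the study data.
X, y = make_classification(
    n_samples=52, n_features=200, n_informative=10,
    n_redundant=0, random_state=0,
)

# Fit a random forest and rank features by impurity-based importance.
forest = RandomForestClassifier(n_estimators=500, random_state=0)
forest.fit(X, y)

top_k = 20  # the text keeps the twenty most important metabolites
ranked = np.argsort(forest.feature_importances_)[::-1]  # most important first
selected = ranked[:top_k]

# Reduced matrix that downstream classifiers would be trained on.
X_selected = X[:, selected]
print(X_selected.shape)  # (52, 20)
```

Ranking on the full dataset, as done here for brevity, lets held-out samples influence the selection; repeating the ranking inside each training fold is one way to avoid that.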
2.4. Validation Methods
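The three validation schemes named in Section 2.2 (80–20% split, 5-fold CV, 1000-replicate bootstrap) differ only in how sample indices are partitioned. The sketch below generates those partitions with the standard library. The function names are hypothetical, the splits are not stratified, and evaluating each bootstrap replicate on its out-of-bag samples is an assumption, since the evaluation scheme is not specified here.

```python
import math
import random

def holdout_split(n, test_fraction=0.2, seed=0):
    """Shuffle indices 0..n-1 and split them into (train, test) lists."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = math.ceil(n * test_fraction)
    return idx[n_test:], idx[:n_test]

def k_fold_splits(n, k=5, seed=0):
    """Yield (train, test) index lists; each index is tested exactly once."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

def bootstrap_splits(n, n_replicates=1000, seed=0):
    """Yield (in_bag, out_of_bag) index lists for each bootstrap replicate."""
    rng = random.Random(seed)
    for _ in range(n_replicates):
        in_bag = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        drawn = set(in_bag)
        out_of_bag = [j for j in range(n) if j not in drawn]
        yield in_bag, out_of_bag

n_samples = 52  # 26 controls + 26 patients, as in the dataset description
train, test = holdout_split(n_samples)
print(len(train), len(test))
print([len(te) for _, te in k_fold_splits(n_samples)])
```

In each scheme a model would be fitted on the first index list and scored on the second, with scores averaged over folds or replicates.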
2.5. The Bayesian Approach for Hyper-Parameter Optimization
2.6. Classification Models
2.7. Performance Evaluation and Model Calibration
Performance Evaluation
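The threshold-based metrics listed in Section 2.2 (accuracy, precision, recall, F1) and the AUC can be computed directly from their definitions. The sketch below uses binary, positive-class formulas and a pairwise (Mann-Whitney) estimate of the AUC; the function names and the toy labels are illustrative only, and the source does not state which averaging the reported table values use.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

def roc_auc(y_true, scores, positive=1):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 1 = ME/CFS, 0 = control (made-up values).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```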
2.8. Model Calibration
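The Brier score and the calibration curve mentioned in Section 2.2 both compare predicted probabilities with observed outcomes. A minimal sketch follows, assuming equal-width probability bins; the function names, the bin count, and the toy values are illustrative choices rather than the study's settings.

```python
def brier_score(y_true, prob_pos):
    """Mean squared difference between predicted probability and outcome (0/1)."""
    return sum((p - t) ** 2 for t, p in zip(y_true, prob_pos)) / len(y_true)

def calibration_curve(y_true, prob_pos, n_bins=5):
    """Return (mean predicted probability, observed positive fraction)
    for each non-empty equal-width bin on [0, 1]."""
    bins = [[] for _ in range(n_bins)]
    for t, p in zip(y_true, prob_pos):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[b].append((t, p))
    curve = []
    for members in bins:
        if members:
            mean_p = sum(p for _, p in members) / len(members)
            frac_pos = sum(t for t, _ in members) / len(members)
            curve.append((mean_p, frac_pos))
    return curve

# Toy example (made-up probabilities of the positive class).
y_true = [0, 0, 1, 1]
prob_pos = [0.1, 0.4, 0.6, 0.9]

score = brier_score(y_true, prob_pos)
curve = calibration_curve(y_true, prob_pos, n_bins=2)
print(score, curve)
```

A Brier score of 0 corresponds to perfect probabilities, and a well-calibrated model yields curve points close to the diagonal where predicted probability equals observed fraction.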
2.9. XAI Approaches
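SHAP (SHapley Additive exPlanations) attributes a prediction to individual features using Shapley values. The sketch below computes exact Shapley values by brute force for a tiny hypothetical model, to illustrate the quantity involved. It does not use the SHAP library and is not the explainer applied in the study; `toy_model`, the inputs, and the all-zero baseline are assumptions made for illustration.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley attribution of model(x) - model(baseline) to each feature.

    Features outside a coalition are set to their baseline value.
    Cost grows as 2**n, so this is feasible only for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Hypothetical toy "risk score" over three metabolite levels.
def toy_model(z):
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[0] * z[2]

x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The attributions sum to the difference between the prediction for `x` and the prediction for the baseline, which is the additivity property that SHAP plots rely on; the interaction term is shared equally between the two features involved.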
3. Results
3.1. Feature Selection Results
3.2. Hyper-Parameters Optimization Results
3.3. The Model Performance Results
3.4. XAI Results
4. Discussion
5. Conclusions
6. Limitations and Future Works
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Technique | Optimized Parameter Value |
---|---|
GNB | var_smoothing = 1 × 10⁻⁹ |
GBC | n_estimators = 3, learning_rate = 1.0, max_depth = 1 |
LR | max_iter = 30, solver = 'liblinear' |
RFC | max_depth = 26, min_samples_leaf = 5, min_samples_split = 3, n_estimators = 12 |
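The parameter names in the table above match scikit-learn estimator arguments, so the four classifiers could be instantiated as in the sketch below. Using scikit-learn is an assumption, `random_state` is added only for reproducibility, and the data are a synthetic stand-in for the 20 selected metabolites.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

# Parameter values copied from the table above; random_state is NOT from
# the source and is set here only to make the sketch reproducible.
models = {
    "GNB": GaussianNB(var_smoothing=1e-9),
    "GBC": GradientBoostingClassifier(
        n_estimators=3, learning_rate=1.0, max_depth=1, random_state=0
    ),
    "LR": LogisticRegression(max_iter=30, solver="liblinear"),
    "RFC": RandomForestClassifier(
        max_depth=26, min_samples_leaf=5, min_samples_split=3,
        n_estimators=12, random_state=0,
    ),
}

# Synthetic stand-in for a 52 x 20 matrix of selected metabolites.
X, y = make_classification(n_samples=52, n_features=20, random_state=0)

for name, model in models.items():
    model.fit(X, y)
    proba = model.predict_proba(X)[:, 1]  # probability of the positive class
    print(name, proba.shape)
```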
Attained performance using all input features ("all") and using feature selection ("FS"). A: accuracy; P: precision; R: recall; F1: F1 score; B: Brier score; AUC: area under the ROC curve.
Validation | Technique | A (%), all | P (%), all | R (%), all | F1 (%), all | B, all | AUC (%), all | A (%), FS | P (%), FS | R (%), FS | F1 (%), FS | B, FS | AUC (%), FS |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
80–20% split | GNB | 73 | 72 | 73 | 72 | 0.27 | 67 | 73 | 72 | 73 | 72 | 0.27 | 67 |
80–20% split | GBC | 36 | 39 | 36 | 37 | 0.63 | 33 | 73 | 75 | 73 | 73 | 0.27 | 73 |
80–20% split | LR | 64 | 64 | 64 | 64 | 0.36 | 60 | 73 | 72 | 73 | 72 | 0.27 | 67 |
80–20% split | RFC | 45 | 56 | 45 | 44 | 0.54 | 51 | 82 | 86 | 82 | 80 | 0.18 | 75 |
5-fold cross-validation | GNB | 52 | 36 | 94 | 62 | 0.26 | 59 | 82 | 77 | 92 | 84 | 0.15 | 91 |
5-fold cross-validation | GBC | 48 | 47 | 35 | 37 | 0.34 | 52 | 95 | 94 | 99 | 95 | 0.05 | 98 |
5-fold cross-validation | LR | 58 | 46 | 71 | 54 | 0.45 | 46 | 95 | 95 | 96 | 96 | 0.03 | 98 |
5-fold cross-validation | RFC | 56 | 68 | 38 | 56 | 0.28 | 64 | 97 | 96 | 97 | 98 | 0.04 | 99 |
1000-repetition bootstrap | GNB | 63 | 70 | 63 | 60 | 0.36 | 63 | 83 | 84 | 83 | 83 | 0.17 | 91 |
1000-repetition bootstrap | GBC | 92 | 92 | 92 | 92 | 0.07 | 92 | 96 | 96 | 96 | 96 | 0.03 | 92 |
1000-repetition bootstrap | LR | 96 | 96 | 96 | 96 | 0.04 | 95 | 96 | 96 | 96 | 96 | 0.04 | 99 |
1000-repetition bootstrap | RFC | 90 | 90 | 90 | 90 | 0.09 | 90 | 98 | 98 | 98 | 98 | 0.01 | 99 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yagin, F.H.; Alkhateeb, A.; Raza, A.; Samee, N.A.; Mahmoud, N.F.; Colak, C.; Yagin, B. An Explainable Artificial Intelligence Model Proposed for the Prediction of Myalgic Encephalomyelitis/Chronic Fatigue Syndrome and the Identification of Distinctive Metabolites. Diagnostics 2023, 13, 3495. https://doi.org/10.3390/diagnostics13233495