Application of Machine Learning for Early Detection and Risk Prediction of Diabetes
DOI:
https://doi.org/10.1366/bc7es605
Abstract
Diabetes is an increasingly prevalent chronic condition, affecting populations across age groups, including youth. Accurate and early diagnosis is essential for timely intervention and effective disease management. This study investigates the use of machine learning (ML) techniques for early-stage diabetes prediction, utilizing a clinical dataset of 400 patients from Vietnam. The study begins with a comprehensive review of related literature, highlighting the effectiveness of AI-based screening methods for pre-diabetes and type 2 diabetes. It then details the research methodology, including data preprocessing, model selection and evaluation through cross-validation. Several ML models—Decision Tree, Logistic Regression, SVC, AdaBoost, Gradient Boosting, Random Forest and K-Nearest Neighbors—were compared for their predictive accuracy. The Random Forest Classifier demonstrated the highest performance with a mean accuracy of 0.911 and strong consistency across different folds. Additional testing on an independent dataset of 67 patients validated the model's robustness. The study also explores the clinical implications of AI-generated diabetes probability scores, emphasizing their potential to assist healthcare professionals in decision-making. The results confirm that machine learning algorithms can be valuable tools for non-invasive, early detection of diabetes and for quantifying patient-specific risks, particularly in resource-limited settings.
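
The model comparison described above can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' code. The data here is synthetic (generated with `make_classification`), and the feature count, the 5-fold stratified split, the scaling step, and all hyperparameters are assumptions that the abstract does not specify. The printed accuracies therefore say nothing about the study's reported results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# ASSUMPTION: synthetic stand-in for the clinical dataset (400 rows, 8 features).
X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           random_state=0)

# The seven model families named in the abstract, with default-like settings.
# Scale-sensitive models are wrapped in a pipeline so that scaling is fitted
# inside each training fold only.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Logistic Regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "SVC": make_pipeline(StandardScaler(), SVC()),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "K-Nearest Neighbors": make_pipeline(StandardScaler(),
                                         KNeighborsClassifier()),
}

# ASSUMPTION: 5 stratified folds; the abstract does not state the fold count.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    # Mean accuracy and its spread across folds (a rough consistency measure).
    results[name] = (scores.mean(), scores.std())
    print(f"{name:22s} mean={scores.mean():.3f} std={scores.std():.3f}")

# A per-patient probability score: fit on all rows except the first,
# then estimate the positive-class probability for that held-out row.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X[1:], y[1:])
risk = forest.predict_proba(X[:1])[0, 1]
print(f"Estimated probability for the held-out record: {risk:.2f}")
```

The standard deviation across folds is one simple way to express the "consistency across different folds" mentioned in the abstract, and `predict_proba` is one way a classifier can yield a patient-level probability score of the kind discussed there.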



