Machine-learning algorithm to non-invasively detect diabetes and pre-diabetes from electrocardiogram
Early detection is of crucial importance for prevention of type 2 diabetes and pre-diabetes. Diagnosis of these conditions relies on the oral glucose tolerance test and haemoglobin A1c estimation which are invasive and challenging for large-scale screening. We aimed to combine the non-invasive nature of ECG with the power of machine learning to detect diabetes and pre-diabetes.
Data for this study come from Diabetes in Sindhi Families in Nagpur study of ethnically endogenous Sindhi population from central India. Final dataset included clinical data from 1262 individuals and 10 461 time-aligned heartbeats recorded digitally. The dataset was split into a training set, a validation set and independent test set (8892, 523 and 1046 beats, respectively). The ECG recordings were processed with median filtering, band-pass filtering and standard scaling. Minority oversampling was undertaken to balance the training dataset before initiation of training. Extreme gradient boosting (XGBoost) was used to train the classifier that used the signal-processed ECG as input and predicted the membership to ‘no diabetes’, pre-diabetes or type 2 diabetes classes (defined according to American Diabetes Association criteria).
Prevalence of type 2 diabetes and pre-diabetes was ~30% and ~14%, respectively. Training was smooth and quick (convergence achieved within 40 epochs). In the independent test set, the DiaBeats algorithm predicted the classes with 97.1% precision, 96.2% recall, 96.8% accuracy and 96.6% F1 score. The calibrated model had a low calibration error (0.06). The feature importance maps indicated that leads III, augmented Vector Left (aVL), V4, V5 and V6 were most contributory to the classification performance. The predictions matched the clinical expectations based on the biological mechanisms of cardiac involvement in diabetes.
Machine-learning-based DiaBeats algorithm using ECG signal data accurately predicted diabetes-related classes. This algorithm can help in early detection of diabetes and pre-diabetes after robust validation in external datasets.