Diabetes is a potentially fatal disease that contributes substantially
to the development of other diseases in the human body. From a
clinical perspective, the most significant approach to mitigating the
effects of diabetes is early-stage control and management, with the aim
of a potential cure. However, lack of awareness and expensive clinical
tests are the primary reasons why clinical diagnosis and preventive
measures are neglected in lower-income countries such as Bangladesh,
Pakistan, and India. To address this, this study aims to build an
automated machine learning (ML) model that predicts diabetes at an
early stage using socio-demographic characteristics rather than
clinical attributes, because clinical features are not always
accessible to people in lower-income countries. To find the
best-fitting supervised ML classifier for the model, we applied six
classification algorithms and found that random forest (RF)
outperformed the others, with an accuracy of 99.36%. In addition, the
most significant risk factors were identified from the SHAP values
computed for all the applied classifiers. This study
reveals that polyuria, polydipsia, and delayed healing are the most
significant risk factors for developing diabetes. The findings indicate
that the proposed model is highly capable of predicting diabetes in the
early stages.
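
To illustrate the idea behind SHAP-based ranking, the following minimal sketch computes exact Shapley values for a toy risk score over the three reported factors. It is not the study's model or data: the function `toy_risk_score`, its weights, and the interaction term are hypothetical and chosen only for illustration, and a feature left out of a coalition is simply treated as "symptom absent". In practice, SHAP estimates these values for a trained classifier rather than enumerating every ordering.

```python
from itertools import permutations

# Feature names taken from the risk factors reported in the abstract.
FEATURES = ["polyuria", "polydipsia", "delayed_healing"]


def toy_risk_score(present):
    """Hypothetical risk score for a set of present symptoms.

    The weights below are illustrative assumptions, not values from the study.
    """
    score = 0.0
    if "polyuria" in present:
        score += 0.4
    if "polydipsia" in present:
        score += 0.3
    if "delayed_healing" in present:
        score += 0.1
    # Assumed interaction: the two symptoms together add extra risk.
    if "polyuria" in present and "polydipsia" in present:
        score += 0.2
    return score


def shapley_values(features, value_fn):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {name: 0.0 for name in features}
    orderings = list(permutations(features))
    for order in orderings:
        coalition = set()
        for name in order:
            before = value_fn(coalition)
            coalition.add(name)
            totals[name] += value_fn(coalition) - before
    return {name: total / len(orderings) for name, total in totals.items()}


values = shapley_values(FEATURES, toy_risk_score)
# Rank features from most to least influential on the toy score.
ranking = sorted(values, key=values.get, reverse=True)

for name in ranking:
    print(f"{name}: {values[name]:.3f}")
```

The attributions sum to the difference between the score with all symptoms present and the score with none, which is the property that makes Shapley values usable for ranking risk factors consistently across different classifiers.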