Study on Risk Prediction of Comorbidity of Major Chronic Diseases in Rural Elderly in China Based on Multi-label Deep Learning

Authors

  • Fen Fu Department of Information Science, Faculty of Humanities and Social Sciences, Khon Kaen University, Thailand
  • Kittiya Suthiprapa Department of Information Science, Faculty of Humanities and Social Sciences, Khon Kaen University, Thailand
  • Kanyarat Kwiecien

DOI:

https://doi.org/10.14456/jiskku.2026.4

Keywords:

Deep learning, Multi-label learning, Chronic disease risk prediction, Rural older adults, CHARLS

Abstract

Purpose: To develop and evaluate a multi-label deep learning framework for simultaneously predicting five major chronic diseases.

Methodology: CHARLS data supplemented with rural health examination records (n = 38,569; aged ≥60) were used to train a fully connected multi-label deep neural network with a weighted binary cross-entropy loss and disease-specific threshold optimization. The model was evaluated with five-fold cross-validation and an independent test set (15%).
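
The two training choices named in the methodology can be illustrated with a minimal sketch. The abstract does not state how the loss weights were set or which criterion the thresholds optimize, so the code below assumes a per-disease weight on the positive term and an F1-maximizing grid search on validation predictions. All function names and the toy data are hypothetical and are not taken from the study.

```python
import math

def weighted_bce(y_true, y_prob, pos_weight, eps=1e-7):
    """Mean weighted binary cross-entropy over all samples and labels.

    y_true, y_prob: lists of rows, one column per disease label.
    pos_weight: one weight per label, applied to the positive term only
    (an assumed weighting scheme).
    """
    total, count = 0.0, 0
    for t_row, p_row in zip(y_true, y_prob):
        for t, p, w in zip(t_row, p_row, pos_weight):
            p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
            total += -(w * t * math.log(p) + (1 - t) * math.log(1.0 - p))
            count += 1
    return total / count

def f1_at(y_true, y_prob, threshold):
    """F1-score of one label when probabilities are cut at `threshold`."""
    tp = fp = fn = 0
    for t, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and t == 1:
            tp += 1
        elif pred == 1 and t == 0:
            fp += 1
        elif pred == 0 and t == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def best_thresholds(y_true, y_prob, grid=None):
    """Pick, for each label, the lowest grid threshold that maximizes F1."""
    grid = grid or [i / 100 for i in range(5, 96, 5)]
    chosen = []
    for j in range(len(y_true[0])):
        col_t = [row[j] for row in y_true]
        col_p = [row[j] for row in y_prob]
        chosen.append(max(grid, key=lambda th: f1_at(col_t, col_p, th)))
    return chosen

# Toy validation data with two labels (illustrative only).
toy_true = [[1, 0], [0, 0], [1, 1], [0, 0]]
toy_prob = [[0.9, 0.2], [0.4, 0.1], [0.6, 0.35], [0.3, 0.05]]

loss = weighted_bce(toy_true, toy_prob, pos_weight=[1.0, 3.0])
thresholds = best_thresholds(toy_true, toy_prob)
print(loss, thresholds)
```

In this sketch each label gets its own cut-off, so a rare disease can be flagged at a lower predicted probability than a common one. In practice the thresholds would be chosen on validation folds only, never on the held-out test set.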

Findings: The model achieved a macro AUC-ROC of 0.8587 and a macro F1-score of 0.6676 on the independent test set, outperforming logistic regression and XGBoost while achieving competitive performance compared to random forest. SHAP analysis identified systolic blood pressure, BMI, and fasting glucose as the top predictors.
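
For readers unfamiliar with the macro-averaged metrics reported here, the following sketch shows how they are formed: a score is computed separately for each disease label and the per-label scores are then averaged with equal weight. The helper names and the toy data are hypothetical, and the fixed 0.5 threshold is used only for illustration; this is not the study's evaluation code.

```python
def auc_roc(y_true, y_score):
    """AUC as the chance that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def f1_score(y_true, y_pred):
    """F1-score for one label from binary predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def column(rows, j):
    """Extract label column j from a list of rows."""
    return [row[j] for row in rows]

def macro(values):
    """Unweighted mean of per-label scores."""
    return sum(values) / len(values)

# Toy test data with two labels (illustrative only).
toy_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
toy_prob = [[0.8, 0.3], [0.4, 0.7], [0.35, 0.6], [0.2, 0.65]]
toy_thresholds = [0.5, 0.5]  # assumed per-label cut-offs

n_labels = len(toy_true[0])
per_label_auc = [auc_roc(column(toy_true, j), column(toy_prob, j))
                 for j in range(n_labels)]
per_label_f1 = [
    f1_score(column(toy_true, j),
             [1 if p >= toy_thresholds[j] else 0 for p in column(toy_prob, j)])
    for j in range(n_labels)
]

macro_auc = macro(per_label_auc)
macro_f1 = macro(per_label_f1)
print(macro_auc, macro_f1)
```

Because every label carries equal weight, a macro average does not let a high-prevalence disease dominate the summary score, which matters when the five conditions differ in prevalence.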

Applications of this study: The proposed framework demonstrates the feasibility of transforming population-level survey data into actionable multimorbidity risk tools for resource-constrained rural primary care.

Published

2026-03-30

How to Cite

Fu, F., Suthiprapa, K., & Kwiecien, K. (2026). Study on Risk Prediction of Comorbidity of Major Chronic Diseases in Rural Elderly in China Based on Multi-label Deep Learning. Journal of Information Science Research and Practice, 44(1), 49–77. https://doi.org/10.14456/jiskku.2026.4

Section

Research Article