Natural language processing in cancer treatment identification based on medical reports

Document Type : Research Paper

Authors

Department of Vocational Education-Nineveh, Ministry of Education, Mosul, Iraq.

Abstract
Cancer remains a major health concern, particularly in regions such as Iraq, where healthcare systems are under-resourced and survival rates depend on early, precise diagnosis. Using clinical text from radiology reports collected in Mosul, Iraq, this study examines Natural Language Processing (NLP) and Machine Learning (ML) models for cancer diagnosis and classification. To categorise cancer cases into benign, malignant, stable, progress, and improvement groups, three machine learning classifiers, namely Support Vector Machine (SVM), XGBoost, and LightGBM, were trained on TF-IDF features extracted from a balanced dataset of 12,923 labelled radiological reports. XGBoost outperformed the other models, achieving the highest accuracy (97.25%). The study also discusses the practical implications for diagnostic efficiency and demonstrates the efficacy of NLP-driven machine learning models in resource-limited healthcare settings. The results suggest that these ML-NLP models can increase accuracy, reduce reliance on manual diagnostic procedures, and possibly offer a scalable solution for healthcare systems with limited funding.
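
To make the TF-IDF step named in the abstract concrete, the following is a minimal sketch of how report text can be turned into weighted term vectors. It is an illustration under stated assumptions, not the authors' pipeline: the abstract does not specify the tokeniser or the TF-IDF variant, so the alphabetic tokeniser, the smoothed IDF formula, and the L2 normalisation are assumptions, and the three toy reports are invented for the example rather than taken from the study's dataset.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase the text and keep alphabetic tokens only.
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return cleaned.split()


def tfidf(docs):
    """Return (sorted vocabulary, list of L2-normalised TF-IDF dicts)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of reports that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Assumption: smoothed IDF, ln((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)  # raw term counts within one report
        weights = {t: tf[t] * idf[t] for t in tf}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return sorted(df), vectors


# Hypothetical toy reports, for illustration only.
reports = [
    "Lesion is benign with no suspicious features",
    "Malignant mass with irregular margins",
    "Mass is stable compared with the previous study",
]
vocab, vectors = tfidf(reports)
```

In this sketch, a term that appears in every report (such as "with") receives a lower weight than a term that appears in only one (such as "benign"). Vectors of this kind are what a classifier such as SVM, XGBoost, or LightGBM would then be trained on.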

Articles in Press, Accepted Manuscript
Available Online from 01 June 2026

  • Receive Date 16 January 2026
  • Revise Date 31 May 2026
  • Accept Date 01 June 2026