LLMs for Diabetes Prediction

Benchmarking large language models against classical ML classifiers in a safety-critical clinical setting.

Published in IEEE Xplore as "From Chat to Checkup: Can Large Language Models Assist in Diabetes Prediction?", this study examines the reliability and interpretability of generative models for clinical decision support.

What it does

  • Benchmarks LLM-based inference against traditional machine-learning classifiers on the Pima Indians Diabetes Dataset.
  • Analyses the consistency of model outputs and the conditions under which they fail, to assess how far generative models can be trusted in a safety-critical context.
  • Contributes empirical evidence to the broader question of trustworthy AI in decision-support systems.
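
To make the LLM side of the benchmark concrete, here is a minimal sketch of how a patient record might be turned into a prompt, how a free-text answer might be mapped to a label, and how agreement across repeated runs could be scored. It is an illustration under assumptions, not the study's actual method: the function names (build_prompt, parse_label, consistency), the prompt wording, and the yes/no parsing rule are all hypothetical. The feature names follow the commonly distributed CSV version of the Pima Indians Diabetes Dataset. No LLM is called here; the responses are hard-coded placeholders.

```python
import re
from collections import Counter


def build_prompt(record):
    """Format one patient record as a yes/no question (hypothetical wording)."""
    features = "\n".join(f"{name}: {value}" for name, value in record.items())
    return (
        "Given the following patient measurements, answer 'yes' or 'no': "
        "is this patient likely to have diabetes?\n" + features
    )


def parse_label(response):
    """Map a free-text answer to 1 (yes), 0 (no), or None if neither word appears."""
    match = re.search(r"\b(yes|no)\b", response.lower())
    if match is None:
        return None
    return 1 if match.group(1) == "yes" else 0


def consistency(labels):
    """Return the majority label and the fraction of runs that agree with it.

    Unparseable answers (None) count as their own category, so they lower
    the agreement rate rather than being silently dropped.
    """
    if not labels:
        raise ValueError("need at least one label")
    majority, count = Counter(labels).most_common(1)[0]
    return majority, count / len(labels)


# Illustrative record; the values are placeholders, not study data.
record = {
    "Pregnancies": 6,
    "Glucose": 148,
    "BloodPressure": 72,
    "SkinThickness": 35,
    "Insulin": 0,
    "BMI": 33.6,
    "DiabetesPedigreeFunction": 0.627,
    "Age": 50,
}
prompt = build_prompt(record)

# Placeholder responses standing in for five repeated runs of the same prompt.
responses = [
    "Yes, likely diabetic.",
    "Yes.",
    "No, the risk looks low.",
    "I would say yes.",
    "Cannot determine from the data.",
]
labels = [parse_label(r) for r in responses]
majority, agreement = consistency(labels)
print(majority, agreement)
```

Counting unparseable answers against the agreement rate is a deliberate choice in this sketch: in a safety-critical setting, a model that sometimes refuses or rambles is less dependable than one that always commits to an answer, and the score should reflect that.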

Tech stack: Python · Large Language Models · scikit-learn · model benchmarking
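
The classical side of such a benchmark could be wired up with scikit-learn roughly as follows. This is a sketch under stated assumptions rather than the study's pipeline: the choice of logistic regression and random forest, the 5-fold cross-validation, and the benchmark helper are illustrative, and the data is a synthetic stand-in generated with make_classification (768 rows, 8 features, to mirror the shape of the Pima dataset) so that the example runs without any download. The printed scores therefore say nothing about real diabetes prediction.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the same shape as the Pima dataset (assumption:
# swap in the real feature matrix and Outcome column to run a real benchmark).
X, y = make_classification(
    n_samples=768,
    n_features=8,
    n_informative=5,
    n_redundant=1,
    weights=[0.65, 0.35],
    random_state=0,
)

# Illustrative model choices; the study may have used different classifiers.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}


def benchmark(models, X, y, folds=5):
    """Cross-validate each model and return mean accuracy and F1 per model."""
    results = {}
    for name, model in models.items():
        scores = cross_validate(model, X, y, cv=folds, scoring=("accuracy", "f1"))
        results[name] = {
            "accuracy": scores["test_accuracy"].mean(),
            "f1": scores["test_f1"].mean(),
        }
    return results


results = benchmark(models, X, y)
for name, metrics in results.items():
    print(f"{name}: accuracy={metrics['accuracy']:.3f}, f1={metrics['f1']:.3f}")
```

Reporting F1 alongside accuracy matters here because the classes are imbalanced, so a model that always predicts the majority class can look deceptively accurate.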