We present an automated pipeline that generates structured software requirements from app-store user feedback. We evaluate zero-shot, few-shot, and chain-of-thought prompting for their effect on requirement completeness, consistency, and conformance to human-written baselines, and categorise the failure modes that arise in LLM-generated requirements.
@inproceedings{sakib2026reviews,title={From Reviews to Requirements: Can LLMs Generate Human-Like User Stories?},author={Sakib, Shadman and others},booktitle={Proceedings of the 9th Workshop on Natural Language Processing for Requirements Engineering (NLP4RE), co-located with REFSQ},year={2026},url={https://arxiv.org/abs/2603.28163},}
We benchmark LLM-based inference against traditional machine-learning classifiers on the Pima Indians Diabetes Dataset, analysing the reliability, interpretability, and failure conditions of generative models in a safety-critical decision-support context.
@inproceedings{sakib2025diabetes,title={From Chat to Checkup: Can Large Language Models Assist in Diabetes Prediction?},author={Sakib, Shadman and others},booktitle={IEEE Xplore},year={2025},publisher={IEEE},url={https://ieeexplore.ieee.org/document/11171691},}