Automated Requirements Conformance Evaluation

A taxonomy and automated tooling for detecting structural conformance failures in LLM-generated requirements.

This project extends my thesis work by treating requirement generation as a conformance problem: how far do LLM-generated user stories deviate from standard requirement formats, and can those deviations be detected automatically?

What it does

  • Systematically classifies structural conformance failures in LLM-generated user stories, building a taxonomy of the ways they deviate from standard requirement formats.
  • Provides Python evaluation scripts that automate structural and semantic comparison between generated artefacts and reference baselines.
  • Connects generative-model consistency failures to the broader problem of detecting semantic divergence in evolving software artefacts.
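
As a rough illustration of the kind of check described above, here is a minimal sketch, not the project's actual scripts. It assumes the "standard format" is the common "As a <role>, I want <goal>, so that <benefit>" user-story template, and the deviation labels and function names (classify_deviations, token_overlap) are hypothetical. The overlap score is only a crude lexical stand-in for semantic comparison against a reference baseline.

```python
import re

# Assumed template: "As a <role>, I want <goal>, so that <benefit>".
# The patterns and deviation labels below are illustrative, not the project's taxonomy.
ROLE = re.compile(r"\bas an? \S", re.IGNORECASE)
GOAL = re.compile(r"\bI (?:want|need|would like)\b", re.IGNORECASE)
BENEFIT = re.compile(r"\bso that\b", re.IGNORECASE)


def classify_deviations(story: str) -> list[str]:
    """Return labels for template clauses that are missing or out of order."""
    found = {
        "role": ROLE.search(story),
        "goal": GOAL.search(story),
        "benefit": BENEFIT.search(story),
    }
    deviations = [f"missing_{name}" for name, m in found.items() if m is None]
    # Clauses that are present should appear in role, goal, benefit order.
    positions = [m.start() for m in found.values() if m is not None]
    if positions != sorted(positions):
        deviations.append("wrong_clause_order")
    return deviations


def token_overlap(generated: str, reference: str) -> float:
    """Jaccard overlap of lowercase word sets (a lexical proxy, not true semantics)."""
    a = set(re.findall(r"[a-z0-9]+", generated.lower()))
    b = set(re.findall(r"[a-z0-9]+", reference.lower()))
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

With this sketch, a well-formed story such as "As a user, I want to reset my password so that I can regain access." yields an empty deviation list, while "The system shall allow password resets." is flagged as missing all three clauses. A real evaluation would likely replace the token overlap with an embedding-based or otherwise semantic measure.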

Tech stack: Python · NLP · automated artefact validation