Fine-tuning vs. Prompting: The Case of Requirements Classification
| Type | Bachelor's thesis |
|---|---|
| Announcement | ReqClass Fine-tuning vs Prompting.pdf |
| Supervisors | If you are interested or have questions, please contact: Tobias Hey (e-mail: hey@kit.edu, phone: +49-721-608-44765), Andreas Vogelsang |
Motivation
Throughout the development of a software system, requirements serve as the foundation for design, implementation, and validation. Classifying these requirements (into categories such as functional or non-functional, or into more specific subtypes) is a crucial step for prioritization, traceability, and quality assurance. Automating this classification has long been a goal in requirements engineering. The emergence of Large Language Models (LLMs) has created new opportunities for improving requirements classification. Both fine-tuning and prompting strategies have been explored for this task. Fine-tuning allows the model to learn domain-specific patterns but depends on the availability of labeled training data. Prompt-based methods, on the other hand, require little to no task-specific data and are easier to deploy, yet their classification performance often falls short of fine-tuned models.
Task Description
The goal of this work is to investigate the trade-off between fine-tuning a Large Language Model (LLM) and prompting it for the task of requirements classification. How much training data does fine-tuning need to match or exceed the results of prompting techniques? The focus of this work is an empirical evaluation using existing methods and datasets. By comparing different amounts of training data and different prompting techniques on established benchmark datasets, we aim to provide insights, and potentially guidelines, on when to use which method.
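A minimal sketch of the kind of learning-curve experiment this comparison implies is shown below: a classifier is trained on increasing amounts of labeled data and its macro-F1 is recorded at each size, so the resulting curve can be set against the score of a prompting baseline. Note that a TF-IDF plus logistic-regression pipeline is used only as a lightweight stand-in for actual LLM fine-tuning, and that `load_requirements` and the prompting baseline score are hypothetical placeholders, not part of the task description.

```python
# Sketch of a learning-curve comparison: how does classification quality grow
# with the amount of labeled training data, relative to a fixed prompting score?
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def load_requirements():
    """Hypothetical loader returning (texts, labels) from a benchmark dataset."""
    raise NotImplementedError("Plug in a requirements classification benchmark here.")


def learning_curve(texts, labels, train_sizes=(50, 100, 200, 400, 800), seed=42):
    # Hold out one fixed test split so every training size is scored on the same data.
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.3, stratify=labels, random_state=seed
    )
    results = {}
    for n in train_sizes:
        n = min(n, len(X_train))
        # Lightweight stand-in for fine-tuning an LLM on n labeled examples.
        clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
        clf.fit(X_train[:n], y_train[:n])
        results[n] = f1_score(y_test, clf.predict(X_test), average="macro")
    return results


if __name__ == "__main__":
    texts, labels = load_requirements()
    prompt_baseline_f1 = 0.75  # hypothetical score of a zero-shot prompting run
    for n, f1 in learning_curve(texts, labels).items():
        marker = ">= prompting" if f1 >= prompt_baseline_f1 else "< prompting"
        print(f"{n:>5} training examples: macro-F1 = {f1:.3f} ({marker})")
```

Keeping the test split fixed across all training sizes is what makes the points on the curve directly comparable to one another and to the prompting baseline.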