Fine-tuning vs. Prompting: The Case of Requirements Classification

Thesis announcement
Type: Bachelor's thesis
Notice (Aushang): ReqClass Fine-tuning vs Prompting.pdf
Supervisors: If you are interested or have questions, please contact:

Tobias Hey (e-mail: hey@kit.edu, phone: +49-721-608-44765), Andreas Vogelsang

Motivation

Throughout the development of a software system, requirements serve as the foundation for design, implementation, and validation. Classifying these requirements, for instance into functional and non-functional requirements or into more specific subtypes, is a crucial step for prioritization, traceability, and quality assurance. Automating this classification has long been a goal in requirements engineering. The emergence of Large Language Models (LLMs) has created new opportunities for improving requirements classification. Both fine-tuning and prompting strategies have been explored for this task. Fine-tuning allows the model to learn domain-specific patterns but depends on the availability of labeled training data. Prompt-based methods, on the other hand, require little to no task-specific data and are easier to deploy, yet their classification performance often falls short of that of fine-tuned models.
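
To make the prompting side concrete, the following minimal zero-shot classification sketch illustrates the idea in Python. The OpenAI client, the model name, the prompt wording, and the binary functional/non-functional label set are illustrative assumptions, not part of the thesis setup.

# Minimal zero-shot prompting sketch for requirements classification.
# Assumes an OpenAI-compatible endpoint; model name and prompt are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Classify the following software requirement as 'functional' or "
    "'non-functional'. Answer with a single word.\n\nRequirement: {req}"
)

def classify_requirement(requirement: str) -> str:
    """Return the predicted class label for a single requirement."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": PROMPT.format(req=requirement)}],
        temperature=0,
    )
    return response.choices[0].message.content.strip().lower()

print(classify_requirement("The system shall respond to user queries within 2 seconds."))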

Task Description

The goal of this work is to investigate the trade-off between fine-tuning a Large Language Model (LLM) and prompting it for the task of requirements classification. How much training data is required for fine-tuning to match or surpass the results of prompting techniques? The focus of this work is an empirical evaluation using existing methods and datasets. By comparing different amounts of training data and different prompting techniques on established benchmark datasets, we aim to provide insights, and potentially guidelines, on when to use which method.
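
One plausible shape for such an evaluation is a learning-curve experiment: fine-tune a classifier on growing subsets of the labeled data and compare each point against a prompting baseline. The sketch below uses Hugging Face Transformers and scikit-learn as one possible setup; the encoder model, dataset columns, subset sizes, and hyperparameters are illustrative assumptions only, not the prescribed method of the thesis.

# Sketch of a training-size sweep: fine-tune an encoder on growing subsets of
# labeled requirements and evaluate each model on a fixed test split.
# Dataset format, model name, and hyperparameters are illustrative assumptions.
import numpy as np
from datasets import Dataset
from sklearn.metrics import f1_score
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "bert-base-uncased"  # placeholder encoder
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

def evaluate_with_n_examples(train_ds: Dataset, test_ds: Dataset, n: int) -> float:
    """Fine-tune on n labeled requirements and return macro-F1 on the test split."""
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
    subset = train_ds.shuffle(seed=42).select(range(n)).map(tokenize, batched=True)
    test = test_ds.map(tokenize, batched=True)
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir=f"out_{n}", num_train_epochs=3,
                               per_device_train_batch_size=16, report_to=[]),
        train_dataset=subset,
    )
    trainer.train()
    preds = np.argmax(trainer.predict(test).predictions, axis=-1)
    return f1_score(test["label"], preds, average="macro")

# Learning curve: compare each point against the (data-free) prompting baseline.
# train_ds / test_ds are assumed to provide "text" and "label" columns.
# for n in (50, 100, 200, 500, 1000):
#     print(n, evaluate_with_n_examples(train_ds, test_ds, n))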