Abstract

Requirements classification is a key task in software development in which requirements are grouped into categories, such as functional and non-functional requirements. It helps to prioritize requirements and to improve software quality. To automate this process, many machine learning approaches have been studied in recent years. With the rise of large language models (LLMs), both fine-tuning and prompting have been evaluated, showing promising results. However, current research does not provide guidelines on when to use which technique. This thesis addresses this gap by systematically comparing state-of-the-art fine-tuning and prompting approaches across different application scenarios. For this purpose, three real-world scenarios are examined, covering varying levels of available sample data. Based on these scenarios, the goal is to investigate how much training data is required for fine-tuning to achieve results similar to or better than those of prompting techniques. When evaluating only on unseen projects, zero-shot prompting often achieves similar or even better performance than fine-tuning. If requirements from the same project are available for training, 20 to 40 labeled requirements are enough for fine-tuning to outperform prompting. Depending on the task and the fine-tuning approach, manually labeling 20 requirements improved performance by up to 25 percentage points in the best case, and often by at least 10 percentage points. Across the evaluated approaches, fine-tuning always achieved better performance than prompting once enough training data was available. This thesis provides guidelines for requirements classification in practice and serves as a foundation for further research on this topic.