Multi-Stage/Multi-Approach TLR

Announcement (list of all announcements)
	Type	Bachelor's thesis or Master's thesis
	Notice
	Supervisor	If you are interested or have questions, please contact: Jan Keim (email: jan.keim@kit.edu, phone: +49-721-608-45994)

Traceability links (TLs) between software artifacts (e.g., requirements, design, code, test cases) are essential for many software engineering tasks such as impact analysis, change management, and compliance verification. Traditional Information Retrieval (IR) techniques have been widely used for trace link recovery due to their ability to quickly generate candidate links based on textual similarity. However, their precision and recall can be limited.

On the other hand, Machine Learning (ML) approaches, particularly supervised models, have demonstrated the potential to improve traceability when trained on confirmed links. Yet, ML models often require high-quality training data and can suffer from generalization issues.

This thesis explores novel hybrid strategies that combine the strengths of IR and ML to improve the precision-recall trade-off in trace link recovery.

The goal of this thesis is to investigate and evaluate strategies for combining IR and ML techniques for trace link recovery. Two main strategies are to be explored:

1. Precision-First Hybridization:

Use IR to generate an initial set of trace links with high precision.
Train a machine learning model on the confirmed links from IR output.
Use the trained model to expand the trace link set, aiming to improve recall while maintaining acceptable precision.
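
The three steps above could be sketched roughly as follows. This is only an illustration under several assumptions that are not part of the announcement: the token-overlap function jaccard stands in for a real IR technique, train is a hypothetical callable that returns a scoring function, all thresholds are arbitrary, and low-scoring IR pairs are used as negative training examples (the announcement only mentions training on confirmed links).

```python
from itertools import product
from typing import Callable, Dict, List, Set, Tuple

Link = Tuple[str, str]                    # (source artifact id, target artifact id)
Scorer = Callable[[str, str], float]      # score in [0, 1] for two artifact texts
TextPairs = List[Tuple[str, str]]
Trainer = Callable[[TextPairs, TextPairs], Scorer]  # (positives, negatives) -> model


def jaccard(a: str, b: str) -> float:
    """Toy IR similarity (assumption): token overlap between two texts."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def precision_first(
    sources: Dict[str, str],
    targets: Dict[str, str],
    train: Trainer,
    ir_score: Scorer = jaccard,
    seed_threshold: float = 0.55,      # strict cut-off: favors precision
    negative_threshold: float = 0.1,   # assumed source of negative examples
    expand_threshold: float = 0.5,
) -> Tuple[Set[Link], Set[Link]]:
    pairs = list(product(sources, targets))
    scores = {(s, t): ir_score(sources[s], targets[t]) for s, t in pairs}

    # Step 1: keep only links the IR technique is very sure about.
    seed = {p for p, v in scores.items() if v >= seed_threshold}
    negatives = [p for p, v in scores.items() if v <= negative_threshold]

    # Step 2: train a model on the IR output.
    model = train(
        [(sources[s], targets[t]) for s, t in seed],
        [(sources[s], targets[t]) for s, t in negatives],
    )

    # Step 3: let the model propose additional links to raise recall.
    expanded = {
        (s, t)
        for s, t in pairs
        if (s, t) not in seed and model(sources[s], targets[t]) >= expand_threshold
    }
    return seed, seed | expanded
```

The function returns both the seed set and the expanded set, so that the precision lost and the recall gained by the expansion step can be measured separately against a gold standard.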

2. Intersection & XOR Strategy:

Apply both IR and ML independently to generate trace links.
Combine the intersection of both approaches to construct a high-precision core.
Use the XOR – links proposed by only one approach – as candidates to be classified by a trained model or heuristic to boost recall.
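
In terms of set operations, this strategy could look like the sketch below. The names combine_links and accept are hypothetical; accept stands in for whatever trained model or heuristic is used to judge the disputed links.

```python
from typing import Callable, Set, Tuple

Link = Tuple[str, str]  # (source artifact id, target artifact id)


def combine_links(
    ir_links: Set[Link],
    ml_links: Set[Link],
    accept: Callable[[Link], bool],
) -> Tuple[Set[Link], Set[Link], Set[Link]]:
    # Links proposed by both approaches form the high-precision core.
    core = ir_links & ml_links
    # Links proposed by exactly one approach (symmetric difference, i.e. XOR).
    disputed = ir_links ^ ml_links
    # A second-stage classifier or heuristic decides which disputed links to keep.
    accepted = {link for link in disputed if accept(link)}
    return core, disputed, core | accepted


ir = {("R1", "C1"), ("R1", "C2"), ("R2", "C3")}
ml = {("R1", "C1"), ("R2", "C3"), ("R2", "C4")}
core, disputed, final = combine_links(ir, ml, accept=lambda link: link[1] == "C4")
```

Since the core and the disputed set are disjoint, the second stage only has to decide on links where the two approaches disagree, while the agreed links are kept without further checks.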