LLM-assisted Generation and Testing of Model Transformation Language Code
| Type | Bachelor's or Master's thesis |
|---|---|
| Announcement | BA LLM4MTL ad (4).pdf |
| Supervisors | If you are interested or have questions, please contact: Bowen Jiang (e-mail: bowen.jiang@kit.edu), Weixing Zhang (e-mail: weixing.zhang@kit.edu) |
Motivation
Model transformation languages (MTLs), especially EMF-based domain-specific languages (DSLs) such as the Reactions Language and ATL for model-to-model transformations, face considerable challenges in benefiting from LLM-based code generation. Unlike general-purpose programming languages, MTLs suffer from extreme data scarcity, and their highly specialized syntax and semantics are underrepresented in the training data of most language models. As a result, LLM-generated code for these languages tends to be inconsistent and often falls short of practical usability. This thesis investigates practical techniques to make LLM-based MTL code generation more reliable by combining dataset augmentation, grammar-constrained decoding, automatic test generation to evaluate semantic correctness, and automated refinement guided by parsing and test feedback.
Tasks
The thesis will cover tasks from the following list:
- Expand the existing MTL dataset with data augmentation strategies (e.g., rule templates) and document their impact on data quality.
- Implement an iterative generation-validation-repair loop that detects syntax errors and triggers targeted regeneration.
- Integrate grammar constraints (e.g., GBNF grammars) during decoding to reduce syntax errors.
- Design a workflow to automatically generate executable tests for generated transformations, and use them to measure semantic correctness.
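For the third task, a GBNF grammar (the format used by llama.cpp for constrained decoding) restricts the token-level choices of the model so that only syntactically valid strings can be emitted. A minimal sketch for a heavily simplified, hypothetical ATL-like rule skeleton might look like this; a real grammar would have to cover the full language:

```
root  ::= rule+
rule  ::= "rule " ident " {\n  from " ident " : " ident "\n  to " ident " : " ident "\n}\n"
ident ::= [a-zA-Z_] [a-zA-Z0-9_]*
```

With such a grammar attached at decoding time, structural errors like a missing closing brace become impossible by construction, so the repair loop only needs to handle semantic defects.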
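As an illustration of the second task, a generation-validation-repair loop can be sketched in a few lines. Everything here is a hypothetical stand-in: `generate` mocks an LLM call (a real setup would query a model API and include the feedback in the prompt), and `check_syntax` uses balanced braces as a toy substitute for a real MTL parser.

```python
def generate(prompt, feedback=None):
    # Stand-in for an LLM call; a real implementation would query a model
    # and append the parser feedback to the prompt for targeted repair.
    if feedback is None:
        return "rule Foo { from s : A to t : B"  # deliberately missing "}"
    return "rule Foo { from s : A to t : B }"

def check_syntax(code):
    # Toy validator: balanced braces stand in for a real MTL parser.
    # Returns an error message on failure, None on success.
    if code.count("{") != code.count("}"):
        return "unbalanced braces"
    return None

def generate_with_repair(prompt, max_attempts=3):
    # Generate, validate, and regenerate with feedback until the output
    # parses or the attempt budget is exhausted.
    feedback = None
    for _ in range(max_attempts):
        code = generate(prompt, feedback)
        feedback = check_syntax(code)
        if feedback is None:
            return code
    raise RuntimeError("no syntactically valid output within budget")

print(generate_with_repair("map metamodel A to metamodel B"))
```

The same loop structure extends naturally to the fourth task: replacing `check_syntax` with an executable-test runner turns the syntactic repair loop into a semantic one.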
Benefits
- Engaging with cutting-edge (LLM) and industry-related (DSL, MDSD) technologies.
- Close connection to the Convide research project, with the possibility of a publication.
- Excellent working environment and close mentorship.