Improving LLM-based Code Generation for Model Transformation Languages
| Type | Bachelor's thesis |
|---|---|
| Posting | BA LLM4MTL ad.pdf |
| Supervisors | If you are interested or have questions, please contact: Bowen Jiang (e-mail: bowen.jiang@kit.edu), Nathan Hagel (e-mail: nathan.hagel@kit.edu) |
Motivation
Model transformation languages, especially EMF-based domain-specific languages (DSLs) such as the Reactions Language and ATL for model-to-model transformations, face considerable challenges in benefiting from LLM-based code generation. Unlike general-purpose programming languages, these DSLs suffer from extreme data scarcity and possess highly specialized syntax and semantics that are underrepresented in the training data of most language models. As a result, LLM-generated code for these languages tends to be inconsistent and often falls short of practical usability. To address these challenges, this thesis investigates data augmentation, grammar-constrained decoding, and iterative refinement loops to systematically improve the reliability of LLM-based code generation for model transformations.
Tasks
The thesis comprises the following tasks:
- Extend the existing dataset via data augmentation to mitigate the low-resource limitation (a possible augmentation strategy is sketched after this list).
- Design an automated refinement loop that detects syntactic and semantic inconsistencies, regenerates the code, and measures the resulting improvement (see the loop sketch below).
- Integrate grammar constraints (e.g., GBNF grammars) into the decoding step to reduce syntax errors (see the constrained-decoding sketch below).
- Evaluate the proposed approach.
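To illustrate the first task, here is a minimal sketch of one possible augmentation strategy: producing syntactically equivalent variants of an existing transformation snippet by consistently renaming identifiers. The example snippet, the identifier pools, and the `rename_identifiers` and `augment` helpers are purely hypothetical; choosing and justifying an actual augmentation strategy is part of the thesis.

```python
import re
from itertools import product

def rename_identifiers(snippet: str, mapping: dict[str, str]) -> str:
    """Consistently replace whole-word identifiers in a code snippet."""
    for old, new in mapping.items():
        snippet = re.sub(rf"\b{re.escape(old)}\b", new, snippet)
    return snippet

def augment(snippet: str, identifier_pools: dict[str, list[str]]) -> list[str]:
    """Generate variants by substituting each identifier with alternative names."""
    names, pools = zip(*identifier_pools.items())
    return [
        rename_identifiers(snippet, dict(zip(names, combo)))
        for combo in product(*pools)
    ]

# Hypothetical, Reactions-Language-like snippet used only for illustration.
example = """reaction PersonInserted {
    after element person inserted in Persons
    call createEmployee(person)
}"""

augmented = augment(example, {
    "person": ["person", "member", "entry"],
    "createEmployee": ["createEmployee", "createStaffRecord"],
})
print(len(augmented), "variants generated")
```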
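For the refinement loop, one possible structure is sketched below. The `generate_code` and `check_code` callables are placeholders, not a fixed API: the first would wrap an LLM call, the second a parser or validator for the target DSL that returns a list of error messages. The loop feeds checker feedback back into the prompt and records how many attempts were needed, which already yields one simple performance measurement.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class RefinementResult:
    code: str
    errors: list[str]
    attempts: int
    history: list[str] = field(default_factory=list)

def refine(
    task: str,
    generate_code: Callable[[str], str],     # prompt -> generated DSL code (placeholder)
    check_code: Callable[[str], list[str]],  # code -> list of error messages (placeholder)
    max_attempts: int = 3,
) -> RefinementResult:
    """Iteratively regenerate code until the checker reports no errors."""
    prompt = task
    history: list[str] = []
    code, errors = "", ["not generated yet"]
    for attempt in range(1, max_attempts + 1):
        code = generate_code(prompt)
        history.append(code)
        errors = check_code(code)
        if not errors:
            return RefinementResult(code, [], attempt, history)
        # Fold the checker feedback into the next prompt.
        prompt = (
            f"{task}\n\nYour previous attempt:\n{code}\n\n"
            "It has the following problems:\n- " + "\n- ".join(errors) +
            "\n\nPlease produce a corrected version."
        )
    return RefinementResult(code, errors, max_attempts, history)
```

How the semantic checks are realized (e.g., via the EMF/Xtext tooling of the target language) and which metrics are tracked across attempts is deliberately left open here.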
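For the grammar-constraint task, the sketch below assumes the llama-cpp-python bindings, which accept a GBNF grammar at generation time so that decoding can only produce strings the grammar admits. The grammar shown is a toy fragment invented for illustration, not the real Reactions Language grammar, and the model path is a placeholder.

```python
from llama_cpp import Llama, LlamaGrammar

# Toy GBNF fragment for illustration only; a real grammar for the target DSL
# would have to be derived from its Xtext/EMF definition.
GRAMMAR = r"""
root     ::= reaction
reaction ::= "reaction " ident " {\n  after element " ident " inserted\n}"
ident    ::= [a-zA-Z_] [a-zA-Z0-9_]*
"""

llm = Llama(model_path="model.gguf")          # placeholder path to a local model
grammar = LlamaGrammar.from_string(GRAMMAR)

out = llm(
    "Write a reaction that fires when a Person element is inserted.",
    grammar=grammar,   # restricts decoding to grammar-conforming output
    max_tokens=128,
)
print(out["choices"][0]["text"])
```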
Benefits
- Hands-on experience with cutting-edge (LLMs) and industry-relevant (DSLs, MDSD) technologies.
- Close connection to the Convide research project; the results are intended for publication.
- Excellent working environment and close mentorship.