Improving LLM-based Code Generation for Model Transformation Languages
| Type | Bachelor's thesis |
|---|---|
| Posting | BA LLM4MTL ad.pdf |
| Supervisors | If you are interested or have questions, please contact: Bowen Jiang (e-mail: bowen.jiang@kit.edu), Nathan Hagel (e-mail: nathan.hagel@kit.edu) |
Motivation
Model transformation languages, especially EMF-based domain-specific languages (DSLs) such as the Reactions Language and ATL for model-to-model transformations, face considerable challenges in benefiting from LLM-based code generation. Unlike general-purpose programming languages, these DSLs suffer from extreme data scarcity and possess highly specialized syntax and semantics that are underrepresented in the training data of most language models. As a result, LLM-generated code for these languages tends to be inconsistent and often falls short of practical usability. To address these challenges, this thesis investigates data augmentation, grammar-constrained decoding, and iterative refinement loops to systematically improve the reliability of LLM-based code generation for model transformations.
Tasks
The thesis comprises the following tasks:
- Extend the existing dataset via data augmentation to mitigate the low-resource limitation (a possible augmentation strategy is sketched after this list).
- Design an automated refinement loop that detects syntactic and semantic inconsistencies, regenerates the code, and measures the resulting improvement (see the loop sketch below).
- Integrate grammar constraints (e.g., GBNF grammars) into the decoding step to reduce syntax errors (see the constrained-decoding sketch below).
- Evaluate the proposed approach.
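To illustrate the first task, here is a minimal sketch of one possible augmentation strategy: producing syntactically equivalent variants of an existing transformation snippet by consistently renaming identifiers. The example snippet, the identifier pools, and the `rename_identifiers` and `augment` helpers are purely hypothetical; choosing and justifying an actual augmentation strategy is part of the thesis.

```python
import re
from itertools import product

def rename_identifiers(snippet: str, mapping: dict[str, str]) -> str:
    """Consistently replace whole-word identifiers in a code snippet."""
    for old, new in mapping.items():
        snippet = re.sub(rf"\b{re.escape(old)}\b", new, snippet)
    return snippet

def augment(snippet: str, identifier_pools: dict[str, list[str]]) -> list[str]:
    """Generate variants by substituting each identifier with alternative names."""
    names, pools = zip(*identifier_pools.items())
    return [
        rename_identifiers(snippet, dict(zip(names, combo)))
        for combo in product(*pools)
    ]

# Hypothetical, Reactions-Language-like snippet used only for illustration.
example = """reaction PersonInserted {
    after element person inserted in Persons
    call createEmployee(person)
}"""

augmented = augment(example, {
    "person": ["person", "member", "entry"],
    "createEmployee": ["createEmployee", "createStaffRecord"],
})
print(len(augmented), "variants generated")
```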
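For the refinement loop, one possible structure is sketched below. The `generate_code` and `check_code` callables are placeholders, not a fixed API: the first would wrap an LLM call, the second a parser or validator for the target DSL that returns a list of error messages. The loop feeds checker feedback back into the prompt and records how many attempts were needed, which already yields one simple performance measurement.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class RefinementResult:
    code: str
    errors: list[str]
    attempts: int
    history: list[str] = field(default_factory=list)

def refine(
    task: str,
    generate_code: Callable[[str], str],     # prompt -> generated DSL code (placeholder)
    check_code: Callable[[str], list[str]],  # code -> list of error messages (placeholder)
    max_attempts: int = 3,
) -> RefinementResult:
    """Iteratively regenerate code until the checker reports no errors."""
    prompt = task
    history: list[str] = []
    code, errors = "", ["not generated yet"]
    for attempt in range(1, max_attempts + 1):
        code = generate_code(prompt)
        history.append(code)
        errors = check_code(code)
        if not errors:
            return RefinementResult(code, [], attempt, history)
        # Fold the checker feedback into the next prompt.
        prompt = (
            f"{task}\n\nYour previous attempt:\n{code}\n\n"
            "It has the following problems:\n- " + "\n- ".join(errors) +
            "\n\nPlease produce a corrected version."
        )
    return RefinementResult(code, errors, max_attempts, history)
```

How the semantic checks are realized (e.g., via the EMF/Xtext tooling of the target language) and which metrics are tracked across attempts is deliberately left open here.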
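For the grammar-constraint task, the sketch below assumes the llama-cpp-python bindings, which accept a GBNF grammar at generation time so that decoding can only produce strings the grammar admits. The grammar shown is a toy fragment invented for illustration, not the real Reactions Language grammar, and the model path is a placeholder.

```python
from llama_cpp import Llama, LlamaGrammar

# Toy GBNF fragment for illustration only; a real grammar for the target DSL
# would have to be derived from its Xtext/EMF definition.
GRAMMAR = r"""
root     ::= reaction
reaction ::= "reaction " ident " {\n  after element " ident " inserted\n}"
ident    ::= [a-zA-Z_] [a-zA-Z0-9_]*
"""

llm = Llama(model_path="model.gguf")          # placeholder path to a local model
grammar = LlamaGrammar.from_string(GRAMMAR)

out = llm(
    "Write a reaction that fires when a Person element is inserted.",
    grammar=grammar,   # restricts decoding to grammar-conforming output
    max_tokens=128,
)
print(out["choices"][0]["text"])
```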
Benefits
- Hands-on experience with cutting-edge (LLMs) and industry-relevant (DSLs, MDSD) technologies.
- Close connection to the Convide research project; the results are intended for publication.
- Excellent working environment and close mentorship.