LLM-assisted Generation and Testing of Model Transformation Language Code

From SDQ-Wiki
Announcement (list of all announcements)
Type: Bachelor's or Master's thesis
Posting: BA LLM4MTL ad (4).pdf
Supervisors: If you are interested or have questions, please contact:

Bowen Jiang (E-Mail: bowen.jiang@kit.edu), Weixing Zhang (E-Mail: weixing.zhang@kit.edu)

Motivation

Model transformation languages (MTLs), especially EMF-based domain-specific languages (DSLs) such as the Reactions Language and ATL for model-to-model transformations, face considerable challenges in benefiting from LLM-based code generation. Unlike general-purpose programming languages, MTLs suffer from extreme data scarcity and possess highly specialized syntax and semantics that are underrepresented in the training data of most language models. As a result, LLM-generated code for these languages tends to be inconsistent and often falls short of practical usability. This thesis investigates practical techniques to make LLM-based MTL code generation more reliable by combining dataset augmentation, grammar-constrained decoding, automatic test generation to evaluate semantic correctness, and automated refinement guided by parsing and test feedback.

Tasks

The thesis will include the following tasks:

  • Expand the existing MTL dataset with data augmentation strategies (e.g., rule templates) and document their impact on data quality.
  • Implement an iterative generation-validation-repair loop that detects syntax errors and triggers targeted regeneration.
  • Integrate grammar constraints (e.g., GBNF grammars) during decoding to reduce syntax errors.
  • Design a workflow to automatically generate executable tests for generated transformations, and use them to measure semantic correctness.
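The generation-validation-repair loop from the second task can be sketched as follows. This is a minimal illustration, not a prescribed design: the `generate` and `parse_errors` callables stand in for a real LLM client and an MTL parser, and the toy stand-ins at the bottom (which only check brace balance) exist purely to make the sketch runnable.

```python
from typing import Callable, Optional

def generate_with_repair(
    generate: Callable[[str], str],       # LLM call: prompt -> candidate MTL code
    parse_errors: Callable[[str], list],  # syntax check: code -> list of error messages
    prompt: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Iteratively generate MTL code, validate it, and feed parser
    diagnostics back into the prompt until it parses or attempts run out."""
    current_prompt = prompt
    for _ in range(max_attempts):
        candidate = generate(current_prompt)
        errors = parse_errors(candidate)
        if not errors:
            return candidate  # syntactically valid transformation
        # Targeted regeneration: append the failing code and its diagnostics
        # so the model can produce a repaired version.
        current_prompt = (
            prompt
            + "\n\nThe previous attempt failed to parse:\n" + candidate
            + "\nErrors:\n" + "\n".join(errors)
            + "\nPlease return a corrected version."
        )
    return None  # give up after max_attempts

# Toy stand-ins for demonstration only: a "model" that fixes a missing
# brace once told about it, and a "parser" that only checks brace balance.
def toy_generate(prompt: str) -> str:
    if "Errors" in prompt:
        return "rule R { from s : A to t : B }"
    return "rule R { from s : A to t : B"

def toy_parse_errors(code: str) -> list:
    return [] if code.count("{") == code.count("}") else ["unbalanced braces"]

result = generate_with_repair(toy_generate, toy_parse_errors, "Generate an ATL rule")
```

In a real setup, `parse_errors` would come from the MTL's own parser (e.g. the Xtext-generated parser for the Reactions Language), and the same loop structure extends naturally to semantic repair by feeding back failing test results instead of parse diagnostics.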

Benefits

  • Engaging with cutting-edge (LLM) and industry-relevant (DSL, MDSD) technologies.
  • Close connection to the Convide research project, with the possibility of a publication.
  • Excellent working environment and close mentorship.