LLM-based Code Generation for Model Transformation Languages

From SDQ-Wiki
Announcement (list of all announcements)
Type: Bachelor's thesis or Master's thesis
Notice: LLM4DSML.pdf
Supervisors: If you are interested or have questions, please contact:

Bowen Jiang (email: bowen.jiang@kit.edu), Nathan Hagel (email: nathan.hagel@kit.edu)

Motivation

Model transformation languages, especially EMF-based domain-specific languages (DSLs) such as the Reactions Language for model-to-model transformations and Acceleo for model-to-text transformations, face considerable challenges in benefiting from LLM-based code generation. Unlike general-purpose programming languages, these DSLs suffer from extreme data scarcity and possess highly specialized syntax and semantics that are underrepresented in the training data of most language models. As a result, LLM-generated code for these languages tends to be inconsistent and often falls short of practical usability. Given the widespread use of such DSLs in both industry and academia, it is crucial to systematically evaluate and improve LLM performance in the context of model transformation tasks.

Tasks

  • Evaluate the performance of LLM-based code generation for model transformation languages.
  • Investigate the reasons for differing performance outcomes (e.g., low-resource DSLs, limitations of LLM architectures in handling DSLs). [Master only]
  • Evaluate the pipeline's performance.
  • Propose strategies and methods for improving performance, based on model adaptation techniques (e.g., fine-tuning), advanced prompting strategies, iterative refinement, token processing (e.g., JSON format), or Retrieval-Augmented Generation, among others. To improve prompting for model transformations, design an integrated prompt strategy that includes the necessary content, such as the grammar, even when the language is low-resource.
  • Evaluate your method.
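
To give a rough idea of what an integrated prompt strategy might look like, the following Python sketch combines a grammar excerpt, a few retrieved examples, and the task description into one prompt. It is only an illustration under stated assumptions: the names `Example`, `retrieve`, and `build_prompt`, the word-overlap ranking, and the prompt layout are all hypothetical and not part of the announced project, and the DSL code is represented by placeholders rather than real Reactions or Acceleo syntax.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """Hypothetical (task description, DSL code) pair used as a few-shot example."""
    task: str
    code: str


def tokenize(text: str) -> set[str]:
    """Lowercase word set; a deliberately naive stand-in for a real retriever."""
    return set(text.lower().split())


def retrieve(task: str, corpus: list[Example], k: int = 2) -> list[Example]:
    """Rank examples by word overlap with the task and return the top k."""
    query = tokenize(task)
    ranked = sorted(
        corpus,
        key=lambda ex: len(query & tokenize(ex.task)),
        reverse=True,  # sorted() is stable, so ties keep corpus order
    )
    return ranked[:k]


def build_prompt(task: str, grammar_excerpt: str,
                 corpus: list[Example], k: int = 2) -> str:
    """Assemble grammar, retrieved examples, and the task into one prompt."""
    parts = [
        "You generate code in a model transformation DSL.",
        "",
        "Grammar excerpt:",
        grammar_excerpt,
        "",
    ]
    for i, ex in enumerate(retrieve(task, corpus, k), start=1):
        parts += [f"Example {i} task: {ex.task}",
                  f"Example {i} code:", ex.code, ""]
    parts += [f"Task: {task}", "Code:"]
    return "\n".join(parts)


# Placeholder corpus; real entries would hold actual DSL snippets.
corpus = [
    Example("create a component when a class is created", "<code A>"),
    Example("delete a file", "<code B>"),
    Example("rename a component when a class is renamed", "<code C>"),
]
prompt = build_prompt("rename component when class renamed", "<grammar>", corpus)
```

The resulting `prompt` string would then be sent to a language model of choice. In an iterative refinement setup, parser or validator feedback on the generated code could be appended to the prompt in a further round.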

Benefits

  • Engaging with cutting-edge (LLM) and industry-related (DSL, MDSD) technologies.
  • Close connection to the Convide research project.
  • Excellent working environment and close mentorship.
  • Results are intended for publication.