Benchmarking Coding Agents on Educational Programming Tasks

From SDQ-Institutsseminar
Speaker Saigent Zeqja
Talk type Bachelor's thesis
Advisor Haoyu Liu
Date Fri, 17 April 2026, 11:30 (Room 010 (Building 50.34))
Talk language English
Talk mode In person
Abstract [[Kurzfassung::This thesis evaluates agentic large language models on programming exercises in an educational context. It asks how well such models can solve final programming exercises from scratch when the tests are hidden and the feedback is controlled externally. Existing benchmarks focus either on code generation within a single file [50, 30, 8] or on repository-level tasks with existing codebases [25, 44], and therefore do not capture the structure and requirements of educational programming exercises. Moreover, many approaches rely on synthetic [60] or internally generated feedback [33], which does not reflect the behavior of real evaluation systems. To address this, a coding agent is implemented that communicates with an external system, namely Artemis, and uses a feedback loop to iteratively improve its solutions.]]
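
The feedback loop described in the abstract can be illustrated with a minimal sketch. It is not the implementation from the thesis and does not use the Artemis API: the names `Feedback`, `solve_with_feedback`, `toy_generate`, and `toy_submit` are hypothetical, the generator stands in for an LLM call, and the grader stands in for an external system that runs hidden tests and returns only its messages.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Feedback:
    """Result returned by the external grader (hypothetical shape)."""
    passed: bool
    messages: List[str]


def solve_with_feedback(
    task: str,
    generate: Callable[[str, List[str]], str],
    submit: Callable[[str], Feedback],
    max_rounds: int = 5,
) -> Optional[str]:
    """Submit candidates until the grader accepts one or the budget runs out."""
    history: List[str] = []  # feedback messages collected so far
    for _ in range(max_rounds):
        candidate = generate(task, history)
        feedback = submit(candidate)
        if feedback.passed:
            return candidate
        # The agent never sees the tests, only the grader's messages.
        history.extend(feedback.messages)
    return None


def toy_generate(task: str, history: List[str]) -> str:
    """Stand-in for an LLM call (assumption): reacts to one known message."""
    if any("negative" in message for message in history):
        return "def absolute(x): return x if x >= 0 else -x"
    return "def absolute(x): return x"


def toy_submit(source: str) -> Feedback:
    """Stand-in for the external grader (assumption): one hidden test."""
    namespace: dict = {}
    exec(source, namespace)  # acceptable only for this toy example
    if namespace["absolute"](-3) == 3:
        return Feedback(True, [])
    return Feedback(False, ["wrong result for a negative input"])


result = solve_with_feedback("Implement absolute(x)", toy_generate, toy_submit)
```

In this toy setup the first candidate fails the hidden test, the grader's message is appended to the history, and the second candidate is accepted. With `max_rounds=1` the function returns `None`, which mirrors a submission budget being exhausted.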