Software Plagiarism Detection on Intermediate Representation

From SDQ-Institutsseminar
Speaker: Niklas Heneka
Talk type: Bachelor's thesis
Advisor: Timur Sağlam
Session: Institutsseminar/2023-11-17-2, Fri 17 November 2023, 11:11, Room 237 (Building 50.34)
Talk language:
Talk mode: in person
Abstract: Source code plagiarism is a widespread problem in computer science education. Software plagiarism detectors can help counteract it by identifying plagiarized code. Most state-of-the-art plagiarism detectors are token-based, and supporting a new programming language typically means designing and implementing a new dedicated language module. This process can be time-consuming; furthermore, it is unclear whether it is even necessary. In this thesis, we evaluate whether dedicated language modules are necessary for Java and C/C++ and derive conclusions for designing new ones. To this end, we create a language module for LLVM's intermediate representation (LLVM IR). For the evaluation, we compare it to the two existing dedicated JPlag language modules for Java and C/C++. Our results show that dedicated language modules perform better at plagiarism detection, whereas language modules for intermediate representations are more resilient to obfuscation attacks.
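A token-based language module for LLVM IR turns a program into a sequence of abstract tokens, and the detector then compares those sequences. The Java sketch below illustrates that idea under stated assumptions. The token set (`TokenType`), the opcode-to-token mapping, the method names `tokenize`, `coveredTokens` and `similarity`, and the minimum match length are all hypothetical. They are not taken from the thesis or from JPlag's API. The comparison is a naive, unoptimized form of Greedy String Tiling, the matching approach JPlag is built on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: token set, opcode mapping, and method names are illustrative only.
public class IrTokenSketch {

    /** Abstract token types; operand, register, and function names are deliberately dropped. */
    public enum TokenType { FUNCTION_DEF, ALLOCA, LOAD, STORE, ARITHMETIC, COMPARE, BRANCH, CALL, RETURN }

    /** Maps each line of textual LLVM IR to at most one token, chosen by its opcode. */
    public static List<TokenType> tokenize(String ir) {
        List<TokenType> tokens = new ArrayList<>();
        for (String raw : ir.split("\n")) {
            String line = raw.trim();
            if (line.startsWith("define ")) {
                tokens.add(TokenType.FUNCTION_DEF);
                continue;
            }
            int eq = line.indexOf(" = ");
            if (line.startsWith("%") && eq >= 0) {
                line = line.substring(eq + 3); // drop the "%name = " result assignment
            }
            switch (line.split("\\s+")[0]) {
                case "alloca" -> tokens.add(TokenType.ALLOCA);
                case "load" -> tokens.add(TokenType.LOAD);
                case "store" -> tokens.add(TokenType.STORE);
                case "add", "sub", "mul", "sdiv", "fadd", "fsub", "fmul", "fdiv" -> tokens.add(TokenType.ARITHMETIC);
                case "icmp", "fcmp" -> tokens.add(TokenType.COMPARE);
                case "br" -> tokens.add(TokenType.BRANCH);
                case "call" -> tokens.add(TokenType.CALL);
                case "ret" -> tokens.add(TokenType.RETURN);
                default -> { } // labels, braces, comments, and unmapped opcodes produce no token
            }
        }
        return tokens;
    }

    /** Naive Greedy String Tiling: repeatedly marks the longest unmarked common runs of tokens. */
    public static int coveredTokens(List<TokenType> a, List<TokenType> b, int minMatch) {
        boolean[] markedA = new boolean[a.size()];
        boolean[] markedB = new boolean[b.size()];
        int covered = 0;
        int maxMatch;
        do {
            maxMatch = minMatch;
            List<int[]> matches = new ArrayList<>();
            for (int i = 0; i < a.size(); i++) {
                for (int j = 0; j < b.size(); j++) {
                    int k = 0;
                    while (i + k < a.size() && j + k < b.size()
                            && !markedA[i + k] && !markedB[j + k]
                            && a.get(i + k) == b.get(j + k)) {
                        k++;
                    }
                    if (k > maxMatch) { // a longer run replaces all shorter candidates
                        matches.clear();
                        maxMatch = k;
                    }
                    if (k == maxMatch) {
                        matches.add(new int[] {i, j});
                    }
                }
            }
            for (int[] m : matches) {
                boolean free = true; // skip runs overlapping a tile placed earlier in this round
                for (int k = 0; k < maxMatch && free; k++) {
                    free = !markedA[m[0] + k] && !markedB[m[1] + k];
                }
                if (free) {
                    for (int k = 0; k < maxMatch; k++) {
                        markedA[m[0] + k] = true;
                        markedB[m[1] + k] = true;
                    }
                    covered += maxMatch;
                }
            }
        } while (maxMatch > minMatch);
        return covered;
    }

    /** Fraction of tokens in both sequences that are covered by tiles. */
    public static double similarity(List<TokenType> a, List<TokenType> b, int minMatch) {
        if (a.isEmpty() && b.isEmpty()) return 1.0;
        return 2.0 * coveredTokens(a, b, minMatch) / (a.size() + b.size());
    }

    public static void main(String[] args) {
        // Two functions that differ only in their function and register names.
        String original = """
            define i32 @sum(i32 %a, i32 %b) {
            entry:
              %res = add i32 %a, %b
              ret i32 %res
            }
            """;
        String renamed = """
            define i32 @total(i32 %x, i32 %y) {
            entry:
              %out = add i32 %x, %y
              ret i32 %out
            }
            """;
        double sim = similarity(tokenize(original), tokenize(renamed), 2);
        System.out.printf(Locale.ROOT, "similarity = %.2f%n", sim); // prints "similarity = 1.00"
    }
}
```

In this sketch, register and function names never reach the token stream. Consistent renaming therefore leaves the token sequence unchanged. This hints at one plausible reason why IR-based tokens can withstand some obfuscation attacks. The example only covers simple renaming. It does not reproduce the obfuscation attacks evaluated in the thesis.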