Software Plagiarism Detection on Intermediate Representation

From SDQ-Institutsseminar
Speaker Niklas Heneka
Talk type Bachelor's thesis
Advisor Timur Sağlam
Date Fri, 17 November 2023
Talk language
Talk mode in person
Abstract Source code plagiarism is a widespread problem in computer science education. To counteract it, software plagiarism detectors can help identify plagiarized code. Most state-of-the-art plagiarism detectors are token-based. Supporting a new programming language usually means designing and implementing a new dedicated language module. This process can be time-consuming; furthermore, it is unclear whether it is even necessary. In this thesis, we evaluate the necessity of dedicated language modules for Java and C/C++ and derive conclusions for designing new ones. To achieve this, we create a language module for the intermediate representation of LLVM. For the evaluation, we compare it to two existing dedicated language modules in JPlag. While our results show that dedicated language modules perform better at plagiarism detection, language modules for intermediate representations show better resilience to obfuscation attacks.
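
For readers unfamiliar with token-based detection, the following minimal Python sketch illustrates the general idea: source code is reduced to a sequence of abstract tokens, and the token sequences of two submissions are compared. Everything in it is an assumption made for illustration. The tiny keyword set, the `tokenize` and `similarity` helpers, and the use of Python's `difflib.SequenceMatcher` for the comparison are not taken from the thesis, and the sketch does not reproduce the language modules or the matching algorithm of JPlag.

```python
import re
from difflib import SequenceMatcher

# Hypothetical, deliberately tiny keyword set (an assumption for illustration only).
KEYWORDS = {"int", "return", "if", "else", "while", "for"}

# Lexemes: identifiers/keywords, integer literals, or any single non-space character.
TOKEN_PATTERN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(source: str) -> list[str]:
    """Map source text to abstract token kinds, discarding names and literal values."""
    tokens = []
    for lexeme in TOKEN_PATTERN.findall(source):
        if lexeme in KEYWORDS:
            tokens.append(lexeme.upper())      # keywords keep their identity
        elif lexeme[0].isalpha() or lexeme[0] == "_":
            tokens.append("IDENT")             # every identifier becomes the same token
        elif lexeme.isdigit():
            tokens.append("NUMBER")            # every integer literal becomes the same token
        else:
            tokens.append(lexeme)              # punctuation and operators stay as-is
    return tokens


def similarity(first: str, second: str) -> float:
    """Similarity in [0, 1] of the two token sequences (difflib used for brevity)."""
    matcher = SequenceMatcher(None, tokenize(first), tokenize(second), autojunk=False)
    return matcher.ratio()


if __name__ == "__main__":
    original = "int sum(int a, int b) { return a + b; }"
    renamed = "int add(int x, int y) { return x + y; }"
    unrelated = "while (n) { n = n - 1; }"
    print(similarity(original, renamed))
    print(similarity(original, unrelated))
```

Because identifiers and literals are abstracted away, renaming variables does not change the token sequence, which is why the first pair above is rated as identical. A language module in this sense is the component that decides which tokens are extracted for a given language; the thesis asks whether a single module for LLVM's intermediate representation could stand in for several dedicated ones.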