Evidence-based Token Abstraction for Software Plagiarism Detection

Speaker

Talk type

Bachelor's thesis

Advisor

Date

Event	Date	Room
Institutsseminar/2023-04-28	Fri, 28 April 2023, 11:04	Room 348 (Building 50.34)

Talk language

Talk mode

In person

Abstract

Programming assignments for students are a target of plagiarism. Especially for graded assignments, instructors want to detect plagiarism among students. In larger courses, however, manually inspecting all submissions is resource-intensive. Numerous tools exist to help detect plagiarism in submissions, and many well-known ones are token-based detectors: in an abstraction step, they map source code to a list of tokens, and these lists are then compared with each other. While there is much research on comparison algorithms, the mapping itself is often considered only superficially. In this work, we conduct two experiments that address the issue of token abstraction. To this end, we design different token abstractions and explain how they differ. We then evaluate these abstractions on multiple datasets. We show that different abstractions have pros and cons, and that a higher abstraction level does not necessarily perform better. These findings are useful when adding support for new programming languages and when improving existing plagiarism detection tools. Furthermore, the results can help in choosing abstractions tailored to specific requirements.
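To illustrate the abstraction step and why the level of abstraction matters, here is a minimal sketch in Python. It is not the thesis's method: the two abstraction levels ("fine" and "coarse"), the function names `abstract_tokens` and `similarity`, and the example programs are hypothetical choices made for illustration. It assumes Python source code as input and uses the standard library's `tokenize` module for lexing. For comparison it uses `difflib.SequenceMatcher` as a simple stand-in; token-based detectors typically rely on dedicated algorithms such as Greedy String Tiling instead.

```python
import difflib
import io
import keyword
import tokenize

# Layout and comment tokens are dropped: in this sketch they carry no
# structural information and are trivial to change when disguising a copy.
_SKIPPED = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE, tokenize.INDENT,
    tokenize.DEDENT, tokenize.ENCODING, tokenize.ENDMARKER,
}


def abstract_tokens(source: str, level: str = "fine") -> list[str]:
    """Map Python source code to a list of abstract token labels.

    Hypothetical abstraction levels (illustrative only):
      "fine":   keywords keep their identity (DEF, FOR, RETURN, ...),
                identifiers become NAME, literals become LITERAL,
                operators keep their symbol.
      "coarse": like "fine", but every keyword becomes KEYWORD and
                every operator becomes OP.
    """
    if level not in ("fine", "coarse"):
        raise ValueError(f"unknown abstraction level: {level!r}")
    labels = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIPPED:
            continue
        if tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
            labels.append(tok.string.upper() if level == "fine" else "KEYWORD")
        elif tok.type == tokenize.NAME:
            labels.append("NAME")
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            labels.append("LITERAL")
        elif tok.type == tokenize.OP:
            labels.append(tok.string if level == "fine" else "OP")
        else:
            # Any other token type is kept under its generic tokenize name.
            labels.append(tokenize.tok_name[tok.type])
    return labels


def similarity(a: list[str], b: list[str]) -> float:
    """Stand-in similarity in [0, 1] between two token lists."""
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


# Hypothetical example submissions: the second only renames identifiers.
ORIGINAL = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
RENAMED = "def add_all(values):\n    acc = 0\n    for v in values:\n        acc += v\n    return acc\n"

if __name__ == "__main__":
    pairs = {
        "renamed": (ORIGINAL, RENAMED),
        "operator": ("x = a + b\n", "y = a - b\n"),
    }
    # Prints renamed: 1.00 (fine), 1.00 (coarse); operator: 0.80 (fine), 1.00 (coarse)
    for name, (left, right) in pairs.items():
        for level in ("fine", "coarse"):
            score = similarity(abstract_tokens(left, level), abstract_tokens(right, level))
            print(f"{name} {level} {score:.2f}")
```

In this toy example, renaming identifiers leaves the token lists unchanged, so both levels report 1.00. Replacing `+` with `-` gives 0.80 under the fine level but 1.00 under the coarse level: the coarser abstraction is insensitive to such edits, which helps against disguised copies but also means it can no longer tell genuinely different code apart. This is one way a higher abstraction level can trade one property for another.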