Abstract
|
Programming assignments for students are a target of plagiarism. Especially for graded
assignments, instructors want to detect plagiarism among students. For larger courses,
however, manually inspecting all submissions is a resource-intensive task. Numerous tools
exist that can help detect plagiarism in submissions. Many well-known
plagiarism detection tools are token-based detectors. In an abstraction step, they map
source code to a list of tokens, and these token lists are then compared with each other.
While there is much research in the area of comparison algorithms, the mapping itself is
often considered only superficially. In this work, we conduct two experiments that address
the issue of token abstraction. To this end, we design different token abstractions and explain
their differences. We then evaluate these abstractions using multiple datasets. We show
that each abstraction has advantages and disadvantages, and that a higher abstraction level does not
necessarily perform better. These findings are useful when adding support for new programming
languages and when improving existing plagiarism detection tools. Furthermore,
the results can help in choosing abstractions tailored to specific requirements.
|