Abstract
|
Commonly used plagiarism detection tools can currently process code in only one programming language per run.
This thesis addresses two sub-problems: first, parsing and comparing the code of each language that occurs in a submission set separately (multi-language plagiarism detection), and second, comparing submissions as a whole even though they contain code in multiple languages (cross-language plagiarism detection).
For multi-language plagiarism detection, we propose concatenating the token lists. For cross-language plagiarism detection, we propose a set of language-agnostic tokens together with rules for the order in which they are extracted; these have to be implemented for each supported language. In addition, we consider a dynamic approach that allows more flexible matching of tokens.
|