Incorporation of Text Content Similarity into Token-based Source Code Plagiarism Detection
| Speaker | Moritz Rimpf | |
|---|---|---|
| Talk type | Bachelor's thesis | |
| Supervisor | Robin Maisch | |
| Date | Fri, 26 September 2025, 11:30 (Room 010, Building 50.34) | |
| Talk language | German | |
| Talk mode | In person | |
| Abstract | State-of-the-art source code plagiarism detection systems often discard the textual content of source code. However, recent developments have shown that possible cases of plagiarism often do contain similar or identical textual content, such as inline or documentation comments. These comments could be used to find cases of plagiarism more easily or to further aid instructors during the manual review of suspicious submissions. Therefore, in this thesis, we enhance plagiarism detection engines by reintroducing textual content, in the form of source code comments, into the plagiarism detection process.
To process comments during plagiarism detection, we introduce a three-step comment processing pipeline that extracts and compares comments and merges the results with the plagiarism detection outcome. Furthermore, we compare three different matching algorithms for matching comments and examine the impact of natural language preprocessing on the comments within this pipeline. | |
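
The three steps named in the abstract (extract, compare, merge) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the regex-based extraction of Java-style comments, the use of Jaccard word overlap as the matching algorithm, the lowercase word splitting as preprocessing, the weighted average as the merge rule, and all function names and the `weight` parameter.

```python
import re

# Hypothetical pattern for Java-style comments. It does not understand
# string literals, so comment markers inside strings would be misread.
COMMENT_PATTERN = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)


def extract_comments(source: str) -> list[str]:
    """Step 1 (assumed): collect line and block comments from the source."""
    return COMMENT_PATTERN.findall(source)


def normalize(comment: str) -> set[str]:
    """Assumed preprocessing: lowercase, keep alphanumeric words only."""
    return set(re.findall(r"[a-z0-9]+", comment.lower()))


def comment_similarity(source_a: str, source_b: str) -> float:
    """Step 2 (assumed): Jaccard similarity over all comment words."""
    words_a = set().union(*(normalize(c) for c in extract_comments(source_a)))
    words_b = set().union(*(normalize(c) for c in extract_comments(source_b)))
    if not words_a or not words_b:
        return 0.0  # no comment text on one side, nothing to compare
    return len(words_a & words_b) / len(words_a | words_b)


def merge_scores(token_similarity: float, comment_sim: float,
                 weight: float = 0.2) -> float:
    """Step 3 (assumed): weighted average of token and comment similarity."""
    return (1 - weight) * token_similarity + weight * comment_sim
```

In this sketch, `token_similarity` stands for the score that a token-based detector already produces for a pair of submissions; the comment score only adjusts it. A real pipeline would more plausibly extract comments with a proper parser and match them individually rather than as one bag of words.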