Frequency-Based Identification of Notable Matches in Source Code Plagiarism Detection

From SDQ-Institutsseminar
Speaker Elisabeth Hermann
Talk type Bachelor's thesis
Advisor Robin Maisch
Date Fri, 12 September 2025, 11:30 (Room 010 (Building 50.34))
Talk language German
Talk mode in person
Abstract Determining whether programming submissions for the same task were written independently or copied from one another is challenging. Plagiarism detection programs make this easier: they compare submissions pairwise and identify matching sections between two submissions. To date, however, they do not take into account whether an identical section appears in more than two submissions. We assume that a match occurring in only a few submissions raises the probability of plagiarism, whereas a match occurring in many submissions lowers it. We therefore count how often each match occurs across all comparisons. We integrate this approach into the token-based plagiarism detector JPlag to examine how different strategies for determining and weighting the frequency distribution of matches can better separate plagiarism from inconspicuous matches. The weighting is incorporated into the similarity score, which assesses the similarity between two submissions. The results show that this approach can improve plagiarism detection.
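
The idea of counting matches across all comparisons and weighting rare matches more heavily can be illustrated with a minimal Python sketch. It is not the implementation from the thesis and does not use JPlag's API: the function names, the toy token sequences, the weighting `1 / (frequency - 1)`, and the normalization of the score are all assumptions chosen for illustration only.

```python
from collections import defaultdict


def section_frequencies(pair_matches):
    """Count, for each matched section, how many distinct submissions contain it."""
    holders = defaultdict(set)
    for (left, right), sections in pair_matches.items():
        for section in sections:
            holders[section].update((left, right))
    return {section: len(subs) for section, subs in holders.items()}


def rarity_weight(frequency):
    """Hypothetical weighting (an assumption, not the thesis's strategy):
    1.0 for a section shared by exactly two submissions, smaller as more share it.
    A matched section always occurs in at least two submissions, so frequency >= 2."""
    return 1.0 / (frequency - 1)


def weighted_similarity(pair, pair_matches, token_counts, frequencies):
    """Assumed score: weighted matched tokens, normalized by the pair's total length."""
    left, right = pair
    weighted_tokens = sum(
        len(section) * rarity_weight(frequencies[section])
        for section in pair_matches[pair]
    )
    return 2.0 * weighted_tokens / (token_counts[left] + token_counts[right])


# Toy data: each matched section is a tuple of (made-up) token names.
pair_matches = {
    ("A", "B"): [("for", "if", "call"), ("class", "method")],
    ("A", "C"): [("class", "method")],
    ("B", "C"): [("class", "method")],
}
token_counts = {"A": 10, "B": 10, "C": 10}

frequencies = section_frequencies(pair_matches)
score_ab = weighted_similarity(("A", "B"), pair_matches, token_counts, frequencies)
score_ac = weighted_similarity(("A", "C"), pair_matches, token_counts, frequencies)
print(score_ab, score_ac)
```

In this toy data, the section shared by all three submissions contributes only half its length, so the pair A and B, which additionally shares a section found nowhere else, ends up with a clearly higher score than A and C.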