Automated Classification of Software Engineering Papers along Content Facets

From SDQ-Institutsseminar
Version as of 8 February 2022, 19:41, by Angelika Kaplan
Presenter Kevin Haag
Talk type Bachelor's thesis
Supervisor Angelika Kaplan
Date Fri, 11 February 2022
Talk language
Talk mode online
Abstract With existing search strategies, specific paper contents can only be searched for indirectly. Keywords are used to describe the desired content as accurately as possible, but many of the results are unrelated to what was searched for. An automated classification of these contents could extend the search process, allowing users to specify the desired content directly and thereby improving the current state of scholarly communication.

In this thesis, we investigated the automatic classification of scientific papers in the Software Engineering domain. To this end, we consolidated a classification scheme for paper contents with regard to Research Object, Statement, and Evidence. In a comparative analysis, we then investigated the machine learning algorithms Naïve Bayes, Support Vector Machine, Multi-Layer Perceptron, Logistic Regression, Decision Tree, and BERT as applied to this classification task.
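
To give a rough idea of what such a text classifier does, the following is a minimal sketch of a multinomial Naïve Bayes classifier, one of the algorithm families named above, written in plain Python. It is not the implementation used in the thesis. The training sentences, the two labels `research_object` and `evidence`, and all function names are hypothetical and chosen only for illustration; the actual classes of the classification scheme and the thesis's data set are not reproduced here.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy sentences. The labels are illustrative stand-ins and
# are NOT the classes of the thesis's classification scheme.
TRAIN = [
    ("we propose a new architecture description language", "research_object"),
    ("this paper presents a tool for code review", "research_object"),
    ("we evaluate the approach in a controlled experiment", "evidence"),
    ("a case study with industrial partners validates the results", "evidence"),
]


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def train_naive_bayes(samples):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        class_docs[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocab.add(token)
    return class_docs, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior under Laplace smoothing."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        # Log prior of the class.
        score = math.log(class_docs[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # Words unseen during training are ignored.
            # Add-one (Laplace) smoothed log likelihood of the token.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes(TRAIN)
print(predict(model, "we evaluate the approach in an experiment"))
print(predict(model, "this paper presents a new language"))
```

A comparative analysis like the one described would train each candidate algorithm on the same labeled data and compare the results on held-out data; this sketch covers only a single classifier and no evaluation.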