Data-Preparation for Machine-Learning Based Static Code Analysis

From SDQ-Institutsseminar
Revision as of 11 March 2022, 09:51, by Robert Heinrich
Speaker Felix Griesau
Presentation type Master's thesis
Advisor Robert Heinrich
Date Fri, 1 April 2022, 11:04, MS Teams (Institutsseminar/2022-04-01)
Presentation language
Presentation mode online
Abstract Static Code Analysis (SCA) has become an integral part of modern software development, especially since the rise of automation in the form of CI/CD. How machine learning can best help improve the state of SCA, and thus facilitate maintainable, correct, and secure software, remains an open question. However, machine learning needs a solid foundation of data to learn from. This thesis proposes an approach to building that foundation by mining data on software issues from real-world code. We show how we used this approach to analyze over 4000 software packages and generate over two million issue samples. Additionally, we propose a method for refining this data and apply it to an existing machine-learning-based SCA approach.