Data-Preparation for Machine-Learning Based Static Code Analysis

Speaker	Felix Griesau
Talk type	Master's thesis
Advisor	Robert Heinrich
Date	Fri, 1 April 2022, 11:30 (MS Teams)
Talk language
Talk mode	online
Abstract	Static Code Analysis (SCA) has become an integral part of modern software development, especially since the rise of automation in the form of CI/CD. How machine learning can best advance the state of SCA, and thus facilitate maintainable, correct, and secure software, remains an open question. Machine learning, however, needs a solid foundation of data to learn from. This thesis proposes an approach to building that foundation by mining data on software issues from real-world code. We show how we used this approach to analyze over 4000 software packages and generate over two million issue samples. Additionally, we propose a method for refining this data and apply it to an existing machine-learning-based SCA approach.