Relevance-Driven Feature Engineering

From SDQ-Institutsseminar
Speaker Rosina Kazakova
Talk type Master's thesis
Advisor Edouard Fouché
Date Fri, 8 December 2017
Abstract In predictive maintenance scenarios, failure classification is challenging because modern factories continuously generate large volumes of high-dimensional data. In our industrial use case, complex error analysis is currently performed manually on recorded data. The resulting misclassifications lead to longer rework times. Our goal is to automate failure detection. In particular, this thesis builds a classification model to detect faulty engines in the vehicle manufacturing process.

The first part of the work focuses on anomaly detection as a binary classification problem and aims to predict whether an engine is defective. Here, we manage to recognize more than 90% of the faulty engines. In the second part, we extend our analysis to the multi-class classification problem with highly imbalanced classes. Here, our objective is to predict the exact type of failure. To some extent, this situation resembles microarray analysis: the data is high-dimensional, but only few instances are available. This thesis develops a relevance-driven feature engineering framework in the form of a meta-algorithm. We study how feature relevance evaluation can be integrated into the construction of new features. We also use ensemble feature selection algorithms and define our own criteria to determine the relevance of feature subsets. These criteria are integrated into the feature engineering process to accelerate it: parts of the search space are ignored without significantly degrading data quality.
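
The idea of letting relevance guide feature construction can be illustrated with a minimal sketch. It is not the thesis's algorithm: the relevance score (absolute Pearson correlation with the target), the construction operator (pairwise products), the threshold value, and the names `relevance` and `engineer_features` are all assumptions chosen for illustration. The sketch only shows how a relevance criterion can prune the search space before new features are built.

```python
from itertools import combinations
from statistics import mean, pstdev


def relevance(column, target):
    """Absolute Pearson correlation with the target.

    This is a stand-in relevance score (assumption), not the
    criteria defined in the thesis.
    """
    sx, sy = pstdev(column), pstdev(target)
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no information
    mx, my = mean(column), mean(target)
    cov = mean([(x - mx) * (y - my) for x, y in zip(column, target)])
    return abs(cov / (sx * sy))


def engineer_features(features, target, threshold=0.3):
    """Construct pairwise product features from relevant parents only.

    features: dict mapping a feature name to a list of values.
    Returns a dict with the constructed features that pass the threshold.
    """
    # Pruning step: base features below the threshold are never combined,
    # so the number of candidate pairs shrinks quadratically.
    kept = [name for name, col in features.items()
            if relevance(col, target) >= threshold]

    constructed = {}
    for a, b in combinations(kept, 2):
        candidate = [x * y for x, y in zip(features[a], features[b])]
        # Keep a constructed feature only if it is itself relevant.
        if relevance(candidate, target) >= threshold:
            constructed[f"{a}*{b}"] = candidate
    return constructed


# Toy data (hypothetical): two informative features and one noise feature.
target = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "f1": [1, 2, 1, 2, 5, 6, 5, 6],
    "f2": [0, 1, 1, 0, 3, 4, 4, 3],
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],
}
new_features = engineer_features(features, target)
```

In this toy setting the noise feature is discarded before any combination is attempted, so only one of the three possible pairs is evaluated. An ensemble variant could average several relevance scores instead of relying on a single one.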