From SDQ-Institutsseminar
Session (all sessions)
Date Friday, 20 December 2019
Time 11:00 – 12:15 (duration: 75 min)
Location Room 348 (Building 50.34)
Previous session Fri, 13 December 2019
Next session Fri, 20 December 2019

Speaker Adrian Kruck
Title Bayesian Optimization for Wrapper Feature Selection
Talk type Master's thesis
Supervisor Jakob Bach
Abstract Wrapper feature selection can lead to highly accurate classifications. However, its computational cost is generally very high. Bayesian optimization, on the other hand, has already proven to be very efficient at optimizing black-box functions. Our approach uses Bayesian optimization to minimize the number of evaluations, i.e., the number of models trained on different feature subsets. We propose four different ways to set up the objective function for the Bayesian optimization. On 14 classification datasets, the approach is compared against 14 established feature selection methods, including other wrapper methods as well as filter and embedded methods. We use Gaussian processes and random forests as surrogate models. The classifiers applied to the selected feature subsets are logistic regression and naive Bayes. We compare all feature selection methods with respect to classification accuracy and runtime. Our approach keeps up with the most established feature selection methods, but the evaluation also shows that the experimental setup does not give feature selection enough weight. In conclusion, we give guidelines for a more appropriate experimental setup and present several concepts for developing Bayesian optimization for wrapper feature selection further.
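To make the notion of an "evaluation" concrete, the following minimal sketch treats a feature subset as a binary mask and the holdout accuracy of a classifier trained on that subset as a black-box objective. Everything in it is an assumption for illustration and not taken from the thesis: the toy data, the names `make_toy_data`, `WrapperObjective`, and `random_search`, the nearest-centroid classifier (a stand-in for logistic regression or naive Bayes), and the random search (a placeholder for the Bayesian optimizer with its surrogate model). It also does not reproduce any of the four objective-function variants mentioned in the abstract.

```python
import random


def make_toy_data(n=40, seed=0):
    """Hypothetical data: feature 0 separates the classes, features 1-2 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = (i // 2) % 2
        X.append([label + rng.uniform(-0.2, 0.2),
                  rng.uniform(-1, 1),
                  rng.uniform(-1, 1)])
        y.append(label)
    return X, y


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test, cols):
    """Fit class centroids on the selected columns and return test accuracy."""
    sums, counts = {}, {}
    for row, label in zip(X_train, y_train):
        acc = sums.setdefault(label, [0.0] * len(cols))
        for k, j in enumerate(cols):
            acc[k] += row[j]
        counts[label] = counts.get(label, 0) + 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def distance(row, centroid):
        return sum((row[j] - centroid[k]) ** 2 for k, j in enumerate(cols))

    correct = 0
    for row, label in zip(X_test, y_test):
        predicted = min(centroids, key=lambda c: distance(row, centroids[c]))
        correct += predicted == label
    return correct / len(y_test)


class WrapperObjective:
    """Black-box objective: feature mask -> holdout accuracy (cached)."""

    def __init__(self, X, y):
        # Even-indexed rows are used for training, odd-indexed rows for testing.
        self.X_train, self.y_train = X[0::2], y[0::2]
        self.X_test, self.y_test = X[1::2], y[1::2]
        self.cache = {}
        self.n_evaluations = 0  # number of distinct masks scored so far

    def __call__(self, mask):
        key = tuple(bool(m) for m in mask)
        if key not in self.cache:
            self.n_evaluations += 1
            cols = [j for j, keep in enumerate(key) if keep]
            # An empty subset cannot be trained on; score it as worst case.
            self.cache[key] = (
                nearest_centroid_accuracy(self.X_train, self.y_train,
                                          self.X_test, self.y_test, cols)
                if cols else 0.0
            )
        return self.cache[key]


def random_search(objective, n_features, budget, seed=0):
    """Placeholder optimizer: a Bayesian optimizer would propose masks here."""
    rng = random.Random(seed)
    best_mask, best_score = None, -1.0
    for _ in range(budget):
        mask = tuple(rng.random() < 0.5 for _ in range(n_features))
        score = objective(mask)
        if score > best_score:
            best_mask, best_score = mask, score
    return best_mask, best_score
```

Calling `random_search(WrapperObjective(*make_toy_data()), n_features=3, budget=5)` returns the best mask found within five proposals. The evaluation counter is the quantity a model-based optimizer would try to keep small, since each new mask costs one model training.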
Speaker Benjamin Jochum
Title Discovering data-driven Explanations
Talk type Bachelor's thesis
Supervisor Vadim Arzamasov
Abstract The main goal of knowledge discovery is to increase knowledge using some set of data. In many cases, it is crucial that the results are human-comprehensible. Subdividing the feature space into boxes with unique characteristics is a commonly used approach for achieving this goal. The patient rule induction method (PRIM) extracts such "interesting" hyperboxes from a dataset by generating boxes that maximize the occurrence of some class inside them. However, the quality of the results varies when the method is applied to small datasets. This work examines to what extent data generators can be used to artificially increase the amount of available data in order to improve the accuracy of the results. Secondly, it tests whether probabilistic classification can improve the results when using generated data.
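The box-finding idea behind PRIM can be sketched as a greedy peeling loop: start with a box containing all points and repeatedly cut away a small slice along one dimension, choosing the cut that most increases the share of the target class among the remaining points. The sketch below is a simplified illustration under stated assumptions, not the implementation used in the thesis. The names `make_grid` and `prim_peel`, the parameter defaults, and the toy data are hypothetical; the pasting phase of PRIM is omitted; and the loop stops as soon as no cut improves the class share, whereas PRIM is usually run down to the minimum support to produce a whole sequence of boxes. Data generation and probabilistic classification are not covered.

```python
def make_grid(side=10):
    """Hypothetical toy data: class 1 occupies the upper-right quadrant."""
    X = [((a + 0.5) / side, (b + 0.5) / side)
         for a in range(side) for b in range(side)]
    y = [1 if x0 > 0.5 and x1 > 0.5 else 0 for x0, x1 in X]
    return X, y


def prim_peel(X, y, alpha=0.1, min_support=0.2):
    """Greedy peeling: returns (box, indices of the points inside the box).

    alpha       -- approximate fraction of remaining points removed per peel
    min_support -- minimum fraction of all points the box must still contain
    """
    n, d = len(X), len(X[0])
    min_count = max(1, min_support * n)
    inside = list(range(n))

    def mean_y(indices):
        return sum(y[i] for i in indices) / len(indices)

    while True:
        k = max(1, int(alpha * len(inside)))
        best_mean, best_kept = None, None
        for j in range(d):
            values = sorted(X[i][j] for i in inside)
            low_cut, high_cut = values[k - 1], values[-k]
            candidates = (
                [i for i in inside if X[i][j] > low_cut],   # peel lower slice
                [i for i in inside if X[i][j] < high_cut],  # peel upper slice
            )
            for kept in candidates:
                if len(kept) < min_count:
                    continue  # peel would violate the support constraint
                m = mean_y(kept)
                if best_mean is None or m > best_mean:
                    best_mean, best_kept = m, kept
        # Stop if no peel is allowed or none improves the target share.
        if best_kept is None or best_mean <= mean_y(inside):
            break
        inside = best_kept

    box = [(min(X[i][j] for i in inside), max(X[i][j] for i in inside))
           for j in range(d)]
    return box, inside
```

With `X, y = make_grid()` and `box, inside = prim_peel(X, y)`, the returned box can be read directly as a rule of the form "lower bound ≤ feature ≤ upper bound" per dimension, which is what makes this kind of result human-comprehensible.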