Feature Selection using Bayesian Optimization

Aus SDQ-Institutsseminar
Vortragende(r) Patrick Ehrler
Vortragstyp Bachelorarbeit
Betreuer(in) Jakob Bach
Termin Fr 16. April 2021
Vortragssprache
Vortragsmodus
Kurzfassung Datasets, like gene profiles from cancer patients, can have a large number of features. In order to apply prediction techniques, a lot of computing time and memory is needed. A solution to this problem is to reduce the number of features, whereby the main challenge is to still receive a satisfactory prediction performance afterwards. There are many state-of-the-art feature selection techniques, but they all have their limitations. We use Bayesian optimization, a technique to optimize expensive black-box-functions, and apply it to the problem of feature selection. Thereby, we face the challenge to adjust the standard optimization procedure to work with a discrete-valued search space, but also to find a way to optimize the acquisition function efficiently.

Overall, we propose 10 different Bayesian optimization feature selection approaches and evaluate their performance experimentally on 28 OpenML classification datasets. We do not only compare the approaches among themselves, but also to 9 state-of-the-art feature selection approaches. Our results state that especially four of our approaches perform well and can compete to most state-of-the-art approaches in terms of prediction performance. In terms of runtime, all our approaches do not perform outstandingly good, but similar to some filter and wrapper approaches.