Generalized Monte Carlo Dependency Estimation and Anytime Supervised Filter Feature Selection
|Date||Fri, 24 June 2022|
|Abstract||Dependency estimation is an important problem in statistics and is frequently applied in data science. Because modern datasets can be very large, dependency estimators should be efficient and leverage as much information from the data as possible. Traditional bivariate and multivariate dependency estimators can only estimate the dependency between two or n one-dimensional datasets, respectively. In this thesis, we investigate how to develop estimators that can estimate the dependency between n multidimensional datasets, which we call "generalized dependency estimators".
We extend the recently introduced methodology of Monte Carlo Dependency Estimation (MCDE), an effective and efficient traditional multivariate dependency estimator. We introduce Generalized Monte Carlo Dependency Estimation (gMCDE) and focus in particular on a highly relevant subproblem of generalized dependency estimation known as canonical dependency estimation, which aims to estimate the dependency between two multidimensional datasets. We demonstrate the practical relevance of Canonical Monte Carlo Dependency Estimation (cMCDE) by applying it to feature selection and introduce two methodologies for anytime supervised filter feature selection: Canonical Monte Carlo Feature Selection (cMCFS) and Canonical Multi Armed Bandit Feature Selection (cMABFS). cMCFS applies the methodology of cMCDE directly to feature selection, while cMABFS treats feature selection as a multi-armed bandit problem and uses cMCDE to determine the relevant features.