Abstract
|
Dependency estimation is a crucial task in data analysis, with applications in, e.g., data understanding, feature selection, and clustering. This thesis focuses on Canonical Dependency Analysis, i.e., the task of estimating the dependency between two random vectors, each consisting of an arbitrary number of random variables. This task is particularly difficult when (1) the dimensionality of those vectors is high and (2) the dependency is non-linear. To solve this task, we propose Canonical Monte Carlo Dependency Estimation (cMCDE), an extension of Monte Carlo Dependency Estimation (MCDE, Fouché 2019). Using Monte Carlo simulations, cMCDE estimates dependency based on the average discrepancy between empirical conditional distributions. We show that cMCDE inherits the useful properties of MCDE and compare it to existing competitors. We also propose and apply a method that leverages cMCDE to select features from very high-dimensional feature spaces, demonstrating cMCDE’s practical relevance.
|