Subspace Search in Data Streams
Current revision as of 12 November 2019, 15:05
Speaker | Florian Kalinke | |
---|---|---|
Talk type | Master's thesis | |
Supervisor | Edouard Fouché | |
Date | Fri, 22 November 2019 | |
Talk language | ||
Talk mode | ||
Abstract | Modern data mining often takes place on high-dimensional data streams, which evolve at a very fast pace. On the one hand, the "curse of dimensionality" leads to a sparsely populated feature space, in which classical statistical methods perform poorly. Patterns, such as clusters or outliers, often hide in a few low-dimensional subspaces. On the other hand, data streams are non-stationary and virtually unbounded. Hence, algorithms operating on data streams must work incrementally and take concept drift into account.
While high dimensionality and the streaming setting each pose their own set of challenges, we observe that existing mining algorithms address them only separately. We therefore propose a novel algorithm that keeps track of the subspaces of interest in high-dimensional data streams over time. We quantify the relevance of subspaces via a so-called "contrast" measure, which we are able to maintain incrementally and efficiently. Furthermore, we propose a set of heuristics to adapt the search for relevant subspaces as the data and the underlying distribution evolve. We show that our approach is beneficial as a feature selection method and, as such, can be applied to extend a range of knowledge discovery tasks, e.g., outlier detection, in high-dimensional data streams. |
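The abstract does not say how the contrast of a subspace is computed or maintained, so the following is only a rough sketch under stated assumptions, not the algorithm presented in the talk. It assumes a HiCS-style Monte Carlo contrast (the average Kolmogorov-Smirnov deviation between the marginal distribution of one dimension and its distribution conditioned on a random slice of the other dimensions) and approximates the streaming setting with a plain sliding window that is rescored from scratch. The names `ks_statistic`, `contrast`, `WindowedContrast`, and the parameters `iterations` and `alpha` are hypothetical.

```python
import random
from collections import deque


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap


def contrast(window, subspace, iterations=50, alpha=0.2, rng=random):
    """Monte Carlo contrast estimate for `subspace`, a tuple of column indices.

    Assumption: HiCS-style definition; the abstract does not specify one.
    """
    if len(subspace) < 2:
        raise ValueError("a subspace needs at least two dimensions")
    rows = list(window)
    n = len(rows)
    # Each of the d - 1 slice conditions keeps a share of alpha ** (1 / (d - 1)),
    # so about alpha * n points survive if the dimensions are independent.
    share = alpha ** (1.0 / (len(subspace) - 1))
    block = min(n, max(2, int(n * share)))
    marginals = {dim: [row[dim] for row in rows] for dim in subspace}
    orders = {dim: sorted(range(n), key=marginals[dim].__getitem__)
              for dim in subspace}
    total = 0.0
    for _ in range(iterations):
        ref = rng.choice(subspace)
        selected = set(range(n))
        for dim in subspace:
            if dim != ref:
                # Keep a random contiguous block of points in sorted order.
                start = rng.randint(0, n - block)
                selected &= set(orders[dim][start:start + block])
        if selected:  # an empty slice contributes zero deviation
            conditional = [marginals[ref][r] for r in selected]
            total += ks_statistic(marginals[ref], conditional)
    return total / iterations


class WindowedContrast:
    """Keeps the most recent `size` points; old points drop out automatically."""

    def __init__(self, size):
        self.window = deque(maxlen=size)

    def update(self, point):
        self.window.append(tuple(point))

    def score(self, subspace, **kwargs):
        return contrast(self.window, subspace, **kwargs)


# Synthetic stream: dimension 1 depends on dimension 0, dimension 2 is noise.
rng = random.Random(0)
monitor = WindowedContrast(size=500)
for _ in range(2000):
    x = rng.random()
    monitor.update((x, x + rng.gauss(0.0, 0.05), rng.random()))

dependent = monitor.score((0, 1), rng=rng)
independent = monitor.score((0, 2), rng=rng)
print(f"contrast(0, 1) = {dependent:.2f}, contrast(0, 2) = {independent:.2f}")
```

The sliding window bounds memory and forgets outdated points, which is one simple way to react to concept drift. Recomputing the score on every call is the part an incremental method would replace; the sketch makes no attempt at that.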