Subspace Search in Data Streams

From SDQ-Institutsseminar
Revision as of 4 June 2019, 09:44 by Florian Kalinke (The page was newly created: „{{Vortrag |vortragender=Florian Kalinke |email=utzzc@student.kit.edu |vortragstyp=Proposal |betreuer=Edouard Fouché |termin=Institutsseminar/2019-07-19 |kurzf…“)
Speaker Florian Kalinke
Talk type Proposal
Advisor Edouard Fouché
Date Fri, 19 July 2019
Talk language
Talk mode
Abstract Modern data mining often takes place on high-dimensional data streams that arrive at a very fast pace. High dimensionality and speed of arrival pose two distinct sets of challenges, yet current mining algorithms often address only one of them.

With high dimensionality, the curse of dimensionality comes into effect. The feature space becomes sparsely populated, and classical statistical methods perform poorly on it. Patterns such as clusters or outliers often hide in low-dimensional subspaces of interest and cannot be discovered in the full high-dimensional space.
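
The sparsity effect can be illustrated with a small experiment. The following is a minimal sketch, not part of the proposal: it draws uniform random points in the unit hypercube and reports how much the farthest point differs from the nearest one, relative to the nearest. The function name `relative_contrast`, the point count, and the chosen dimensions are assumptions made for illustration only.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for distances from one random
    query point to n_points uniform random points in [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    return (max(dists) - min(dists)) / min(dists)


# As the dimension grows, nearest and farthest neighbours become
# almost equally far away, so distance-based patterns blur.
for dim in (2, 20, 200):
    print(dim, round(relative_contrast(dim), 3))
```

The printed ratio shrinks as the dimension grows, which is one common way to make the loss of distance contrast in high-dimensional spaces visible.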

Data streams are virtually unbounded, and the distribution of the data may change over time. Hence, algorithms operating on data streams have to work incrementally and have to take concept drift into account.
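
One common way to meet both requirements is a sliding window: statistics are updated in constant time per arriving item, and old items are forgotten so that the summary follows the current distribution. The sketch below shows this idea under stated assumptions; the class `SlidingWindowStats` and its window size are hypothetical, and the abstract does not say which windowing or drift-handling scheme the thesis uses.

```python
from collections import deque


class SlidingWindowStats:
    """Mean and variance over the most recent `size` stream items,
    updated incrementally in O(1) per item (illustrative sketch)."""

    def __init__(self, size):
        self.size = size
        self.values = deque()
        self.total = 0.0
        self.total_sq = 0.0

    def update(self, x):
        # Add the new item to the running sums.
        self.values.append(x)
        self.total += x
        self.total_sq += x * x
        # Forget the oldest item once the window is full, so that
        # outdated data no longer influences the statistics.
        if len(self.values) > self.size:
            old = self.values.popleft()
            self.total -= old
            self.total_sq -= old * old

    @property
    def mean(self):
        # Assumes at least one item has been observed.
        return self.total / len(self.values)

    @property
    def variance(self):
        # Population variance; clamped at 0 to absorb rounding error.
        m = self.mean
        return max(self.total_sq / len(self.values) - m * m, 0.0)


stats = SlidingWindowStats(size=3)
for x in [1, 2, 3, 4, 5]:
    stats.update(x)
print(stats.mean, stats.variance)
```

After the five updates the window holds only the last three items, so the reported statistics describe 3, 4, 5 and ignore the older values.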

In this thesis, we propose a streaming algorithm that tracks the subspaces in which patterns may occur over time. We assess the relevance of a subspace with a so-called contrast measure, which quantifies the strength of a potential relationship between the attributes of that subspace. Because the relevance of subspaces may change over time, the proposed algorithm uses a heuristic to search for relevant subspaces as the data and the underlying distribution evolve.
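
To make the idea of scoring subspaces concrete, the sketch below ranks all two-dimensional subspaces of a window of stream items. It rests on loud assumptions: the absolute Pearson correlation is only a stand-in for a contrast measure, the exhaustive enumeration of attribute pairs replaces the heuristic search, and the names `abs_pearson` and `top_subspaces` are hypothetical. The abstract does not specify the actual contrast measure or search heuristic of the thesis.

```python
import itertools
import math
import random


def abs_pearson(xs, ys):
    """Stand-in contrast score: absolute Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant attribute carries no relationship
    return abs(cov) / math.sqrt(vx * vy)


def top_subspaces(window, k=1):
    """Score every 2-D subspace of the window and return the k best."""
    dims = range(len(window[0]))
    columns = [[row[d] for row in window] for d in dims]
    scored = [
        (abs_pearson(columns[a], columns[b]), (a, b))
        for a, b in itertools.combinations(dims, 2)
    ]
    scored.sort(reverse=True)
    return [subspace for _, subspace in scored[:k]]


# Synthetic window: attributes 0 and 1 are related, attribute 2 is noise.
rng = random.Random(0)
window = []
for _ in range(500):
    x = rng.random()
    window.append((x, x + 0.05 * rng.gauss(0, 1), rng.random()))

best = top_subspaces(window, k=1)
print(best)
```

Re-running `top_subspaces` on each new window would track how the ranking changes as the stream evolves. The number of subspaces grows exponentially with the number of attributes, so exhaustive scoring does not scale, which is why a search heuristic is needed in practice.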