Change Detection in High Dimensional Data Streams: Difference between revisions

From SDQ-Institutsseminar

Revision as of 13 April 2021, 16:46

Speaker Tanja Fenn
Talk type Proposal
Advisor Edouard Fouché
Date Fri 16 April 2021
Talk mode
Abstract The data collected in many real-world scenarios, such as environmental analysis, manufacturing, and e-commerce, are high-dimensional and arrive as a stream whose properties evolve over time, a phenomenon known as "concept drift". This brings numerous challenges: data-driven models become outdated, and one is typically interested in detecting specific events, e.g., the critical wear and tear of industrial machines. Hence, it is crucial to detect concept drift in order to design a reliable and adaptive predictive system for streaming data. However, existing techniques can only detect "when" a drift occurs and neglect the fact that different drifts may occur in different dimensions, i.e., they do not detect "where" a drift occurs. This is particularly problematic when data streams are high-dimensional.

The goal of this Master’s thesis is to develop and evaluate a framework that efficiently and effectively detects “when” and “where” concept drift occurs in high-dimensional data streams. We introduce stream autoencoder windowing (SAW), an approach that trains an autoencoder online while monitoring its reconstruction error with a sliding window of adaptive size. We will evaluate the performance of our method on synthetic data, in which the characteristics of the drifts are known. We will then show how our method improves the accuracy of existing classifiers for predictive systems compared to benchmarks on real data streams.
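
The idea of monitoring an online-trained autoencoder can be illustrated with a minimal sketch. It is not the SAW implementation from the thesis: the tied-weight linear autoencoder, the fixed-split window test, the warm-up length, the thresholds, and the synthetic stream are all assumptions chosen for illustration. The window test is a much simplified, one-sided stand-in for adaptive windowing algorithms such as ADWIN. The total reconstruction error is used to decide "when" a drift occurs, and the per-dimension error increase is used as a rough indication of "where".

```python
import numpy as np


class TinyAutoencoder:
    """Linear autoencoder with tied weights, trained one sample at a time."""

    def __init__(self, n_features, n_hidden, lr=0.005, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(n_features, n_hidden))
        self.lr = lr

    def errors(self, x):
        """Per-dimension squared reconstruction error of one sample."""
        return ((x @ self.W) @ self.W.T - x) ** 2

    def partial_fit(self, x):
        """One SGD step on the loss 0.5 * ||x W W^T - x||^2."""
        h = x @ self.W              # hidden code
        r = h @ self.W.T - x        # reconstruction residual
        self.W -= self.lr * (np.outer(r, h) + np.outer(x, r @ self.W))


class AdaptiveErrorWindow:
    """Window that grows while the error is stable and shrinks when it rises."""

    def __init__(self, recent=20, max_size=500, z=6.0):
        self.rows, self.recent, self.max_size, self.z = [], recent, max_size, z

    def update(self, err_per_dim):
        """Add one error vector; return dimensions ranked by drift, or None."""
        self.rows.append(err_per_dim)
        self.rows = self.rows[-self.max_size:]
        if len(self.rows) < 3 * self.recent:
            return None
        data = np.array(self.rows)
        old, new = data[:-self.recent], data[-self.recent:]
        total_old, total_new = old.sum(axis=1), new.sum(axis=1)
        # One-sided test: has the mean total error of the recent rows risen?
        bound = self.z * total_old.std() / np.sqrt(self.recent)
        if total_new.mean() - total_old.mean() <= bound:
            return None
        gain = new.mean(axis=0) - old.mean(axis=0)
        self.rows = self.rows[-self.recent:]   # shrink: keep only recent rows
        return np.argsort(gain)[::-1]          # largest error increase first


def run_stream(stream, warmup=500, n_hidden=2):
    """Test-then-train loop; returns a list of (time, ranked dimensions)."""
    model = TinyAutoencoder(stream.shape[1], n_hidden)
    window = AdaptiveErrorWindow()
    detections = []
    for t, x in enumerate(stream):
        if t >= warmup:  # start monitoring only after a warm-up phase
            ranked = window.update(model.errors(x))
            if ranked is not None:
                detections.append((t, ranked))
        model.partial_fit(x)
    return detections


# Hypothetical stream: 8 sensors driven by two latent factors;
# sensor 7 is near-constant until its mean shifts at t = 1500.
rng = np.random.default_rng(42)
mixing = np.array([[1, 1, 1, 1, 0, 0, 0, 0],
                   [0, 0, 0, 0, 1, 1, 1, 0]], dtype=float)
stream = rng.normal(size=(2000, 2)) @ mixing + 0.1 * rng.normal(size=(2000, 8))
stream[1500:, 7] += 3.0

detections = run_stream(stream)
for t, ranked in detections:
    print(f"drift at t={t}, most affected dimension: {ranked[0]}")
```

The sketch scores each sample before training on it, so a drifted sample raises the error before the model has had a chance to adapt. The warm-up phase is a design choice of this sketch: it keeps the early, still-falling training error out of the window.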