|Fr 22. September 2023
|Data streams are ubiquitous in modern applications such as predictive maintenance or quality control. Data streams can change in unpredictable ways, challenging existing supervised learning algorithms that assume a stationary relationship between input data and labels. Supervised learning algorithms for data streams must therefore "adapt" to changing data distributions. Active learning (AL), a sub-field of supervised learning, aims to reduce the total cost of labeling by identifying the most valuable data points for training. However, existing stream-based AL methods have difficulty adapting to changes in data streams as they rely mainly on the sparsely labeled data and ignore the regionality of changes, resulting in slow change adaptions.
To address these issues, this thesis presents an active learning framework for data streams that adapts to regional changes in the underlying data stream. Our idea is to enrich hierarchical data stream clustering with labeling statistics to measure the regionality and relevance of changes. Using such information in stream-based active learning leads to more effective labeling, resulting in faster change adaption.