Streaming MMD Change Detection

Aus SDQ-Institutsseminar
Version vom 9. März 2023, 13:11 Uhr von Florian Kalinke (Diskussion | Beiträge) (Die Seite wurde neu angelegt: „{{Vortrag |vortragender=Georg Gntuni |email=g.gntuni@gmail.com |vortragstyp=Bachelorarbeit |betreuer=Florian Kalinke |termin=Institutsseminar/2023-03-24 |vortr…“)
(Unterschied) ← Nächstältere Version | Aktuelle Version (Unterschied) | Nächstjüngere Version → (Unterschied)
Vortragende(r) Georg Gntuni
Vortragstyp Bachelorarbeit
Betreuer(in) Florian Kalinke
Termin Fr 24. März 2023
Vortragssprache
Vortragsmodus in Präsenz
Kurzfassung Kernel methods are among the most well-known approaches in data science. Their ability to represent probability distributions as elements in a reproducing kernel Hilbert space gives rise to maximum mean discrepancy (MMD). MMD quantifies the dissimilarity of two distributions and allows powerful two-sample tests on many domains. One important application of general two-sample tests is change detection in data streams: Here, one tests the null hypothesis that the distributions of data within the stream do not change versus the alternative hypothesis that the distributions do change; a change in distribution then indicates a change point. The broad applicability of kernel-based two-sample tests renders their use for change detection in data streams highly desirable. But, their quadratic runtime complexity prohibits their application. While approximations for kernel methods that reduce their runtime in the static setting exist, their application to data streams is challenging.

In this thesis, we propose a novel change detector, RADMAN, which leverages the random Fourier feature-based kernel approximation to efficiently detect changes in data streams with a polylogarithmic runtime complexity of O(log^2 n) per insert operation, with n the total number of observations. The proposed approach runs significantly faster than existing methods but obtains similar result quality. Our experiments on synthetic and real-world data sets show that it performs better than current state-of-the-art approaches.