Standardized Real-World Change Detection Data Defense
|Date||Fri, 9 September 2022|
|Abstract||The reliable detection of change points is a fundamental task when analyzing data across many fields, e.g., in finance, bioinformatics, and medicine.
To define “change points”, we assume that there is a distribution, which may change over time, generating the data we observe. A change point then is a change in this underlying distribution, i.e., the distribution coming before a change point is different from the distribution coming after. The principled way to compare distributions, and thus to find change points, is to employ statistical tests.
While change point detection is an unsupervised problem in practice, i.e., the data is unlabeled, the development and evaluation of data analysis algorithms require labeled data. Only a few labeled real-world data sets are publicly available, and many of them are either too small or have ambiguous labels. Further issues are that reusing data sets may lead to overfitting, and that preprocessing may manipulate results. To address these issues, Burg et al. published 37 data sets annotated by data scientists and ML researchers and assessed 14 change detection algorithms on them. Yet concerns remain because these data sets are labeled by hand: Can humans correctly identify changes according to the definition, and can they do so consistently?
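
The definition of a change point given in the abstract can be illustrated with a minimal sketch. It assumes a two-sample permutation test on the difference of means, chosen here only for simplicity; the abstract does not say which statistical test is used. The names `permutation_p_value` and `make_series`, the window size of 50, and the synthetic Gaussian data are all hypothetical.

```python
import random


def permutation_p_value(before, after, n_permutations=2000, seed=0):
    """Two-sample permutation test on the absolute difference of means.

    Estimates how likely a difference at least as large as the observed one
    would be if both windows came from the same distribution. This only
    detects changes in the mean (an assumption of this sketch).
    """
    rng = random.Random(seed)
    n_before = len(before)
    observed = abs(sum(before) / n_before - sum(after) / len(after))
    pooled = list(before) + list(after)
    hits = 0
    for _ in range(n_permutations):
        # Shuffling the pooled data simulates "no change" between windows.
        rng.shuffle(pooled)
        left, right = pooled[:n_before], pooled[n_before:]
        diff = abs(sum(left) / len(left) - sum(right) / len(right))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)


def make_series(seed=1):
    """Synthetic data: the mean shifts from 0 to 3 at index 100."""
    rng = random.Random(seed)
    stable = [rng.gauss(0.0, 1.0) for _ in range(100)]
    shifted = [rng.gauss(3.0, 1.0) for _ in range(100)]
    return stable + shifted


series = make_series()
# Compare the 50 points before and after the candidate change point.
p_at_change = permutation_p_value(series[50:100], series[100:150])
print("p-value at index 100:", p_at_change)
```

A small p-value at a candidate index suggests that the data before and after it were generated by different distributions. A practical detector would additionally have to scan over candidate indices and correct for multiple testing, which this sketch leaves out.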