Impact of Aggregation Methods on Clustering of High-Resolution Energy Data

Aus SDQ-Institutsseminar
Vortragende(r) Jakob Bach
Vortragstyp Masterarbeit
Betreuer(in) Holger Trittenbach
Termin Fr 24. November 2017
Kurzfassung Energy data can be used to gain insights into production processes. In the industrial domain, sensors have high sampling rates, resulting in large time series. Therefore, aggregation techniques are used to reduce computation times and memory requirements of data mining techniques like clustering. However, it is unclear what effects the aggregation has on clustering results and how these effects could be described.

In our work, we propose measures to analyse the impact of aggregation on clustering and evaluate them experimentally. In particular, we aggregate with standard summary statistics and assess the impact using clustering structure measures, internal validity indices, external validity indices and instance-based forecasting. We adapt these evaluation measures and other data mining techniques to our use case. Furthermore, we propose a decision framework which allows to choose an aggregation level and other experimental settings, considering the trade-off between clustering quality and computational cost.

Our extensive experiments comprise the cross-product of 6 physical attributes, 7 clustering algorithms, 7 aggregation techniques, 9 aggregation levels and 13 time series dissimilarities. We use real-world data from different machines and sensors of a production site at the KIT Campus North, extracting time series of fixed and variable length. Overall, we find that clustering results become less similar the more the data is aggregated. However, the exact effect and value of evaluation measures depends on the type of aggregate, clusteringalgorithm, dataset and dissimilarity measure.