Predicting System Dependencies from Tracing Data Instead of Computing Them

From SDQ-Institutsseminar
Speaker Aleksandr Eismont
Talk type Proposal
Supervisor Pawel Bielski
Date Fri, 26 February 2021
Abstract The concept of Artificial Intelligence for IT Operations combines big data and machine learning methods to replace a broad range of IT operations, including availability and performance monitoring of services. In large-scale distributed cloud infrastructures, a service is deployed across separate nodes. As the infrastructure grows in production, analyzing metric parameters becomes computationally expensive. We address this problem by proposing a method that predicts dependencies between the metric parameters of system components instead of computing them. To predict the dependencies, we use time windowing with different aggregation methods together with distributed tracing data, which contain detailed information about the system's execution workflow. In this bachelor thesis, we examine different representations of distributed traces, from simple event counts to more complex graph representations. We compare these representations with each other and evaluate the performance of these methods.
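
To make the two ends of that spectrum concrete, the following minimal Python sketch aggregates trace spans over fixed time windows in two ways: a simple count of events per service, and a count of caller-to-callee edges as a small graph representation. It is an illustration only, not the method of the thesis. The `Span` record, its fields, the function names, the sample data, and the 10-second window are all hypothetical assumptions.

```python
from collections import Counter, defaultdict
from typing import NamedTuple, Optional


class Span(NamedTuple):
    """Hypothetical, simplified trace span (not a real tracing schema)."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span of a trace
    service: str
    start: float              # start time in seconds


def window_index(timestamp: float, window_size: float) -> int:
    """Map a timestamp to the index of its fixed-size time window."""
    return int(timestamp // window_size)


def event_counts(spans, window_size):
    """Simple representation: number of spans per service in each window."""
    counts = defaultdict(Counter)
    for span in spans:
        counts[window_index(span.start, window_size)][span.service] += 1
    return {window: dict(c) for window, c in counts.items()}


def call_edge_counts(spans, window_size):
    """Graph representation: number of caller -> callee edges in each window."""
    by_id = {(s.trace_id, s.span_id): s for s in spans}
    edges = defaultdict(Counter)
    for span in spans:
        if span.parent_id is None:
            continue  # root spans have no caller
        parent = by_id.get((span.trace_id, span.parent_id))
        if parent is None:
            continue  # parent span missing from the data; skip the edge
        # The edge is assigned to the window in which the callee span starts.
        window = window_index(span.start, window_size)
        edges[window][(parent.service, span.service)] += 1
    return {window: dict(c) for window, c in edges.items()}


# Made-up sample data: two traces, 12 seconds apart.
SPANS = [
    Span("t1", "a", None, "frontend", 0.5),
    Span("t1", "b", "a", "cart", 1.0),
    Span("t1", "c", "b", "db", 1.5),
    Span("t2", "a", None, "frontend", 12.0),
    Span("t2", "b", "a", "db", 12.4),
]

if __name__ == "__main__":
    print(event_counts(SPANS, window_size=10.0))
    print(call_edge_counts(SPANS, window_size=10.0))
```

Either per-window output could serve as a feature vector for a model that predicts metric dependencies. Which aggregation and which window size work best is exactly the kind of question the comparison in the thesis is meant to answer.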