A Comparative Analysis of Data-Efficient Dependency Estimators

From SDQ-Institutsseminar
Speaker Maximilian Georg
Talk type Bachelor's thesis
Advisor Bela Böhnke
Date Fri, 12 August 2022
Talk language
Talk mode online
Abstract
The amount and complexity of data collected in industry are increasing, and data analysis is becoming more important. Dependency estimation is a significant part of knowledge discovery and enables strategic decisions based on this information. For example, a strategic decision can involve increasing the value of a variable we can influence in order to change another variable that depends on it but cannot be manipulated directly. Many dependency estimation algorithms require a large amount of data to produce a good estimate. But data can be expensive: experiments in materials science, for example, consume material and take time and energy. Because data collection is expensive, algorithms need to be data-efficient. However, there is a trade-off between the amount of data and the quality of the estimate: a lack of data makes the estimate uncertain, and the algorithms do not always quantify this uncertainty. As a result, we do not know whether we can rely on the estimate or whether we need more data for an accurate one. Ideally, this uncertainty measure should include an error bound or a distribution over possible values.

In this bachelor's thesis, we compare different state-of-the-art dependency estimation algorithms using a list of criteria that address the challenges mentioned above. We developed some of the criteria ourselves and took others from relevant publications. Many of the existing criteria were formulated only qualitatively; part of this thesis is to make these criteria quantitatively measurable where possible and to devise a systematic approach for comparing the rest. For the dependency estimation algorithms that performed well in the qualitative analysis, we also conduct a quantitative analysis through experiments on well-established and representative data sets. Our analysis showed that, of the compared dependency estimators, the MGC (Multiscale Graph Correlation) [1] dependency estimator is the most data-efficient: it achieved the highest statistical power (the probability of detecting a dependency that is actually present) relative to the amount of data. This approach is also consistent and has a high rate of convergence.
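The abstract measures data efficiency as the power of a dependency test relative to the number of samples. As a minimal sketch of how such a power curve can be estimated by simulation, the Python code below runs a permutation test of independence based on distance correlation on repeatedly generated data sets and counts how often the test rejects independence. This is an illustrative substitute, not the thesis's setup: MGC generalises distance correlation by also examining local neighbourhood scales, so this code does not reproduce MGC. The quadratic data model, the noise level, the default trial counts, and the function names (`dcor`, `permutation_pvalue`, `estimate_power`) are all hypothetical.

```python
import random


def _centered_distances(v):
    """Double-centred matrix of pairwise absolute differences |v_i - v_j|."""
    n = len(v)
    d = [[abs(v[i] - v[j]) for j in range(n)] for i in range(n)]
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    # Column means equal row means because d is symmetric.
    return [[d[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def _dcov_sq(a, b, perm):
    """Squared sample distance covariance, with b's rows/columns permuted."""
    n = len(a)
    total = 0.0
    for i in range(n):
        a_row, b_row = a[i], b[perm[i]]
        total += sum(a_row[j] * b_row[perm[j]] for j in range(n))
    return max(total / (n * n), 0.0)  # clip tiny negative rounding error


def dcor(x, y):
    """Sample distance correlation of two equally long 1-D samples."""
    a, b = _centered_distances(x), _centered_distances(y)
    ident = list(range(len(x)))
    vxx, vyy = _dcov_sq(a, a, ident), _dcov_sq(b, b, ident)
    if vxx == 0.0 or vyy == 0.0:
        return 0.0
    return (_dcov_sq(a, b, ident) / (vxx * vyy) ** 0.5) ** 0.5


def permutation_pvalue(x, y, reps, rng):
    """p-value of a permutation test of independence based on dCov^2."""
    a, b = _centered_distances(x), _centered_distances(y)
    idx = list(range(len(x)))
    observed = _dcov_sq(a, b, idx)  # statistic on the unpermuted data
    exceed = 0
    for _ in range(reps):
        rng.shuffle(idx)  # permuting y breaks any dependency on x
        if _dcov_sq(a, b, idx) >= observed:
            exceed += 1
    return (exceed + 1) / (reps + 1)


def estimate_power(n, strength=1.0, noise=0.1, trials=40, reps=39,
                   alpha=0.05, seed=0):
    """Fraction of simulated data sets of size n on which H0 is rejected.

    Hypothetical data model: y = strength * x**2 + noise * N(0, 1) with
    x ~ Uniform(-1, 1). strength=0 makes x and y independent, so the
    result then estimates the type I error rate instead of the power.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        y = [strength * xi * xi + noise * rng.gauss(0.0, 1.0) for xi in x]
        if permutation_pvalue(x, y, reps, rng) <= alpha:
            rejections += 1
    return rejections / trials


if __name__ == "__main__":
    # Power curve: how often the dependency is detected per sample size.
    for n in (5, 10, 20):
        print(n, estimate_power(n))
```

Printing `estimate_power(n)` for growing `n` traces a power curve; under this framing, a more data-efficient estimator reaches high power at a smaller `n`. Passing `strength=0.0` instead estimates the false-positive rate, which for a valid test should stay close to `alpha`.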