Review of data efficient dependency estimation: Difference between revisions

From SDQ-Institutsseminar
(Page created: "{{Vortrag |vortragender=Maximilian Georg |email=maximilian.georg@student.kit.edu |vortragstyp=Proposal |betreuer=Bela Böhnke |termin=Institutsseminar/2022-02-…")
 
No edit summary
 
Line 6:
|termin=Institutsseminar/2022-02-25 Zusatztermin
|vortragsmodus=online
|kurzfassung=The amount and complexity of data collected in industry is increasing, and data analysis is becoming more important. Dependency estimation is a significant part of knowledge discovery and enables strategic decisions based on this information.
Several examples highlight the importance of dependency estimation. For instance, knowing that there is a correlation between the regular dose of a drug and the health of a patient helps to understand the impact of a newly manufactured drug.
Knowing how the case material, brand, and condition of a watch influence its price on an online marketplace can help to buy watches at a good price.
Materials science can also use dependency estimation to predict many properties of a material before it is synthesized in the lab, so that fewer experiments are necessary.


Many dependency estimation algorithms perform poorly in real-world settings because they do not consider multivariate dependencies. Multivariate dependencies are very common; they occur, for example, in materials science, where the properties of the synthesized material depend on many variables.
Also, dependency estimation algorithms are often not robust against errors in the data. Yet data is error-prone: the health of a patient in a clinical study, for instance, is hard to measure accurately.
Many dependency estimation algorithms require a large amount of data for a good estimate. However, data can be expensive: experiments in materials science, for example, consume material and take time and energy.
Because data collection is expensive, algorithms need to be data efficient. There is, however, a trade-off between the amount of data and the quality of the estimate: a lack of data makes the estimate uncertain. Yet the algorithms do not always quantify this uncertainty, so we do not know whether we can rely on the estimate or need more data for an accurate one.
Furthermore, many algorithms are too complex for non-experts to use. The parameters of an algorithm need to be intuitive, and the result should be interpretable. Only then can people outside academia apply the algorithm without mistakes.


In this bachelor's thesis we compare different state-of-the-art dependency estimation algorithms using a list of criteria addressing the above-mentioned challenges. We developed some of the criteria ourselves and took others from relevant publications. Many of the existing criteria were only formulated qualitatively; part of this thesis is to make these criteria quantitatively measurable where possible and to devise a systematic approach for comparing the rest.
In this bachelor's thesis we compare different state-of-the-art dependency estimation algorithms using a list of criteria addressing these challenges and more. We developed some of the criteria ourselves and took others from relevant publications. The existing publications formulated many of the criteria only qualitatively; part of this thesis is to make these criteria quantitatively measurable where possible and to devise a systematic approach for comparing the rest.


Of the 14 selected criteria, the focus will be on data efficiency and uncertainty estimation. These criteria are essential for lowering the cost of dependency estimation. The expected result of this bachelor's thesis is to identify an algorithm that fulfils all 14 criteria.
Of the 14 selected criteria, we focus on those concerning data efficiency and uncertainty estimation, because they are essential for lowering the cost of dependency estimation, but we will also check other criteria relevant to applying the algorithms.
The comparison includes a qualitative analysis of general criteria that increase usability for non-experts, such as interpretability and intuitiveness.
As a result, we will rank the algorithms with respect to the different aspects given by the criteria and thereby identify potential for improving the current algorithms.
We also analyse whether each algorithm is an anytime algorithm and whether it uses incremental computation to enable early stopping and increase data efficiency.
Another criterion is guided sampling, which can lead to greater data efficiency.
To apply the algorithms to different kinds of datasets, we also analyse whether the algorithms are multivariate, general-purpose, and non-parametric.


We also conduct a quantitative analysis of the dependency estimation algorithms that performed well in the qualitative analysis, through experiments on well-established and representative datasets.
We do this in two steps. First, we check general criteria in a qualitative analysis: whether the algorithm is capable of guided sampling, whether it is an anytime algorithm, and whether it uses incremental computation to enable early stopping, all of which lead to greater data efficiency.
 
Second, we conduct a quantitative analysis of the dependency estimation algorithms that performed well in the qualitative analysis, using well-established and representative datasets.
In these experiments we evaluate further criteria:
robustness, which is necessary for error-prone data; efficiency, which saves computation time; convergence, which guarantees an accurate estimate given enough data; and consistency, which ensures we can rely on an estimate.
}}
}}

Current revision as of 16 February 2022, 12:45

Speaker: Maximilian Georg
Talk type: Proposal
Advisor: Bela Böhnke
Date: Fri, 25 February 2022
Talk language:
Talk mode: online
Abstract: The amount and complexity of data collected in industry is increasing, and data analysis is becoming more important. Dependency estimation is a significant part of knowledge discovery and enables strategic decisions based on this information.

Several examples highlight the importance of dependency estimation. For instance, knowing that there is a correlation between the regular dose of a drug and the health of a patient helps to understand the impact of a newly manufactured drug. Knowing how the case material, brand, and condition of a watch influence its price on an online marketplace can help to buy watches at a good price. Materials science can also use dependency estimation to predict many properties of a material before it is synthesized in the lab, so that fewer experiments are necessary.

Many dependency estimation algorithms require a large amount of data for a good estimate. However, data can be expensive: experiments in materials science, for example, consume material and take time and energy. Because data collection is expensive, algorithms need to be data efficient. There is, however, a trade-off between the amount of data and the quality of the estimate: a lack of data makes the estimate uncertain. Yet the algorithms do not always quantify this uncertainty, so we do not know whether we can rely on the estimate or need more data for an accurate one.
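A small, hypothetical Python sketch can make the trade-off between the amount of data and the uncertainty of an estimate concrete. It assumes Pearson's correlation coefficient as a stand-in dependency measure. This measure captures only linear, bivariate dependence, so it is not one of the multivariate algorithms in question. The sketch expresses uncertainty as an approximate 95% confidence interval from Fisher's z-transform. The synthetic data and the function names `pearson`, `fisher_interval`, and `demo` are illustrative assumptions.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fisher_interval(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a correlation (needs n > 3, |r| < 1)."""
    z = math.atanh(r)               # Fisher's z-transform of the estimate
    se = 1.0 / math.sqrt(n - 3)     # approximate standard error in z-space
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def demo(seed=0, sizes=(10, 100, 1000)):
    """Estimate a known (hypothetical) linear dependency from samples of increasing size."""
    rng = random.Random(seed)
    results = []
    for n in sizes:
        xs = [rng.gauss(0, 1) for _ in range(n)]
        # Hypothetical ground truth: y depends linearly on x plus Gaussian noise.
        ys = [0.5 * x + rng.gauss(0, 1) for x in xs]
        r = pearson(xs, ys)
        lo, hi = fisher_interval(r, n)
        results.append((n, r, hi - lo))
    return results


for n, r, width in demo():
    print(f"n={n:5d}  r={r:+.3f}  95% CI width={width:.3f}")
```

For a fixed estimate, the interval narrows roughly in proportion to 1/sqrt(n - 3). Halving its width therefore costs about four times as many samples, which is exactly the cost that data-efficient algorithms try to avoid.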

In this bachelor's thesis we compare different state-of-the-art dependency estimation algorithms using a list of criteria addressing these challenges and more. We developed some of the criteria ourselves and took others from relevant publications. The existing publications formulated many of the criteria only qualitatively; part of this thesis is to make these criteria quantitatively measurable where possible and to devise a systematic approach for comparing the rest.

Of the 14 selected criteria, we focus on those concerning data efficiency and uncertainty estimation, because they are essential for lowering the cost of dependency estimation, but we will also check other criteria relevant to applying the algorithms. As a result, we will rank the algorithms with respect to the different aspects given by the criteria and thereby identify potential for improving the current algorithms.

We do this in two steps. First, we check general criteria in a qualitative analysis: whether the algorithm is capable of guided sampling, whether it is an anytime algorithm, and whether it uses incremental computation to enable early stopping, all of which lead to greater data efficiency.
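A hypothetical streaming estimator can illustrate the anytime, incremental, and early-stopping properties checked in the qualitative analysis. This sketch again assumes Pearson's correlation as a stand-in dependency measure. It updates running moments in constant time per sample using Welford-style updates, so an estimate is available after every sample. It stops drawing samples once an approximate Fisher-z confidence interval is narrower than a chosen tolerance. The class and function names, the tolerance, and the synthetic stream are assumptions for illustration. Guided sampling, meaning the choice of which sample to draw next, is not modelled.

```python
import math
import random


class IncrementalCorrelation:
    """Streaming Pearson correlation with Welford-style updates (O(1) work per sample)."""

    def __init__(self):
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.m2_x = 0.0  # running sum of squared deviations of x
        self.m2_y = 0.0  # running sum of squared deviations of y
        self.c_xy = 0.0  # running sum of co-deviations of x and y

    def update(self, x, y):
        """Fold one new sample into the running moments."""
        self.n += 1
        dx = x - self.mean_x          # deviation from the old mean of x
        dy = y - self.mean_y          # deviation from the old mean of y
        self.mean_x += dx / self.n
        self.mean_y += dy / self.n
        # Pair each old deviation with a deviation from the updated mean.
        self.m2_x += dx * (x - self.mean_x)
        self.m2_y += dy * (y - self.mean_y)
        self.c_xy += dx * (y - self.mean_y)

    def estimate(self):
        """Current correlation estimate; can be queried after every update (anytime)."""
        return self.c_xy / math.sqrt(self.m2_x * self.m2_y)

    def interval_width(self, z_crit=1.96):
        """Width of an approximate 95% Fisher-z confidence interval (needs n > 3)."""
        z = math.atanh(self.estimate())
        half = z_crit / math.sqrt(self.n - 3)
        return math.tanh(z + half) - math.tanh(z - half)


def estimate_until_confident(stream, tolerance=0.1, min_samples=10, max_samples=100_000):
    """Consume samples one at a time and stop early once the interval is narrow enough."""
    est = IncrementalCorrelation()
    for x, y in stream:
        est.update(x, y)
        if est.n >= min_samples and est.interval_width() < tolerance:
            break  # early stopping: no further (expensive) samples are drawn
        if est.n >= max_samples:
            break  # sampling budget exhausted
    return est.n, est.estimate()


def synthetic_stream(seed=0):
    """Hypothetical endless data source with a linear dependency plus noise."""
    rng = random.Random(seed)
    while True:
        x = rng.gauss(0, 1)
        yield x, 0.5 * x + rng.gauss(0, 1)


n_used, r = estimate_until_confident(synthetic_stream(), tolerance=0.1)
print(f"stopped after {n_used} samples with r={r:+.3f}")
```

Stopping on interval width is only a simple heuristic. Checking a confidence interval repeatedly and stopping as soon as it looks narrow can slightly weaken its nominal coverage, so a rigorous stopping rule would have to account for that.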

Second, we conduct a quantitative analysis of the dependency estimation algorithms that performed well in the qualitative analysis, using well-established and representative datasets. In these experiments we evaluate further criteria: robustness, which is necessary for error-prone data; efficiency, which saves computation time; convergence, which guarantees an accurate estimate given enough data; and consistency, which ensures we can rely on an estimate.
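Robustness, one of the criteria evaluated in the experiments, could for instance be quantified as how much an estimate changes when a few data points are corrupted. The following hypothetical sketch assumes exactly that measure. It contrasts Pearson's correlation with Spearman's rank correlation on a perfectly monotone toy dataset that contains a single gross error. Both estimators and the helper `robustness_gap` are illustrative stand-ins and are not necessarily among the algorithms compared in the thesis.

```python
import math


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def robustness_gap(estimator, xs, ys, corrupted_ys):
    """Absolute change of an estimate when clean data is replaced by corrupted data."""
    return abs(estimator(xs, ys) - estimator(xs, corrupted_ys))


xs = list(range(1, 21))
ys = [float(x) for x in xs]        # perfectly monotone dependency
corrupted = ys[:-1] + [-1000.0]    # one gross measurement error

for name, est in (("Pearson", pearson), ("Spearman", spearman)):
    print(f"{name:8s} clean={est(xs, ys):+.3f}  corrupted={est(xs, corrupted):+.3f}")
```

On this toy data, the single corrupted value flips the sign of Pearson's estimate to about -0.36. Spearman's estimate, by contrast, only drops to 5/7 ≈ 0.71, because ranks cap the influence of any single extreme value.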