Semantic Search

Friday, 12 August 2022, 11:30

Location: Room 348 (Building 50.34)

Speaker Maximilian Georg
Title A Comparative Analysis of Data-Efficient Dependency Estimators
Talk type Bachelor's thesis
Supervisor Bela Böhnke
Presentation mode online
Abstract Dependency estimation is a significant part of knowledge discovery and allows strategic decisions based on this information. Many dependency estimation algorithms require a large amount of data for a good estimation. But data can be expensive; for example, experiments in materials science consume material and take time and energy. Because data collection is expensive, algorithms need to be data-efficient. But there is a trade-off between the amount of data and the quality of the estimation. A lack of data brings uncertainty in the estimation. However, the algorithms do not always quantify this uncertainty. As a result, we do not know whether we can rely on the estimation or whether we need more data for an accurate estimation. In this bachelor’s thesis, we compare different state-of-the-art dependency estimation algorithms using a list of criteria addressing the above-mentioned challenges. We partly developed the criteria ourselves and partly took them from relevant publications. Many of the existing criteria were only formulated qualitatively; part of this thesis is to make these criteria quantitatively measurable where possible and to come up with a systematic approach to comparison for the rest. We also conduct a quantitative analysis, by experiment on well-established and representative data sets, of the dependency estimation algorithms that performed well in the qualitative analysis.

Friday, 19 August 2022, 11:30

Location: Room 348 (Building 50.34)

Speaker Sönke Jendral
Title Refining Domain Knowledge for Domain Knowledge Guided Machine Learning
Talk type Bachelor's thesis
Supervisor Pawel Bielski
Presentation mode online
Abstract Advances in computational power have led to increased interest in machine learning techniques. Sophisticated approaches now solve various prediction problems in the domain of healthcare. Traditionally, machine learning techniques integrate domain knowledge implicitly, by statistically extracting dependencies from their input data. Novel approaches instead integrate domain knowledge from taxonomies as an external component.

However, these approaches assume the existence of high-quality domain knowledge and do not acknowledge issues stemming from low-quality domain knowledge. It is thus unclear what low-quality domain knowledge looks like in the context of Domain Knowledge Guided Machine Learning and what its causes are. Further, it is not clearly understood what impact low-quality domain knowledge has on the machine learning task and what steps can be taken to improve the quality in this context.

In this thesis, we describe low-quality domain knowledge and show examples of such knowledge in the context of a sequential prediction task. We further propose methods for identifying low-quality domain knowledge in the context of Domain Knowledge Guided Machine Learning and suggest approaches for improving the quality of domain knowledge in this context.

Speaker Elizaveta Danilova
Title Wichtigkeit von Merkmalen für die Klassifikation von SAT-Instanzen (Abschlusspräsentation) [Feature Importance for the Classification of SAT Instances (final presentation)]
Talk type Bachelor's thesis
Supervisor Jakob Bach
Presentation mode in person
Abstract The SAT problem is a central problem in theoretical computer science. Because it is NP-hard, researchers are particularly interested in efficient methods for solving it. Knowing the family an instance belongs to can help solve the problem. In our work, we investigated how SAT instances can be classified efficiently using machine learning and which methods are best suited to this. We also examined which features characterize the instances most clearly and how the number of features used affects the classification result. Finally, we investigated which families are misclassified more often and what the reasons are.

Friday, 26 August 2022, 11:30

Location: Room 348 (Building 50.34)

Speaker Manuel Müllerschön
Title Deriving Twitter Based Time Series Data for Correlation Analysis
Talk type Bachelor's thesis
Supervisor Fabian Richter
Presentation mode in person
Abstract Twitter has been identified as a relevant data source for modelling purposes in the last decade. In this work, our goal was to model the conversational dynamics of inflation development in Germany through Twitter data mining. To accomplish this, we summarized and compared Twitter data mining techniques for time series data from pertinent research. Then, we constructed five models for generating time series from topic-related tweets and user profiles of the last 15 years. Evaluating the models, we observed that several approaches, such as modelling user impact or adjusting for automated Twitter accounts, show promise. Yet, in the scenario of modelling inflation expectation dynamics, these more complex models did not yield a higher correlation between the German CPI and the resulting time series than a baseline approach.

Friday, 2 September 2022, 11:30

Location: Room 348 (Building 50.34)

Speaker Benjamin Jochum
Title Surrogate models for crystal plasticity - predicting stress, strain and dislocation density over time
Talk type Proposal
Supervisor Daniel Betsche
Presentation mode in person
Abstract When engineers design structures, prior knowledge of how they will react to external forces is crucial. Applied forces introduce stress, leading to dislocations of individual molecules that ultimately may cause material failure, like cracks, if the internal strain of the material exceeds a certain threshold. We can observe this by applying increasing physical forces to a structure and measuring the stress, strain and dislocation density curves.

The finite element method (FEM) enables the simulation of a material deforming under external forces, but it comes with very high computational costs. This makes it infeasible to conduct a large number of simulations with varying parameters. In this thesis, we use neural-network-based sequence models to build a data-driven surrogate model that predicts the stress, strain and dislocation density curves produced by an FEM simulation based on the simulation’s input parameters.

Friday, 9 September 2022, 11:30

Location: Room 348 (Building 50.34)

Speaker Moritz Teichner
Title Standardized Real-World Change Detection Data Defense
Talk type Bachelor's thesis
Supervisor Florian Kalinke
Presentation mode in person
Abstract The reliable detection of change points is a fundamental task when analyzing data across many fields, e.g., in finance, bioinformatics, and medicine.

To define “change points”, we assume that there is a distribution, which may change over time, generating the data we observe. A change point then is a change in this underlying distribution, i.e., the distribution coming before a change point is different from the distribution coming after. The principled way to compare distributions, and thus to find change points, is to employ statistical tests.

While change point detection is an unsupervised problem in practice, i.e., the data is unlabeled, the development and evaluation of data analysis algorithms requires labeled data. Only a few labeled real-world data sets are publicly available, and many of them are either too small or have ambiguous labels. Further issues are that reusing data sets may lead to overfitting, and preprocessing may manipulate results. To address these issues, Burg et al. publish 37 data sets annotated by data scientists and ML researchers and assess 14 change detection algorithms on them. Yet, there remain concerns due to the fact that these are labeled by hand: Can humans correctly identify changes according to the definition, and can they be consistent in doing so?
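
The definition of a change point given in this abstract (the distribution before the point differs from the distribution after it, as judged by a statistical test) can be illustrated with a minimal sketch. This is not the method of the thesis or of Burg et al.; it assumes a simple two-sample permutation test on the difference in means, so it only detects shifts in the mean rather than arbitrary distribution changes. The function names, the window size, and the synthetic data are hypothetical choices made for illustration.

```python
import random
from statistics import mean


def permutation_p_value(before, after, n_permutations=500, seed=0):
    """Two-sample permutation test on the absolute difference in means.

    Assumption for illustration: only mean shifts are tested, not
    general differences between distributions.
    """
    rng = random.Random(seed)
    observed = abs(mean(before) - mean(after))
    pooled = list(before) + list(after)
    n_before = len(before)
    hits = 0
    for _ in range(n_permutations):
        # Under the null hypothesis (no change) the labels are exchangeable.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_before]) - mean(pooled[n_before:]))
        if diff >= observed:
            hits += 1
    # The add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_permutations + 1)


def p_value_at(series, t, window=30):
    """Compare the `window` samples before index t with the `window` after."""
    return permutation_p_value(series[t - window:t], series[t:t + window])


# Hypothetical synthetic data: the mean shifts from 0 to 3 at index 60.
data_rng = random.Random(42)
series = ([data_rng.gauss(0.0, 1.0) for _ in range(60)]
          + [data_rng.gauss(3.0, 1.0) for _ in range(60)])

p_change = p_value_at(series, 60)  # candidate at the true change point
p_stable = p_value_at(series, 30)  # candidate inside the first segment
print(f"p-value at t=60: {p_change:.4f}")
print(f"p-value at t=30: {p_stable:.4f}")
```

A small p-value at a candidate index suggests that the samples before and after it come from different distributions. Scanning many candidate indices this way would require a correction for multiple testing, which the sketch leaves out.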