Semantic Search

Friday, 7 July 2023, 11:30

Location: Room 348 (Building 50.34)

Speaker: Jamil Bagga
Title: Developing a Database Application to Compare the Google Books Ngram Corpus to German News Corpora
Talk type: Proposal
Supervisor: Fabian Richter
Talk mode: in person
Abstract: This thesis focuses on the development of a database application that enables a comparative analysis between the Google Books Ngram Corpus (GBNC) and German news corpora. The GBNC provides a vast collection of books spanning various time periods, while the German news corpora encompass up-to-date linguistic data from news sources. This comparison aims to uncover insights into language usage patterns, linguistic evolution, and cultural shifts within the German language.

Extracting meaningful insights from the compared corpora requires various linguistic metrics, statistical analyses, and visualization techniques. By identifying patterns, trends, and linguistic changes, we can uncover valuable information on how language usage evolves over time. This thesis provides a comprehensive framework for comparing the GBNC to other corpora, showcasing the development of a database application that not only enables valuable linguistic analyses but also sheds light on the composition of the GBNC by highlighting linguistic similarities and differences.

Friday, 14 July 2023, 11:30

Location: Room 348 (Building 50.34)

Speaker: Simeon Becker
Title: Konsistenzhaltung von Eingabemodellen für Architekturanalysen und statischen Quelltextanalysen für Sicherheit (Consistency Preservation of Input Models for Architecture Analyses and Static Source Code Analyses for Security)
Talk type: Bachelor's thesis
Supervisor: Frederik Reiche
Talk mode: in person
Abstract: Architecture analyses can specify security properties in architecture models. Static security analyses can check these specifications against the source code. For this to work, all of these models must be in the same state. Keeping the models consistent manually, however, is laborious.

This thesis therefore implements a concept for automatically keeping four different models consistent, all of which serve as input models for a static security analysis. These four models are an architecture model, its source code, and, for each of the two, matching annotations for a static security analysis. First, a concept for consistency preservation between these four model types is developed. For this concept, consistency preservation was implemented in the Vitruvius framework in a case study with four concrete metamodels. The implementation builds on an existing consistency preservation between the source code and the architecture model. It was evaluated using a test model. The evaluation showed that it is feasible to implement consistency preservation for the input models based on the concept presented in this thesis. However, implementing the rules is laborious when the mappings between elements are complex.

Friday, 14 July 2023, 13:00

Location: Room 348 (Building 50.34)

Speaker: David Schulmeister
Title: Hidden Outliers in Manifolds
Talk type: Proposal
Supervisor: Jose Cribeiro
Talk mode: in person
Abstract: Hidden outliers represent instances of disagreement between a full-space detector and a subspace ensemble. This adversarial nature naturally replicates the subspace behavior that high-dimensional outliers exhibit in reality. Because of this, they have proven useful for representing complex occurrences such as fraud, critical infrastructure failure, and healthcare data, as well as for general outlier detection, where they serve as the positive class of a self-supervised learner. However, the quality of hidden outliers depends heavily on how many of the possible subspaces are selected for the ensemble. Since the number of subspaces grows exponentially with the number of features, high-dimensional applications of data analysis, such as computer vision, become computationally infeasible. In this thesis, we are going to study the generation of hidden outliers on the embedded data manifold using deep learning techniques to overcome this issue. More precisely, we are going to study the behavior, characteristics, and performance of hidden outliers in the data manifold across multiple use cases.
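The notion of disagreement in the abstract above can be illustrated with a minimal sketch. It is not the method of the thesis: the detector `is_outlier` (a root-mean-square z-score with a threshold of 3), the helper names, and the toy data are all assumptions chosen for illustration. The sketch also enumerates every non-empty feature subset, of which there are 2^d - 1 for d features, which is why exhaustive ensembles do not scale.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev


def subspaces(n_features):
    """Yield every non-empty feature subset; there are 2**n_features - 1."""
    for k in range(1, n_features + 1):
        yield from combinations(range(n_features), k)


def is_outlier(point, data, features, threshold=3.0):
    """Toy detector (an assumption): flag a point whose root-mean-square
    z-score over the chosen features exceeds the threshold."""
    squared = 0.0
    for f in features:
        column = [row[f] for row in data]
        z = (point[f] - mean(column)) / pstdev(column)
        squared += z * z
    return sqrt(squared / len(features)) > threshold


def is_hidden_outlier(point, data, ensemble):
    """A point is hidden when the full-space verdict and the verdict of
    the subspace ensemble (any member flags it) disagree."""
    all_features = tuple(range(len(point)))
    full_space = is_outlier(point, data, all_features)
    in_ensemble = any(is_outlier(point, data, s) for s in ensemble)
    return full_space != in_ensemble


# Toy data with mean 0 and standard deviation 1 in every feature.
data = [(-1, -1, -1), (1, 1, 1), (-1, 1, -1), (1, -1, 1)]
# Ensemble of all proper subspaces (everything except the full space).
ensemble = [s for s in subspaces(3) if len(s) < 3]

print(is_hidden_outlier((4, 0, 0), data, ensemble))  # extreme in one feature only
print(is_hidden_outlier((4, 4, 4), data, ensemble))  # extreme everywhere
```

In this toy setting, a point that is extreme in a single feature is flagged in that one-dimensional subspace but averaged away in the full space, so the two views disagree; a point that is extreme in every feature is flagged by both and is therefore not hidden.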
Speaker: Denis Wambold
Title: Subspace Generative Adversarial Learning for Unsupervised Outlier Detection
Talk type: Proposal
Supervisor: Jose Cribeiro
Talk mode: in person
Abstract: Outlier detection is an important yet challenging task, especially for unlabeled, high-dimensional datasets. Due to their self-supervised generative nature, Generative Adversarial Networks (GANs) have proven to be one of the most powerful deep learning methods for outlier detection. However, most state-of-the-art GANs for outlier detection share common limitations. Often, they only achieve strong results if the model's hyperparameters are properly tuned or the underlying network structure is adjusted. This optimization is not possible in practice when the data is unlabeled. Without proper tuning, it is not unusual for a state-of-the-art GAN method to be outperformed by simpler shallow methods.

We propose using a GAN architecture with feature ensemble learning to address hyperparameter sensitivity and architectural dependency. This follows the success of feature ensembling in mitigating these problems in other areas of deep learning. This thesis will study the optimization problem, training, and tuning of feature ensemble GANs in an unsupervised scenario, comparing them to other deep generative methods in a similar setting.

Friday, 21 July 2023, 11:30

Location: Room 348 (Building 50.34)

Speaker: Vincenzo Pace
Title: Attention Based Selection of Log Templates for Automatic Log Analysis
Talk type: Bachelor's thesis
Supervisor: Pawel Bielski
Talk mode: in person
Abstract: Log parsing serves as a crucial preprocessing step in text log data analysis, including anomaly detection in cloud system monitoring. However, selecting an optimal log parsing algorithm tailored to a specific task remains problematic.

With many algorithms to choose from, each requiring proper parameterization, making an informed decision becomes difficult. Moreover, the selected algorithm is typically applied uniformly across the entire dataset, regardless of the specific data analysis task, often leading to suboptimal results.

In this thesis, we evaluate a novel attention-based method for automating the selection of log parsing algorithms, aiming to improve data analysis outcomes. We build on a recent master's thesis, which introduced this attention-based method and demonstrated promising results for a specific log parsing algorithm and dataset. The primary objective of our work is to evaluate the effectiveness of this approach across different algorithms and datasets.

Friday, 18 August 2023, 11:00

Location: Room 348 (Building 50.34)
Web conference: https://kit-lecture.zoom.us/j/67744231815

Speaker: Aaron Gätje
Title: Graph Attention Network for Injection Molding Process Simulation
Talk type: Master's thesis
Supervisor: Daniel Ebi
Talk mode: in person
Abstract: Graph Neural Networks (GNNs) have demonstrated great potential for simulating physical systems that can be represented as graphs. However, training GNNs presents unique challenges due to the complex nature of graph data. The focus of this thesis is to examine their learning abilities by developing a GNN-based surrogate model for the injection molding process from materials science. While numerical simulations can accurately model the mold filling with molten plastic, they are computationally expensive and require significant trial and error for parameter optimization.

We propose a GNN-based model that can predict the fill times and physical properties of the mold filling process. We model the mold geometry as a static graph and encode the process information into node, edge, and global features. We employ a self-attention mechanism to enhance the learning of the direction and magnitude of the fluid flow. To further enforce the physical constraints and behaviors of the process, we leverage domain knowledge to construct features and loss functions. We train our model on simulation data, using a multi-step loss to capture the temporal dependencies and enable it to iteratively predict the filling for unseen molds. We compare our models against different distance-based heuristics and conventional machine learning models as baselines in terms of predictive performance, computational efficiency, and generalization ability. We evaluate our architectural and training choices, and discuss both the potential applications and challenges of using GNNs for surrogate modeling of injection molding.
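The abstract does not spell out the form of the multi-step loss. As a rough sketch of the general idea, one common formulation rolls the model forward several steps, feeds each prediction back in as the next input, and averages the per-step error. The function name `multi_step_loss`, the use of mean squared error, and the toy "fill fraction per node" state below are assumptions for illustration, not details from the thesis.

```python
def multi_step_loss(step_fn, initial_state, targets):
    """Roll step_fn forward len(targets) steps from initial_state,
    feeding each prediction back in, and average the per-step MSE."""
    state = initial_state
    total = 0.0
    for target in targets:
        state = step_fn(state)  # the prediction becomes the next input
        total += sum((p - t) ** 2 for p, t in zip(state, target)) / len(target)
    return total / len(targets)


# Hypothetical toy setup: the state is a fill fraction per graph node.
def slow_model(state):
    """Stand-in for a learned model that fills each node too slowly."""
    return [min(1.0, x + 0.25) for x in state]


def exact_model(state):
    """Stand-in for a model that matches the simulated filling exactly."""
    return [min(1.0, x + 0.5) for x in state]


start = [0.0, 0.0]
simulated = [[0.5, 0.5], [1.0, 1.0]]  # two simulated time steps

print(multi_step_loss(slow_model, start, simulated))
print(multi_step_loss(exact_model, start, simulated))
```

Because the second step starts from the model's own first prediction rather than from the simulated state, errors compound across the rollout, which is what penalizes drift during iterative prediction.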

Speaker: Christoph Batke
Title: Improving SAP Document Information Extraction via Pretraining and Fine-Tuning
Talk type: Master's thesis
Supervisor: Edouard Fouché
Talk mode: in person
Abstract: Techniques for extracting relevant information from documents have made significant progress in recent years and have become a key part of the digital transformation. With deep neural networks, it became possible to process documents without specifying hard-coded extraction rules or templates for each layout. However, such models typically have a very large number of parameters. As a result, they require many annotated samples and long training times. One solution is to create a basic pretrained model using self-supervised objectives and then fine-tune it on a smaller, document-specific annotated dataset. However, implementing and controlling the pretraining and fine-tuning procedures in a multi-modal setting is challenging. In this thesis, we propose a systematic method that consists of pretraining the model on large unlabeled data and then fine-tuning it with a virtual adversarial training procedure. For the pretraining stage, we implement an unsupervised informative masking method, which improves upon standard Masked Language Modelling (MLM). In contrast to masking tokens at random, as in MLM, our method exploits Pointwise Mutual Information (PMI) to calculate individual masking rates based on statistical properties of the data corpus, e.g., how often certain tokens appear together on a document page. We test our algorithm in a typical business context at SAP and report an overall improvement of 1.4% on the F1-score for extracted document entities. Additionally, we show that the implemented methods improve the training speed, robustness, and data efficiency of the algorithm.
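To make the PMI idea concrete, here is a minimal sketch that estimates PMI(a, b) = log2(p(a, b) / (p(a) p(b))) from page-level co-occurrence, where each probability is the fraction of pages containing the token or token pair. The abstract does not say how PMI values are turned into masking rates, so `masking_rates`, its `base_rate` and `boost` parameters, and the example pages are hypothetical and only show one conceivable mapping.

```python
from collections import Counter
from itertools import combinations
from math import log2


def page_pmi(pages):
    """Return PMI for every token pair that co-occurs on at least one page.
    Probabilities are fractions of pages containing the token(s)."""
    n_pages = len(pages)
    token_count = Counter()
    pair_count = Counter()
    for page in pages:
        tokens = sorted(set(page))  # count each token once per page
        token_count.update(tokens)
        pair_count.update(combinations(tokens, 2))
    pmi = {}
    for (a, b), n_ab in pair_count.items():
        p_ab = n_ab / n_pages
        p_a = token_count[a] / n_pages
        p_b = token_count[b] / n_pages
        pmi[(a, b)] = log2(p_ab / (p_a * p_b))
    return pmi


def masking_rates(pmi, base_rate=0.15, boost=0.15):
    """Hypothetical mapping (an assumption, not the thesis method):
    tokens that appear in high-PMI pairs get a higher masking rate."""
    best = {}
    for (a, b), value in pmi.items():
        for token in (a, b):
            best[token] = max(best.get(token, 0.0), value)
    top = max(best.values(), default=0.0)
    if top <= 0.0:
        return {token: base_rate for token in best}
    return {token: base_rate + boost * value / top for token, value in best.items()}


# Invented example pages, each given as a list of tokens.
pages = [
    ["invoice", "number", "total"],
    ["invoice", "number"],
    ["total", "date"],
    ["date"],
]
pmi = page_pmi(pages)
rates = masking_rates(pmi)
print(pmi[("invoice", "number")])
print(rates)
```

In this toy corpus, "invoice" and "number" co-occur more often than independence would predict, so they receive a positive PMI and a raised masking rate, while pairs that co-occur at chance level have a PMI of zero and keep the base rate.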