

Thursday, 16 February 2023, 10:00

Web conference: https://kit-lecture.zoom.us/j/67744231815

Speaker	Christoph Batke
Title	Improving Document Information Extraction with efficient Pre-Training
Talk type	Proposal
Advisor	Edouard Fouché
Talk mode	online
Abstract	SAP Document Information Extraction (DOX) is a service that extracts logical entities from scanned documents using the well-known Transformer architecture. The entities comprise header information, such as the document date or sender name, and line items from tables in the document, with fields such as the line item quantity. The model currently needs to be trained on a huge number of labeled documents, which is impractical. This also hinders deploying the model at large scale, as it cannot easily adapt to new languages or document types. Recently, pretraining large language models with self-supervised learning techniques has shown good results as a preliminary step and makes it possible to reduce the number of labels required in follow-up steps. However, to generalize self-supervised learning to document understanding, we need to take different modalities into account: the text, layout, and image information of documents. How to do this efficiently and effectively is not yet clear. The goal of this thesis is to develop a technique for self-supervised pretraining within SAP DOX. We will evaluate our method and design decisions on SAP data as well as on public datasets. Besides the accuracy of the extracted entities, we will measure to what extent our method lowers label requirements.
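The abstract leaves the pretraining technique open. The sketch below illustrates one form of self-supervised pretraining that uses both text and layout, assuming a LayoutLM-style masked visual-language objective. Token text is hidden at random while each token's bounding box stays visible, so a model has to recover the text from the surrounding words and their positions. The names `mask_tokens`, `BBox`, and the `[MASK]` token are hypothetical. The image modality is ignored, and the usual BERT-style 80/10/10 replacement scheme is simplified to always inserting `[MASK]`.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical sketch, not SAP DOX code: tokens come from OCR, and each token
# carries a bounding box (x0, y0, x1, y1) describing where it sits on the page.
BBox = Tuple[int, int, int, int]
MASK = "[MASK]"


def mask_tokens(
    tokens: List[str],
    boxes: List[BBox],
    mask_prob: float = 0.15,
    seed: Optional[int] = None,
) -> Tuple[List[str], List[BBox], List[Optional[str]]]:
    """Masked language modeling that hides token text but keeps token layout.

    Returns (masked_tokens, boxes, labels). labels[i] holds the original token
    where position i was masked and None elsewhere, so a training loss would be
    computed only on masked positions.
    """
    if len(tokens) != len(boxes):
        raise ValueError("tokens and boxes must have the same length")
    rng = random.Random(seed)
    masked: List[str] = []
    labels: List[Optional[str]] = []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)  # text hidden: the model must infer it
            labels.append(tok)
        else:
            masked.append(tok)  # text visible, no loss at this position
            labels.append(None)
    # Boxes are returned unchanged, so the 2D position of every masked token
    # remains available as context.
    return masked, list(boxes), labels


# Example: a tiny invoice header with made-up coordinates.
tokens = ["Invoice", "Date:", "2023-02-16"]
boxes = [(40, 30, 160, 50), (40, 70, 100, 90), (110, 70, 230, 90)]
masked, kept_boxes, labels = mask_tokens(tokens, boxes, mask_prob=0.3, seed=0)
```

Keeping the boxes while hiding the text is what makes the objective multimodal. A value such as a date can often be inferred from where it sits relative to a label like "Date:".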

Friday, 3 March 2023, 11:30

Location: Room 348 (Building 50.34)
Web conference: https://sdqweb.ipd.kit.edu/wiki/SDQ-Oberseminar/Microsoft_Teams

Speaker	Janek Speit
Title	Automated Classification of Design Decision in Software Architecture Documentation
Talk type	Master's thesis
Advisor	Jan Keim
Talk mode	in person
Abstract	Software architecture documentation (SAD) is an integral artifact of a software project. To improve the quality of SADs and to support downstream tasks, it is desirable to automatically classify the design decisions they contain. In this thesis, we implement and evaluate an approach for automatically identifying and classifying design decisions based on a fine-grained taxonomy. The approach combines a hierarchical classification strategy with transfer learning via pre-trained language models. The contribution of this thesis is to investigate the benefit of a hierarchical classification strategy over a non-hierarchical approach for the automatic classification of design decisions. In addition, we investigate and compare the effectiveness of various pre-trained language models.
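The sketch below makes the hierarchical strategy concrete with a minimal top-down classifier. A root classifier first chooses a coarse class, and the classifier registered for that class then picks a finer subclass. The labels (`decision`, `no_decision`, `technology`, `structure`) are hypothetical placeholders, not the thesis's taxonomy. `keyword_classifier` is only a stand-in for the fine-tuned pre-trained language models the abstract describes.

```python
from typing import Callable, Dict, List

# A classifier maps a sentence to exactly one label from a fixed label set.
Classifier = Callable[[str], str]


class HierarchicalClassifier:
    """Top-down hierarchical classification: the root classifier picks a coarse
    class, then the classifier registered for that class refines it, and so on
    until a label without a child classifier (a leaf) is reached."""

    def __init__(self, root: Classifier, children: Dict[str, Classifier]):
        self.root = root
        self.children = children

    def predict(self, sentence: str, max_depth: int = 10) -> List[str]:
        path = [self.root(sentence)]
        # max_depth guards against a misconfigured, cyclic label hierarchy.
        while path[-1] in self.children and len(path) < max_depth:
            path.append(self.children[path[-1]](sentence))
        return path


def keyword_classifier(rules: Dict[str, str], default: str) -> Classifier:
    """Hypothetical stand-in for a fine-tuned language model: returns the label
    of the first keyword found in the sentence, else the default label."""
    def classify(sentence: str) -> str:
        lower = sentence.lower()
        for keyword, label in rules.items():
            if keyword in lower:
                return label
        return default
    return classify


clf = HierarchicalClassifier(
    root=keyword_classifier({"we use": "decision", "we chose": "decision"}, "no_decision"),
    children={
        "decision": keyword_classifier(
            {"framework": "technology", "database": "technology"}, "structure"
        ),
    },
)
```

For comparison, a non-hierarchical baseline would use a single classifier over all leaf labels at once. In the hierarchical variant, each classifier only has to separate the few classes below one node.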

Speaker	Stefanie Fischer
Title	Faster Feedback Cycles via Integration Testing Strategies for Serverless Edge Computing
Talk type	Master's thesis
Advisor	Robert Heinrich
Talk mode	in person
Abstract	Serverless computing allows software engineers to develop applications in the cloud without managing the infrastructure, which is managed by the cloud provider instead. Software engineers therefore treat the underlying infrastructure as a black box and focus on the application's business logic. This lack of insight makes testing harder, as applications tend to depend on the infrastructure and on other applications running in the cloud environment. Isolated unit and functional testing is possible, but integration testing is challenging. Reliable results are often only obtained after deploying to the target environment, because infrastructure specifics and other cloud services are only available in the actual cloud environment. This makes the development process laborious. This thesis therefore develops testing strategies for serverless edge computing that shorten feedback cycles and speed up development. For evaluation, the developed testing strategies are applied to Lambda@Edge in AWS.
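The abstract contrasts cheap isolated tests with integration tests that often require a deployment. The sketch below shows the cheap end of that spectrum under stated assumptions. It uses a hypothetical Lambda@Edge viewer-request function written in Python and invokes it locally with a hand-built CloudFront event instead of a deployed distribution. The event includes only a small subset of the fields CloudFront actually sends. `handler` and `make_viewer_request_event` are illustrative names, not the thesis's testing strategies.

```python
from typing import Any, Dict


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Hypothetical viewer-request function: rewrites directory URIs such as
    '/docs/' to '/docs/index.html' and passes all other requests through."""
    request = event["Records"][0]["cf"]["request"]
    if request["uri"].endswith("/"):
        request["uri"] += "index.html"
    # Returning the (possibly modified) request forwards it to CloudFront.
    return request


def make_viewer_request_event(uri: str, host: str = "example.com") -> Dict[str, Any]:
    """Builds a minimal CloudFront viewer-request event by hand. Real events
    carry more fields (distribution ID, client IP, request ID, ...)."""
    return {
        "Records": [
            {
                "cf": {
                    "config": {"eventType": "viewer-request"},
                    "request": {
                        "method": "GET",
                        "uri": uri,
                        "querystring": "",
                        # CloudFront uses lowercase header names mapping to lists.
                        "headers": {"host": [{"key": "Host", "value": host}]},
                    },
                }
            }
        ]
    }


# Local invocation: no deployment, feedback in milliseconds.
result = handler(make_viewer_request_event("/docs/"), None)
```

A test like this is fast, but it cannot show how CloudFront, the replication of the function to edge locations, or other AWS services behave. That is the gap integration testing strategies have to close.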

Thursday, 9 March 2023, 10:00

Web conference: https://kit-lecture.zoom.us/j/67744231815

Speaker	Dan Jia
Title	Reinforcement Learning for Solving the Knight’s Tour Problem
Talk type	Proposal
Advisor	Edouard Fouché
Talk mode	online
Abstract	The knight’s tour problem is an instance of the Hamiltonian path problem, which is NP-hard in general. A knight makes L-shaped moves on a chessboard and tries to visit every square exactly once. The tour is closed if the knight completes it on a square that is one knight’s move away from its starting square; otherwise, it is open. Many algorithms and heuristics have been proposed to solve this problem. The best known is Warnsdorff’s heuristic, which greedily moves the knight to the square with the fewest onward moves. Although this heuristic is fast, it does not always return a closed tour, and it only works on boards of certain dimensions. Due to its greedy behaviour, it can easily get stuck in a local optimum, and other existing approaches have similar limitations. Our goal in this thesis is to develop a new strategy based on reinforcement learning. Ideally, it should be able to find a closed tour on chessboards of any size. We will consider several approaches: value-based methods, policy optimization, and actor-critic methods. Compared to previous work, our approach is non-deterministic and treats the problem as a single-player game with a trade-off between exploration and exploitation. We will evaluate the effectiveness and efficiency of existing methods and the new heuristics.
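For reference, Warnsdorff’s heuristic as described in the abstract fits in a few lines. From the current square, the knight always moves to the unvisited square with the fewest unvisited onward moves. This sketch breaks ties by the fixed order of `MOVES`, which is an arbitrary choice. Other tie-breaking rules change which boards and starting squares succeed. `warnsdorff_tour` and `is_closed` are illustrative names.

```python
from typing import List, Optional, Tuple

Square = Tuple[int, int]

# The eight L-shaped knight moves as (row offset, column offset).
MOVES = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]


def warnsdorff_tour(n: int, start: Square = (0, 0)) -> Optional[List[Square]]:
    """Greedy Warnsdorff heuristic on an n x n board.

    Returns the visited squares in order if all n*n squares are covered, or
    None if the knight reaches a square with no unvisited onward moves."""
    visited = {start}
    tour = [start]

    def onward(sq: Square) -> List[Square]:
        # Unvisited squares reachable from sq with one knight move.
        r, c = sq
        return [
            (r + dr, c + dc)
            for dr, dc in MOVES
            if 0 <= r + dr < n and 0 <= c + dc < n and (r + dr, c + dc) not in visited
        ]

    while len(tour) < n * n:
        candidates = onward(tour[-1])
        if not candidates:
            return None  # dead end: the greedy choice cannot be undone
        # Greedy step: fewest onward moves; min() keeps the first minimum.
        nxt = min(candidates, key=lambda sq: len(onward(sq)))
        visited.add(nxt)
        tour.append(nxt)
    return tour


def is_closed(tour: List[Square]) -> bool:
    """A tour is closed if its last square is one knight move from its first."""
    (r0, c0), (r1, c1) = tour[0], tour[-1]
    return (abs(r0 - r1), abs(c0 - c1)) in {(1, 2), (2, 1)}
```

On a 3×3 board the centre square is unreachable, so `warnsdorff_tour(3)` returns `None`. This illustrates the dead ends that motivate the search for other strategies.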

Friday, 17 March 2023, 11:30

Location: Room 348 (Building 50.34)

Speaker	Rakan Al Masri
Title	Generating Causal Domain Knowledge for Cloud Systems Monitoring
Talk type	Bachelor's thesis
Advisor	Pawel Bielski
Talk mode	in person
Abstract	Standard machine learning approaches rely solely on data to learn relevant patterns, which may not be sufficient in certain fields. Researchers in the healthcare domain have successfully applied causal domain knowledge to improve the prediction quality of machine learning models, especially for rare diseases. The causal domain knowledge informs the machine learning model about similar diseases, thus improving the quality of its predictions. However, some domains, such as Cloud Systems Monitoring, lack readily available causal domain knowledge, so the knowledge must be approximated. It is therefore important to systematically investigate the processes and design decisions that affect knowledge generation. In this study, we showed how causal discovery algorithms can be employed to generate causal domain knowledge from raw textual logs in the Cloud Systems Monitoring domain. We also investigated the impact of various design choices on the domain knowledge generation process through systematic testing across multiple datasets and shared the insights we gained. To our knowledge, this is the first such investigation.
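The abstract does not specify how the raw logs are preprocessed. The sketch below shows one common way to turn them into a tabular input for a causal discovery algorithm, under stated assumptions. Each log line is assumed to have the form `<unix_seconds> <message>`. Messages are reduced to crude templates by masking digit runs, and event types are counted per fixed time window. The window length is an example of the kind of design choice whose impact the thesis investigates. All names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical log format: "<unix_seconds> <message>".
NUMBER = re.compile(r"\d+")


def template(message: str) -> str:
    """Crude log templating: replace every digit run with <*> so that messages
    differing only in IDs or values map to the same event type."""
    return NUMBER.sub("<*>", message.strip())


def event_count_matrix(lines: List[str], window_s: int) -> Tuple[List[str], List[List[int]]]:
    """Counts occurrences of each event type per time window.

    Returns (event_types, rows), where rows[w][j] is how often event_types[j]
    occurred in the w-th window, counting from the first non-empty window.
    Windows without any events in between are kept as all-zero rows."""
    windows: Dict[int, Counter] = {}
    for line in lines:
        ts, msg = line.split(" ", 1)
        w = int(ts) // window_s
        windows.setdefault(w, Counter())[template(msg)] += 1
    types = sorted({t for counts in windows.values() for t in counts})
    if not windows:
        return types, []
    first, last = min(windows), max(windows)
    rows = [
        [windows.get(w, Counter())[t] for t in types]  # Counter yields 0 if absent
        for w in range(first, last + 1)
    ]
    return types, rows


logs = [
    "0 disk 3 full",
    "5 request 17 failed",
    "12 disk 4 full",
    "14 request 9 failed",
    "15 request 2 failed",
]
event_types, rows = event_count_matrix(logs, window_s=10)
```

Each column of `rows` can then be treated as one variable by a causal discovery algorithm. Coarser windows smooth out noise but can hide the temporal order of events.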

Friday, 24 March 2023, 11:30