Attribut:Kurzfassung

Aus SDQ-Institutsseminar

Dies ist ein Attribut des Datentyps Text.

Unterhalb werden 20 Seiten angezeigt, auf denen für dieses Attribut ein Datenwert gespeichert wurde.
C
Algorithms that extract dependencies from data and represent them as causal graphs must also be tested. For such tests, data with a known ground truth is required, but this is rarely available. Generating data under controlled conditions through simulations is expensive and time-consuming. A solution to this problem is to create synthetic datasets, where dependencies are predefined, to evaluate the results of these algorithms. This work focuses on building a framework for the synthesis of data. In the framework, the synthesis process begins with generating a random dependency graph, specifically a directed acyclic graph. Each node in the graph, except the source nodes, has parent nodes and represents a variable. In the next step, each node is populated with predefined random dependencies. A dependency is a model that determines the value of a variable based on its parent variables. From this structure, datasets can be sampled. Users can control the properties of the causal graph through various parameters and choose from multiple types of dependencies, representing different complexity levels. Additionally, the sampling process allows for interactivity by enabling the exchange of dependencies during the sampling process. Dependencies can be exchanged with fixed values, probability distributions, or time series functions. This flexibility provides a robust tool for improving and comparing the mentioned algorithms under various conditions.  +
Particle colliders are a primary method of conducting experiments in particle physics, as they allow to both create short-lived, high-energy particles and observe their properties. The world’s largest particle collider, the Large Hadron Collider (subsequently referred to as LHC), is operated by the European Organization for Nuclear Research (CERN) near Geneva. The operation of this kind of accelerator requires the storage and computationally intensive analysis of large amounts of data. The Worldwide LHC Computing Grid (WLCG), a global computing grid, is being run alongside the LHC to serve this purpose. This Bachelor’s thesis aims to support the creation of an architecture model and simulation for parts of the WLCG infrastructure with the goal of accurately being able to simulate and predict changes in the infrastructure such as the replacement of the load balancing strategies used to distribute the workload between available nodes.  +
Dependency estimation is a crucial task in data analysis and finds applications in, e.g., data understanding, feature selection and clustering. This thesis focuses on Canonical Dependency Analysis, i.e., the task of estimating the dependency between two random vectors, each consisting of an arbitrary amount of random variables. This task is particularly difficult when (1) the dimensionality of those vectors is high, and (2) the dependency is non-linear. We propose Canonical Monte Carlo Dependency Estimation (cMCDE), an extension of Monte Carlo Dependency Estimation (MCDE, Fouché 2019) to solve this task. Using Monte Carlo simulations, cMCDE estimates dependency based on the average discrepancy between empirical conditional distributions. We show that cMCDE inherits the useful properties of MCDE and compare it to existing competitors. We also propose and apply a method to leverage cMCDE for selecting features from very high-dimensional features spaces, demonstrating cMCDE’s practical relevance.  +
Im Laufe der Zeit hat sich die Softwareentwicklung von der Entwicklung von Komplett-systemen zur Entwicklung von Software Komponenten, die in andere Applikation inte- griert werden können,verändert.Bei Software Komponenten handelt es sich um Services, die eine andere Applikation erweitern.Die Applikation wird dabeivonDritten entwickelt. In dieser Bachelorthesis werden die Probleme betrachtet, die bei der Integration von Ser- vices auftreten. Mit einer Umfrage wird das Entwicklungsteam von LogMeIn, welches für die Integration von Services zuständig ist, befragt. Aus deren Erfahrungen werden Probleme ausfndig gemacht und Lösungen dafür entwickelt. Die Probleme und Lösungen werden herausgearbeitet und an hand eines fort laufenden Beispiels, des GoToMeeting Add-ons für den Google Kalender,veranschaulicht.Für die Evaluation wird eine Fallstudie durchgeführt, in der eine GoToMeeting Integration für Slack entwickelt wird. Während dieser Entwicklung treten nicht alle ausgearbeiteten Probleme auf. Jedoch können die Probleme, die auftreten mit den entwickelten Lösungen gelöst werden. Zusätzlich tritt ein neues Problem auf, für das eine neue Lösung entwickelt wird. Das Problem und die zugehörige Lösung werden anschließend zu dem bestehenden Set von Problemen und Lösungen hinzugefügt. Das Hinzufügen des gefundenen Problems ist ein perfektes Beispiel dafür, wie das Set in Zukunft bei neuen Problemen, erweitert werden kann.  +
The data collected in many real-world scenarios such as environmental analysis, manufacturing, and e-commerce are high-dimensional and come as a stream, i.e., data properties evolve over time – a phenomenon known as "concept drift". This brings numerous challenges: data-driven models become outdated, and one is typically interested in detecting specific events, e.g., the critical wear and tear of industrial machines. Hence, it is crucial to detect change, i.e., concept drift, to design a reliable and adaptive predictive system for streaming data. However, existing techniques can only detect "when" a drift occurs and neglect the fact that various drifts may occur in different dimensions, i.e., they do not detect "where" a drift occurs. This is particularly problematic when data streams are high-dimensional. The goal of this Master’s thesis is to develop and evaluate a framework to efficiently and effectively detect “when” and “where” concept drift occurs in high-dimensional data streams. We introduce stream autoencoder windowing (SAW), an approach based on the online training of an autoencoder, while monitoring its reconstruction error via a sliding window of adaptive size. We will evaluate the performance of our method against synthetic data, in which the characteristics of drifts are known. We then show how our method improves the accuracy of existing classifiers for predictive systems compared to benchmarks on real data streams.  +
Data streams are ubiquitous in modern applications such as predictive maintenance or quality control. Data streams can change in unpredictable ways, challenging existing supervised learning algorithms that assume a stationary relationship between input data and labels. Supervised learning algorithms for data streams must therefore "adapt" to changing data distributions. Active learning (AL), a sub-field of supervised learning, aims to reduce the total cost of labeling by identifying the most valuable data points for training. However, existing stream-based AL methods have difficulty adapting to changes in data streams as they rely mainly on the sparsely labeled data and ignore the regionality of changes, resulting in slow change adaptions. To address these issues, this thesis presents an active learning framework for data streams that adapts to regional changes in the underlying data stream. Our idea is to enrich hierarchical data stream clustering with labeling statistics to measure the regionality and relevance of changes. Using such information in stream-based active learning leads to more effective labeling, resulting in faster change adaption.  +
Das Palladio Komponentenmodell (PCM) ermöglicht die Modellierung und Simulation der Qualitätseigenschaften eines Systems aus komponentenbasierter Software und für die Ausführung gewählter Hardware. Stehen dabei bereits Teile des Systems zur Verfügung können diese in die Co-Simulation von Workload, Software und Hardware integriert werden, um weitere Anwendungsgebiete für das PCM zu ermöglichen oder die Anwendung in bestehenden zu verbessern. Die Beiträge dieser Arbeit sind das Erarbeiten von sechs verschiedenen Ansätzen zur Anpassung des PCM für unterschiedliche Anwendungsgebiete und deren Einstufung anhand von Bewertungskriterien. Für den dabei vielversprechendsten Ansatz wurde ein detailliertes Konzept entwickelt und prototypisch umgesetzt. Dieser Ansatz, ein Modell im PCM mittels einer feingranularen Hardwaresimulation zu parametrisieren, wird in Form des Prototyps bezüglich seiner Umsetzbarkeit, Erweiterbarkeit und Vollständigkeit evaluiert. Die Evaluation der prototypischen Umsetzung erfolgt unter anderem anhand der Kriterien Benutzbarkeit, Genauigkeit und Performance, die in Relation zum PCM betrachtet werden. Der Prototyp ermöglicht die Ausführung einer Hardwaresimulation mit im PCM spezifizierten Parametern, die Extraktion dabei gemessener Leistungsmerkmale und deren direkte Verwendung in einer Simulation des PCM.  +
In data analysis, entity matching (EM) or entity resolution is the task of finding the same entity within different data sources. When joining different data sets, it is a required step where the same entities may not always share a common identifier. When applied to graph data like knowledge graphs, ontologies, or abstractions of physical systems, the additional challenge of entity relationships comes into play. Now, not just the entities themselves but also their relationships and, therefore, their neighborhoods need to match. These relationships can also be used to our advantage, which builds the foundation for collective entity matching (CEM). In this bachelor thesis, we focus on a graph data set based on a material simulation with the intent to match entities between neighboring system states. The goal is to identify structures that evolve over time and link their states with a common identifier. Current CEM Algorithms assume perfect matches to be possible, i.e., every entity can be matched. We want to overcome this challenge and address the high imbalance of potential candidates and impossible matches. A third major challenge is the large volumes of data which requires our algorithm to be efficient.  +
An der Entwicklung komplexer Systeme sind viele Teams aus verschiedenen Disziplinen vertreten. So sind zum Beispiel an der Entwicklung einer sicherheitskritischen Systemarchitektur mindestens ein Systemarchitekt als auch ein Sicherheitsexperte beteiligt. Die Aufgabe des ersteren ist es, eine Systemarchitektur zu entwickeln, welche alle funktionalen und nicht-funktionalen Anforderungen erfüllt. Der Sicherheitsexperte analysiert diese Architektur und trägt so zum Nachweis bei, dass das System die geforderten Sicherheitsanforderungen erfüllt. Sicherheit steht hierbei für die Gefahrlosigkeit des Nutzers und der Umwelt durch das System (Safety). Um ihr Ziel zu erreichen, folgen sowohl der Systemarchitekt als auch der Sicherheitsexperte einem eigenen Vorgehensmodell. Aufgrund fehlender Interaktionspunkte müssen beide unabhängig voneinander und unkoordiniert durchgeführt werden. Dies kann zu Inkonsistenzen zwischen Architektur- und Sicherheitsartefakten führen und zusätzlichen Aufwand verursachen, was sich wiederum negativ auf die Entwicklungszeit und Qualität auswirkt. In dieser Arbeit kombinieren wir zwei ausgewählte Vorgehensmodelle zu einem neuen, einzelnen Vorgehensmodell. Die Kombination erfolgt auf Basis des identifizierten Informationsflusses innerhalb und zwischen den ursprünglichen zwei Vorgehensmodellen. Durch die Kombination werden die Vorteile beider Ansätze übernommen und die zuvor genannten Probleme angegangen. Bei den zwei ausgewählten Vorgehensmodellen handelt es sich um den Harmony-Ansatz von IBM und die ISO-Norm 26262. Ersterer erlaubt es eine Systemarchitektur systematisch und modellbasiert mit SysML zu entwickeln, während die ISO-Norm dem Sicherheitsexperten bei seiner Arbeit bezüglich der funktionalen Sicherheit in Straßenfahrzeugen unterstützt. Die Evaluation unseres Ansatzes zeigt dessen Anwendbarkeit im Rahmen einer realen Fallstudie. Außerdem werden dessen Vorteile bezüglich Konsistenz zwischen Architektur- und Sicherheitsartefakten und Durchführungszeit diskutiert, basierend auf einem Vergleich mit ähnlichen Ansätzen.  
Architecture-level performance models, for instance, the PCM, allow performance predictions to evaluate and compare design alternatives. However, software architectures drift over time so that initially created performance models are out-to-date fast due to the required manual high effort to keep them up-to-date. To close the gap between the development and having up-to-date performance models, the Continuous Integration of Performance Models (CIPM) approach has been proposed. It incorporates automatically executed activities into a Continuous Integration pipeline and is realized with Vitruvius combining Java and the PCM. As a consequence, changes from a commit are extracted to incrementally update the models in the VSUM. To estimate the resource demand in the PCM, the CIPM approach adaptively instruments and monitors the source code. In previous work, parts of the CIPM pipeline were prototypically implemented and partly evaluated with artificial projects. A pipeline combining the incremental model update and the adaptive instrumentation is absent. Therefore, this thesis presents the combined pipeline adapting and extending the existing implementations. The evaluation is performed with the TeaStore and indicates the correct model update and instrumentation. Nevertheless, there is a gap towards the calibration pipeline.  +
Die sich weiterentwickelnde Technologie erfordert eine organisationsübergreifende Zusammenarbeit, um die komplexen Aufgaben bei der Entwicklung komplexer Systeme wie Webservices, IoT und heterogener Systeme zu bewältigen. Auf diese Weise hat die Zusammenarbeit das Potenzial, den hohen Ressourcen- und Qualitätsbedarf zu mindern und von den komplementären Ressourcen der Zusammenarbeit zu profitieren. Dennoch muss während der Zusammenarbeit ein Austausch der Versionen der entwickelten Schnittstellen möglich sein. Während des Austauschs muss der Schutz des geistigen Eigentums der kooperierenden Organisationen und der Metadaten der Versionen berücksichtigt werden, um die Kompetenz der Organisation und die Designentscheidungen zu schützen. Einige existierende Ansätze integrieren einen Server, um die Zugangskontrolle und Autorisierung zu erleichtern und eine feingranulare Verschlüsselung der gemeinsam genutzten Assets und den Schutz des geistigen Eigentums zu bieten. Andere Ansätze integrieren Verschlüsselungsmechanismen in die bestehenden Versionskontrollsysteme, z. B. Git. Für die Zugriffskontrolle nutzen wir jedoch keinen Server und schützen das geistige Eigentum und zusätzlich die Metadaten. Das Thema dieser Arbeit konzentriert sich auf die Vertraulichkeit und Integrität der Modellversionen und ihrer Metadaten. Wir definieren Deltachain, die wir aus Blockchain motivieren. Jedes Element einer Deltachain ist mit einem Vorgänger verknüpft, wodurch eine Kette entsteht. Die Elemente von Deltachain bestehen aus verschlüsselten Modelländerungen mit den entsprechenden Metadaten, um die Vertraulichkeit zu gewährleisten. Darüber hinaus kapselt jedes Element von Deltachain den Hash-Wert der Modelländerung und der Metadaten ein, um die Integrität zu gewährleisten. Wir verwenden den Advanced Encryption Standard für die Verschlüsselung und SHA für das Hashing der Versionen. Um Modelländerungen zwischen zwei Versionen eines Modells zu erkennen, verwenden wir die Vitruv-Tools. Für die Definition des Metamodells und die Implementierung der Deltachain verwenden wir das Eclipse Modeling Framework. Wir evaluieren unseren Ansatz unter den Aspekten der Funktionalität, um die erwartete Leistung von Deltachain zu bestätigen, der Skalierbarkeit, um die Auswirkungen der zunehmenden Versionen und Modelle auf Deltachain zu untersuchen, und der Leistung, um Deltachain mit EMFCompare zu vergleichen. Für die Bewertungsklassen Skalierbarkeit und Leistung finden wir drei Open-Source-Modellierungsprojekte auf GitHub.  
The passing of new regulations like the European GDPR has clarified that in the future it will be necessary to build privacy-preserving systems to protect the personal data of its users. This thesis will introduce the concept of privacy templates to help software designers and architects in this matter. Privacy templates are at their core similar to design patterns and provide reusable and general architectural structures which can be used in the design of systems to improve privacy in early stages of design. In this thesis we will conceptualize a small collection of privacy templates to make it easier to design privacy-preserving software systems. Furthermore, the privacy templates will be categorized and evaluated to classify them and assess their quality across different quality dimensions.  +
Im Zeitalter des Cloud Computings und der Big Data existieren Software-Telemetriedaten im Überfluß. Die schiere Menge an Daten und Datenplattformen kann allerdings zu Problemen in ihrer Handhabung führen. In dieser Masterarbeit wird ein Laufzeitmodell vorgestellt, welches es ermöglicht, Messungen von Telemetriedaten auf verschiedenen Datenplatformen durchzuführen. Hierbei folgt das Modell dem vollen Lebenszyklus einer Messung von der Definition durch eine eigens hierfür entwickelte domänenspezifischen Sprache, bis zur Visualisierung der resultierenden Messwerte. Das Modell wurde bei dem Software-as-a-Service-Unternehmen LogMeIn implementiert und getestet. Hierbei wurde eine Evaluation hinsichtlich der Akzeptanz des implementierten Dienstes bei der vermuteten Zielgruppe anhand einer Nutzerstudie innerhalb des Unternehmens durchgeführt.  +
While large language models have succeeded in generating code, the struggle is to modify large existing code bases. The Generated Code Alteration (GCA) process is designed, implemented, and evaluated in this thesis. The GCA process can automatically modify a large existing code base, given a natural language task. Different variations and instantiations of the process are evaluated in an industrial case study. The code generated by the GCA process is compared to code written by human developers. The language model-based GCA process was able to generate 13.3 lines per error, while the human baseline generated 65.8 lines per error. While the generated code did not match the overall human performance in modifying large code bases, it could still provide assistance to human developers.  +
In Industry 4.0 environments highly dynamic and flexible access control strategies are needed. State of the art strategies are often not included in the modelling process but must be considered afterwards. This makes it very difficult to analyse the security properties of a system. In the framework of the Trust 4.0 project the confidentiality analysis tries to solve this problem using a context-based approach. Thus, there is a security model named “context metamodel”. Another important problem is that the transformation of an instance of a security model to a wide-spread access control standard is often not possible. This is also the case for the context metamodel. Moreover, another transformation which is very interesting to consider is one to an ensemble based component system which is also presented in the Trust 4.0 project. This thesis introduces an extension to the beforementioned context metamodel in order to add more extensibility to it. Furthermore, the thesis deals with the creation of a concept and an implementation of the transformations mentioned above. For that purpose, at first, the transformation to the attribute-based access control standard XACML is considered. Thereafter, the transformation from XACML to an ensemble based component system is covered. The evaluation indicated that the model can be used for use cases in Industry 4.0 scenarios. Moreover, it also indicated the transformations produce adequately accurate access policies. Furthermore, the scalability evaluation indicated linear runtime behaviour of the implementations of both transformations for respectively higher number of input contexts or XACML rules.  +
Architecture-level performance models of software like the PCM can aid with the development of the software by preventing architecture degradation and helping to diagnose performance issues during the implementation phase. Previously, manual intervention was required to create and update such models. The CIPM approach can be employed to automatically make a calibrated PCM instance available during the development of software. A prototypical implementation of the CIPM approach targets microservice-based web applications implemented in Java. No implementations for other programming languages exist and the process of adapting the CIPM approach to support another programming language has previously not been explored. We present an approach to adapting CIPM to support Lua-based sensor applications. A prototypical implementation of the adapted approach was evaluated using real-world Lua-based sensor applications from the SICK AppSpace ecosystem. The evaluation demonstrates the feasibility of the adapted approach, but also reveals minor technical issues with the implementation.  +
In software engineering, software architecture documentation plays an important role. It contains many essential information regarding reasoning and design decisions. Therefore, many activities are proposed to deal with documentation for various reasons, e.g., extract- ing information or keeping different forms of documentation consistent. These activities often involve automatic processing of documentation, for example traceability link recovery (TLR). However, there can be problems for automatic processing when coreferences are present in documentation. A coreference occurs when two or more mentions refer to the same entity. These mentions can be different and create ambiguities, for example when there are pronouns. To overcome this problem, this thesis proposes two contributions to resolve coreferences in software architecture documentation. The first contribution is to explore the performance of existing coreference resolution models for software architecture documentation. The second is to divide coreference resolution into many more specific type of resolutions, like pronoun resolution, abbreviation resolution, etc.  +
To evaluate the loss of cognitive ML models, e.g., text or image classifier, accurately, one usually needs a lot of test data which are annotated manually by experts. In order to estimate accurately, the test data should be representative or else it would be hard to assess whether a model overfits, i.e., it uses spurious features of the images significantly to decide on its predictions.With techniques such as Feature Attribution, one can then compare important features that the model sees with their own expectations and can therefore be more confident whether or not he should trust the model. In this work, we propose a method that estimates the loss of image classifiers based on Feature-Attribution techniques. We use the classic approach for loss estimate as our benchmark to evaluate our proposed method. At the end of this work, our analysis reveals that our proposed method seems to have a similar loss estimate to that of the classic approach with a good image classifer and a representative test data. Based on our experiment, we expect that our proposed method could give a better loss estimate than the classic approach in cases where one has a biased test data and an image classifier which overfits.  +
Conventional evaluation of an ML classifier uses test data to estimate its expected loss. For "cognitive" ML tasks like image or text classification, this requires that experts annotate a large and representative test data set, which can be expensive. In this thesis, we explore another approach for estimating the expected loss of an ML classifier. The aim is to enhance test data with additional expert knowledge. Inspired by recent feature attribution techniques, such as LIME or Saliency Maps, the idea is that experts annotate inputs not only with desired classes, but also with desired feature attributions. We then explore different methods to derive a large conventional test data set based on few such feature attribution annotations. We empirically evaluate the loss estimates of our approach against ground-truth estimates on large existing test data sets, with a focus on the tradeoff between the number of expert annotations and the achieved estimation accuracy.  +