Automated Generation of a Consistency Benchmark for Cyber-Physical Systems Modeling

Thesis Announcement
	Type	Bachelor's thesis or Master's thesis
	Supervisor	If you are interested or have questions, please contact: Rahul Sharma (email: rahul.sharma@kit.edu)

Motivation

The design of complex Cyber-Physical Systems (CPS) relies on a suite of diverse modeling platforms such as SysML and Simulink. A critical and unsolved challenge in this multi-platform setting is keeping models consistent across these platforms, as inconsistencies can lead to system failures. Machine learning, particularly Large Language Models, shows promise for automating this consistency checking, but progress is blocked by a fundamental problem: there is no large-scale, high-quality, publicly available benchmark dataset. This thesis aims to solve this problem by designing and implementing a framework that automatically generates a rich dataset of consistent and inconsistent CPS model pairs.

Research Questions

This thesis may address the following key research questions:

How can the complex, proprietary structures of SysML (XMI) and Simulink (.slx) models be reliably parsed and canonicalized into a standardized intermediate graph representation?

What is a minimal, yet expressive, graph schema that can capture the core semantics of components, parameters, and connections from both platforms?

What types of semantic faults are most common and critical in CPS design (e.g., data type mismatches, parameter drift, broken links)?

How can a "fault injection" engine be designed to programmatically introduce these semantic faults into "golden reference" models to create a diverse and realistic dataset of inconsistent pairs?

Proposed Methodology

Phase 1: Model Parser Development: Implement robust parsers in Python for SysML (using lxml or similar for XMI) and Simulink (using the MATLAB Engine API or manual .slx XML parsing) to extract their complete structural and parametric data.
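As a rough illustration of the manual .slx route in Phase 1, the sketch below treats an .slx file as a ZIP archive and reads blocks and their parameters from one XML member, using only Python's standard library. The member name simulink/blockdiagram.xml, the Block/P element layout, and the helper name read_slx_blocks are assumptions made for illustration; the layout differs between MATLAB releases and would need to be checked against real exports.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Assumed location of the block diagram inside the archive; verify per release.
DIAGRAM_MEMBER = "simulink/blockdiagram.xml"


def read_slx_blocks(slx_file):
    """Return one dict per <Block> element: its type, name, and parameters.

    `slx_file` is a path or a binary file-like object holding the archive.
    """
    with zipfile.ZipFile(slx_file) as archive:
        root = ET.fromstring(archive.read(DIAGRAM_MEMBER))
    blocks = []
    for block in root.iter("Block"):
        # Direct <P Name="..."> children are read as parameters (assumed layout).
        params = {p.get("Name"): (p.text or "").strip() for p in block.findall("P")}
        blocks.append({
            "type": block.get("BlockType"),
            "name": block.get("Name"),
            "params": params,
        })
    return blocks


# Tiny hand-written stand-in for a real model, built in memory for the demo.
SAMPLE_XML = """<ModelInformation>
  <Model>
    <System>
      <Block BlockType="Constant" Name="Setpoint"><P Name="Value">5</P></Block>
      <Block BlockType="Gain" Name="Controller"><P Name="Gain">2.5</P></Block>
    </System>
  </Model>
</ModelInformation>"""

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(DIAGRAM_MEMBER, SAMPLE_XML)
buffer.seek(0)

blocks = read_slx_blocks(buffer)
for b in blocks:
    print(b["type"], b["name"], b["params"])
```

Parameter values are kept as strings here; deciding how to type them (numbers, expressions, workspace variables) is part of the canonicalization work the thesis would have to define.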

Phase 2: Graph Representation Design: Design a standardized graph schema (e.g., using NetworkX) that serves as a "lingua franca" for the models. This schema will represent all model elements (blocks, ports, connectors) as nodes and edges with rich attribute metadata (parameters, types, documentation).
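One possible shape for such a schema is sketched below. It uses plain dataclasses rather than NetworkX so that it runs without dependencies; the same node and edge attributes could equally be stored on a NetworkX graph. All class names, field names, and node or edge kinds are hypothetical choices for illustration, not a fixed design.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Node:
    node_id: str
    kind: str                                   # e.g. "block" or "port"
    attrs: dict = field(default_factory=dict)   # parameters, types, documentation


@dataclass
class Edge:
    source: str
    target: str
    kind: str                                   # e.g. "has_port" or "connects"
    attrs: dict = field(default_factory=dict)


@dataclass
class ModelGraph:
    platform: str                               # originating tool, e.g. "simulink"
    nodes: dict = field(default_factory=dict)   # node_id -> Node
    edges: list = field(default_factory=list)

    def add_node(self, node_id, kind, **attrs):
        self.nodes[node_id] = Node(node_id, kind, attrs)

    def add_edge(self, source, target, kind, **attrs):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.append(Edge(source, target, kind, attrs))

    def to_canonical_json(self):
        """Serialize with sorted nodes, edges, and keys so equal graphs give equal text."""
        payload = {
            "platform": self.platform,
            "nodes": [asdict(self.nodes[k]) for k in sorted(self.nodes)],
            "edges": sorted(
                (asdict(e) for e in self.edges),
                key=lambda e: json.dumps(e, sort_keys=True),
            ),
        }
        return json.dumps(payload, sort_keys=True)


g = ModelGraph("simulink")
g.add_node("Controller", "block", block_type="Gain", Gain="2.5")
g.add_node("Controller/out1", "port", direction="out", data_type="double")
g.add_node("Plant", "block", block_type="TransferFcn")
g.add_node("Plant/in1", "port", direction="in", data_type="double")
g.add_edge("Controller", "Controller/out1", "has_port")
g.add_edge("Plant", "Plant/in1", "has_port")
g.add_edge("Controller/out1", "Plant/in1", "connects")
print(g.to_canonical_json())
```

The canonical serialization is independent of insertion order, which makes it possible to compare or hash two graphs as text.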

Phase 3: Fault Injection Engine: Curate a "golden set" of 5-10 open-source CPS models. Then, develop a fault injection module that operates on the graph representation. This module will programmatically introduce a wide array of single-point semantic faults (e.g., modifying a parameter value, changing a data type, deleting a connector) to create a large set of "negative" (inconsistent) samples.
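The three example faults named above could be prototyped as small mutation operators, as in the following sketch. It works on a simplified dictionary-based graph; the graph layout, the operator names, the list of data types, and the label format are all assumptions for illustration. Each call copies the golden graph, applies exactly one randomly chosen fault, and returns the faulty copy together with a label describing what was changed.

```python
import copy
import random

# Assumed pool of data types to swap between; a real engine would take this from the models.
DATA_TYPES = ["double", "single", "int32", "boolean"]


def perturb_parameter(graph, rng):
    """Change one numeric block parameter (a simple stand-in for parameter drift)."""
    candidates = [
        (node_id, key)
        for node_id, node in sorted(graph["nodes"].items())
        for key, value in sorted(node.get("params", {}).items())
        if isinstance(value, (int, float))
    ]
    node_id, key = rng.choice(candidates)
    old = graph["nodes"][node_id]["params"][key]
    # Scale by 10, or set to 1 if the old value is zero, so the value always changes.
    graph["nodes"][node_id]["params"][key] = old * 10 if old else 1
    return {"fault": "parameter_change", "target": f"{node_id}.{key}"}


def change_data_type(graph, rng):
    """Replace the data type of one port with a different type."""
    ports = sorted(n for n, node in graph["nodes"].items() if "data_type" in node)
    node_id = rng.choice(ports)
    old = graph["nodes"][node_id]["data_type"]
    graph["nodes"][node_id]["data_type"] = rng.choice([t for t in DATA_TYPES if t != old])
    return {"fault": "data_type_change", "target": node_id}


def delete_connector(graph, rng):
    """Remove one edge from the graph."""
    edge = graph["edges"].pop(rng.randrange(len(graph["edges"])))
    return {"fault": "connector_deleted", "target": f'{edge["source"]}->{edge["target"]}'}


OPERATORS = [perturb_parameter, change_data_type, delete_connector]


def inject_fault(golden, seed):
    """Return (faulty_copy, label); the golden graph itself is left untouched.

    Assumes the graph offers at least one target for every operator.
    """
    rng = random.Random(seed)       # seeded so every sample is reproducible
    faulty = copy.deepcopy(golden)
    label = rng.choice(OPERATORS)(faulty, rng)
    return faulty, label


GOLDEN = {
    "nodes": {
        "Controller": {"kind": "block", "params": {"Gain": 2.5}},
        "Controller/out1": {"kind": "port", "data_type": "double"},
        "Plant/in1": {"kind": "port", "data_type": "double"},
    },
    "edges": [{"source": "Controller/out1", "target": "Plant/in1", "kind": "connects"}],
}

for seed in range(3):
    faulty, label = inject_fault(GOLDEN, seed)
    print(seed, label)
```

Recording the seed and the fault label with each sample keeps the dataset reproducible and provides ground truth for the type and location of every injected fault.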

Phase 4: Dataset Generation & Validation: Combine the positive (golden) and negative (faulty) pairs to generate a large-scale benchmark. The dataset will be validated by testing its utility in training a simple baseline consistency checker.
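A minimal sketch of how positive and negative pairs might be assembled into one file is given below. It assumes one plausible reading of a "pair": a SysML-side graph together with a Simulink-side graph, where the negative samples replace the Simulink side with a faulty variant. The record fields, the JSON Lines output format, and the function names are hypothetical.

```python
import io
import json


def make_records(model_id, sysml_graph, simulink_graph, faulty_variants):
    """Yield one positive record, then one negative record per faulty variant.

    `faulty_variants` is a list of (faulty_simulink_graph, fault_label) tuples.
    """
    yield {"model_id": model_id, "left": sysml_graph, "right": simulink_graph,
           "label": "consistent", "fault": None}
    for faulty_graph, fault in faulty_variants:
        yield {"model_id": model_id, "left": sysml_graph, "right": faulty_graph,
               "label": "inconsistent", "fault": fault}


def write_jsonl(records, stream):
    """Write one JSON object per line and return the number of records written."""
    count = 0
    for record in records:
        stream.write(json.dumps(record, sort_keys=True) + "\n")
        count += 1
    return count


# Toy graphs standing in for parsed golden models and injected faults.
sysml = {"blocks": {"Controller": {"gain": 2.5}}}
simulink = {"blocks": {"Controller": {"Gain": 2.5}}}
variants = [
    ({"blocks": {"Controller": {"Gain": 25.0}}}, "parameter_change"),
    ({"blocks": {}}, "block_deleted"),
]

out = io.StringIO()
written = write_jsonl(make_records("demo-001", sysml, simulink, variants), out)
print(written, "records")
print(out.getvalue().splitlines()[0])
```

Keeping the model_id in every record makes it possible to split training and test data by golden model rather than by individual pair.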

Expected Contributions & Deliverables

A set of open-source parsers for SysML and Simulink models.

A well-documented graph serialization format for CPS models.

A fault injection engine capable of programmatically creating inconsistent model pairs.

A public, large-scale benchmark dataset (e.g., 2,000+ model pairs) for training and evaluating CPS consistency-checking models.