Beyond Similarity - Dimensions of Semantics and How to Detect them: Difference between revisions

Latest revision as of 11:12, 2 January 2023

Speaker	Felix Pieper
Talk type	Master's thesis
Supervisor	Sophie Corallo
Date	Fri, 13 January 2023
Talk mode	in person
Abstract	Semantic similarity estimation is a widely used and well-researched area. Current state-of-the-art approaches estimate text similarity with large language models. However, semantic similarity estimation often ignores fine-grained differences between semantically similar sentences. This thesis proposes the concept of semantic dimensions to represent fine-grained differences between two sentences. A workshop with domain experts identified ten semantic dimensions. From the workshop insights, a model for semantic dimensions was created. Afterward, 60 participants decided via a survey which semantic dimensions are useful to users. Detectors for the five most useful semantic dimensions were implemented in an extensible framework. To evaluate the semantic dimension detectors, a dataset of 200 sentence pairs was created. The detectors reached an average F1 score of 0.815.

@@ Line 6: / Line 6: @@
 |termin=Institutsseminar/2023-01-13
 |vortragsmodus=in Präsenz
+|kurzfassung=Semantic similarity estimation is a widely used and well-researched area. Current state-of-the-art approaches estimate text similarity with large language models. However, semantic similarity estimation often ignores fine-grained differences between semantically similar sentences. This thesis proposes the concept of semantic dimensions to represent fine-grained differences between two sentences. A workshop with domain experts identified ten semantic dimensions. From the workshop insights, a model for semantic dimensions was created. Afterward, 60 participants decided via a survey which semantic dimensions are useful to users. Detectors for the five most useful semantic dimensions were implemented in an extensible framework. To evaluate the semantic dimension detectors, a dataset of 200 sentence pairs was created. The detectors reached an average F1 score of 0.815.
 }}