A comparative study of subgroup discovery methods: Difference between revisions

From SDQ-Institutsseminar
(Page created: "{{Vortrag |vortragender=Mohamed Amine Chalghoum |email=uwejw@student.kit.edu |vortragstyp=Bachelorarbeit |betreuer=Vadim Arzamasov |termin=Institutsseminar/202…")
 
No edit summary
Line 5: Line 5:
|betreuer=Vadim Arzamasov
|betreuer=Vadim Arzamasov
|termin=Institutsseminar/2021-02-19
|termin=Institutsseminar/2021-02-19
|kurzfassung=TBD
|kurzfassung=Subgroup discovery (SD) is a data mining technique used to extract interesting relationships in a dataset with respect to a target variable. These relationships are described in the form of rules. Multiple SD techniques have been developed over the years. This thesis presents a comparative study of a number of these techniques in order to identify the state-of-the-art methods. It also analyses the effect of discretization as a preprocessing step on these methods. Furthermore, it investigates the effect of hyperparameter optimization on them.
 
Our analysis showed that PRIM, DSSD, Best Interval and FSSD outperformed the other subgroup discovery methods evaluated in this study and are to be considered state-of-the-art. It also showed that discretization as a preprocessing step improves the efficiency of methods that do not employ internal discretization, but has a negative impact on the quality of the subgroups generated by methods that perform it internally. Finally, the results demonstrate that Apriori-SD and SD-Algorithm benefited the most from hyperparameter optimization.
}}
}}

Revision as of 11:17, 13 February 2021

Speaker Mohamed Amine Chalghoum
Talk type Bachelor's thesis
Advisor Vadim Arzamasov
Date Fri, 19 February 2021
Talk language
Talk mode
Abstract Subgroup discovery (SD) is a data mining technique used to extract interesting relationships in a dataset with respect to a target variable. These relationships are described in the form of rules. Multiple SD techniques have been developed over the years. This thesis presents a comparative study of a number of these techniques in order to identify the state-of-the-art methods. It also analyses the effect of discretization as a preprocessing step on these methods. Furthermore, it investigates the effect of hyperparameter optimization on them.

Our analysis showed that PRIM, DSSD, Best Interval and FSSD outperformed the other subgroup discovery methods evaluated in this study and are to be considered state-of-the-art. It also showed that discretization as a preprocessing step improves the efficiency of methods that do not employ internal discretization, but has a negative impact on the quality of the subgroups generated by methods that perform it internally. Finally, the results demonstrate that Apriori-SD and SD-Algorithm benefited the most from hyperparameter optimization.