Developing a Database Application to Compare the Google Books Ngram Corpus to German News Corpora

From SDQ-Institutsseminar
Speaker Jamil Bagga
Talk type Proposal
Advisor Fabian Richter
Date Fri, 7 July 2023
Talk mode in person
Abstract This thesis develops a database application that enables a comparative analysis of the Google Books Ngram Corpus (GBNC) and German news corpora. The GBNC provides a vast collection of books spanning various time periods, while the German news corpora contain up-to-date linguistic data from news sources. The comparison aims to uncover insights into language usage patterns, linguistic evolution, and cultural shifts within the German language.

Extracting meaningful insights from the compared corpora requires various linguistic metrics, statistical analyses, and visualization techniques. By identifying patterns, trends, and linguistic changes, we can uncover valuable information on how language usage evolves over time. This thesis provides a comprehensive framework for comparing the GBNC to other corpora and presents a database application that not only enables valuable linguistic analyses but also sheds light on the composition of the GBNC by highlighting linguistic similarities and differences.