Document: Automatic Identification and Disambiguation of Verbal Multiword Expressions

Title: Automatic Identification and Disambiguation of Verbal Multiword Expressions
Bookmark URL: https://docserv.uni-duesseldorf.de/servlets/DocumentServlet?id=70248
URN (NBN): urn:nbn:de:hbz:061-20250728-125900-9
Collection: Dissertations
Language: English
Document type: Academic theses » Dissertation
Media type: Text
Author: Ehren, Rafael [Author]
Files: Adobe PDF, 1.79 MB in one file
Files dated 18.07.2025 / modified 18.07.2025
Contributors: Prof. Dr. Kallmeyer, Laura [Referee]
Prof. Dr. Petersen, Wiebke [Referee]
Lichte, Timm [Referee]
Keywords: Multiword Expressions, Potentially Idiomatic Expressions, Identification, Disambiguation
Dewey Decimal Classification: 400 Language » 410 Linguistics
Description: The primary topics of this thesis are Multiword Expression (MWE) identification and one of its subtasks, the disambiguation of potentially idiomatic expressions (PIEs), with the main focus on the latter. One of our main contributions is the creation of a German PIE corpus that can be used for the supervised training of classifiers that distinguish MWE instances from their literal counterparts. We then use this corpus for exactly that purpose in a variety of experiments. In these experiments, we test how well a model performs that contextualizes a PIE's components before classification, and whether we find evidence that certain types of word embeddings capture morphosyntactic properties that could help during classification. Furthermore, we explore the use of an attention mechanism to uncover which parts of the input the system actually focuses on during classification, and whether these correspond to the clues a human annotator would rely on. Finally, we try to leverage the generative capabilities of a large language model to augment our PIE corpus with more data. For MWE identification, we employ a BiLSTM coupled with a binary labeling scheme and a heuristic that converts the binary labels back into PARSEME-style labels. We then tackle the issue of overlapping MWE components by training individual classifiers for different MWE types.
License: Creative Commons license agreement
This work is licensed under a Creative Commons Attribution 4.0 International License.
Department / Institution: Faculty of Arts and Humanities » Institute for Language and Information » Computational Linguistics
Faculty of Arts and Humanities » Institute for Language and Information » General Linguistics
Document created: 28.07.2025
Files modified: 28.07.2025
Doctoral application submitted: 06.06.2024
Date of doctorate: 07.11.2024