Document: Automatic Identification and Disambiguation of Verbal Multiword Expressions
Title: | Automatic Identification and Disambiguation of Verbal Multiword Expressions
Bookmark URL: | https://docserv.uni-duesseldorf.de/servlets/DocumentServlet?id=70248
URN (NBN): | urn:nbn:de:hbz:061-20250728-125900-9
Collection: | Dissertations
Language: | English
Document type: | Academic theses » Dissertation
Media type: | Text
Author: | Ehren, Rafael [Author]
Contributors: | Prof. Dr. Kallmeyer, Laura [Reviewer]; Prof. Dr. Petersen, Wiebke [Reviewer]; Lichte, Timm [Reviewer]
Keywords: | Multiword Expressions, Potentially Idiomatic Expressions, Identification, Disambiguation
Dewey Decimal Classification: | 400 Language » 410 Linguistics
Description: | The primary topics of this thesis are Multiword Expression (MWE) identification and its subtask, the disambiguation of potentially idiomatic expressions (PIEs), with the main focus on the latter. One of our main contributions is the creation of a German PIE corpus that can be used for the supervised training of classifiers that distinguish MWE instances from their literal counterparts. This corpus is then used for exactly that purpose in a variety of experiments. In these experiments, we test how well a model performs that contextualizes a PIE's components before classification, and whether we find evidence that certain types of word embeddings capture morphosyntactic properties that could help during classification. Furthermore, we explore the use of an attention mechanism to uncover which parts of the input the system actually focuses on during classification and whether these correspond to the clues a human annotator would rely on. Finally, we try to leverage the generative capabilities of a large language model to augment our PIE corpus with more data. Regarding MWE identification, we employ a BiLSTM coupled with a binary labeling scheme and a heuristic that converts the binary labels back to PARSEME-style labels. We then tackle the issue of overlapping MWE components by training individual classifiers for different MWE types.
License: | This work is licensed under a Creative Commons Attribution 4.0 International License
Department / Institution: | Faculty of Arts and Humanities » Institute for Language and Information » Computational Linguistics; Faculty of Arts and Humanities » Institute for Language and Information » General Linguistics
Document created: | 28 July 2025
Files modified: | 28 July 2025
Doctoral application submitted: | 6 June 2024
Date of doctorate: | 7 November 2024