Document: Statistical Machine Translation Beyond Context-Free Grammar
Title: Statistical Machine Translation Beyond Context-Free Grammar
Bookmark URL: https://docserv.uni-duesseldorf.de/servlets/DocumentServlet?id=48356
URN (NBN): urn:nbn:de:hbz:061-20190319-135935-3
Collection: Dissertations
Language: English
Document type: Academic theses » Dissertation
Media type: Text
Author: Käshammer, Miriam [Author]
Contributors: Prof. Dr. Laura Kallmeyer [Reviewer]; Jun.-Prof. Petersen, Wiebke [Reviewer]
Dewey Decimal Classification: 400 Language » 410 Linguistics
Description: Statistical machine translation (SMT) has evolved from simple word-based models through feature-rich phrase-based approaches to tree-based methods. The latter employ synchronous grammars, usually some form of Synchronous Context-Free Grammar (SCFG), to model both the hierarchical and the translational relationships between the two languages of a pair.
In this thesis, I explore an approach to tree-based SMT that uses a grammar formalism beyond Context-Free Grammar (CFG). I define and implement the first SMT system based on Linear Context-Free Rewriting System (LCFRS), including training procedures and a cube-pruning decoder. It is also the first hierarchical phrase-based system that allows discontinuous phrases on both the source and the target side. To that end, I define Synchronous Linear Context-Free Rewriting System (SLCFRS), a natural extension of SCFG. SLCFRS non-terminals may span more than one continuous block on each side of the bitext and can thus represent synchronous discontinuous constituents in a straightforward manner. In data-driven syntactic parsing, LCFRS is a well-studied formalism for modeling discontinuities that remains fairly efficient to handle. Experiments on translating from German to English demonstrate the feasibility of training and decoding with more expressive translation models such as SLCFRS and show a modest improvement over a context-free baseline. The extension beyond context-freeness in machine translation is motivated by a set of alignment configurations that exceed the alignment capacity of current translation models based on SCFG. In quantitative and qualitative investigations, I show that an SCFG-based approach to translation modeling cannot derive all alignments that occur in a wide range of manually aligned bilingual data sets, and that only very few of those configurations can be attributed to alignment errors. To avoid excluding the corresponding translation options from the search space a priori, tree-based translation requires a grammar formalism more expressive than SCFG.
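The notion of a non-terminal spanning "more than one continuous block" can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function names `contiguous_blocks` and `fan_out` and the German-English sentence pair are assumptions chosen for illustration. The sketch groups the token positions covered by a constituent into maximal contiguous blocks. A constituent covering a single block could be the yield of a context-free non-terminal, while one covering two or more blocks needs a formalism such as LCFRS.

```python
from typing import Iterable, List, Tuple


def contiguous_blocks(positions: Iterable[int]) -> List[Tuple[int, int]]:
    """Group token positions into maximal contiguous (start, end) blocks."""
    blocks: List[Tuple[int, int]] = []
    for pos in sorted(set(positions)):
        if blocks and pos == blocks[-1][1] + 1:
            # Position directly follows the last block: extend that block.
            blocks[-1] = (blocks[-1][0], pos)
        else:
            # Gap before this position: start a new block.
            blocks.append((pos, pos))
    return blocks


def fan_out(positions: Iterable[int]) -> int:
    """Number of contiguous blocks covered by a constituent."""
    return len(contiguous_blocks(positions))


# Hypothetical sentence pair (not from the thesis), 0-based token indices.
source = "er ruft seine Mutter an".split()
target = "he calls his mother".split()

# The separable verb "ruft ... an" aligns to the single word "calls".
verb_source = {1, 4}  # "ruft", "an"
verb_target = {1}     # "calls"

print(contiguous_blocks(verb_source), fan_out(verb_source))
print(contiguous_blocks(verb_target), fan_out(verb_target))
```

In this hypothetical pair the source side of the verb covers two blocks and the target side covers one, which is the kind of discontinuous translation unit that the abstract says SLCFRS non-terminals can represent directly.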
License: Protected by copyright
Department / Institution: Philosophische Fakultät » Institut für Sprache und Information » Computerlinguistik
Document created on: 19.03.2019
Files modified on: 19.03.2019
Doctoral application submitted on: 16.05.2018
Date of doctorate: 30.11.2018