Document: Statistical Machine Translation Beyond Context-Free Grammar

Title:

Statistical Machine Translation Beyond Context-Free Grammar

Bookmark URL:

https://docserv.uni-duesseldorf.de/servlets/DocumentServlet?id=48356

URN (NBN):

urn:nbn:de:hbz:061-20190319-135935-3

Collection:

Dissertations

Language:

English

Document type:

Academic theses » Dissertation

Media type:

Text

Author:

Käshammer, Miriam [Author]

Files:

Adobe PDF, 1.07 MB in one file
Files dated 24.01.2019 / modified 24.01.2019

Contributors:

Prof. Dr. Laura Kallmeyer [Reviewer]
Jun.-Prof. Petersen, Wiebke [Reviewer]

Dewey Decimal Classification:

400 Language » 410 Linguistics

Description:

Statistical machine translation (SMT) has evolved from simple word-based models, via feature-rich phrase-based approaches, to tree-based methods. The latter employ synchronous grammars, usually some form of Synchronous Context-Free Grammar (SCFG), to model both hierarchical and translational relationships between the two languages of a pair.
This thesis explores an approach to tree-based SMT that uses a grammar formalism beyond Context-Free Grammar (CFG). I define and implement the first SMT system based on Linear Context-Free Rewriting Systems (LCFRS), including training procedures and a cube-pruning decoder. It is also the first hierarchical phrase-based system that allows discontinuous phrases on both the source and the target side.
To that end, I define Synchronous Linear Context-Free Rewriting Systems (SLCFRS), a natural extension of SCFG. SLCFRS non-terminals may span more than one continuous block on each side of the bitext and can thus represent synchronous discontinuous constituents in a straightforward manner. In data-driven syntactic parsing, LCFRS is a well-studied formalism for modeling discontinuities that remains fairly efficient to process. Experiments on German-to-English translation demonstrate the feasibility of training and decoding with more expressive translation models such as SLCFRS and show a modest improvement over a context-free baseline.
The extension beyond context-freeness in machine translation is motivated by a set of alignment configurations that exceed the alignment capacity of current translation models based on SCFG. In quantitative and qualitative investigations, I show that an SCFG-based approach to translation modeling cannot derive all alignments that occur in a wide range of manually aligned bilingual data sets, and that only very few of those configurations can be attributed to alignment errors. To avoid excluding the corresponding translation options from the search space a priori, tree-based translation requires a grammar formalism more expressive than SCFG.
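As a minimal illustration of what "more than one continuous block" means in the description above, the following Python sketch groups the token positions covered by a phrase into maximal continuous blocks. The sentence pair, the word alignment, and the helper name continuous_blocks are assumptions made purely for illustration; none of them is taken from the thesis or its system.

```python
from typing import Iterable, List, Tuple


def continuous_blocks(positions: Iterable[int]) -> List[Tuple[int, int]]:
    """Group token positions into maximal continuous blocks (inclusive ranges)."""
    blocks: List[Tuple[int, int]] = []
    for pos in sorted(set(positions)):
        if blocks and pos == blocks[-1][1] + 1:
            # The position extends the current block by one token.
            blocks[-1] = (blocks[-1][0], pos)
        else:
            # A gap (or the very first position) starts a new block.
            blocks.append((pos, pos))
    return blocks


# Hypothetical sentence pair, 0-indexed tokens (not from the thesis).
source = ["ich", "habe", "das", "Buch", "gelesen"]
target = ["I", "have", "read", "the", "book"]

# Assumed alignment of the verb complex: "habe ... gelesen" <-> "have read".
source_positions = {1, 4}
target_positions = {1, 2}

source_blocks = continuous_blocks(source_positions)
target_blocks = continuous_blocks(target_positions)

print(source_blocks)  # two blocks: the phrase is discontinuous on the source side
print(target_blocks)  # one block: the phrase is continuous on the target side
```

Under these assumptions, the phrase pair occupies two blocks on the source side and one on the target side. A non-terminal restricted to a single continuous span per side cannot cover it directly, whereas the description above states that SLCFRS non-terminals may span several blocks on each side.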

License:

Protected by copyright

Department / Institution:

Faculty of Arts and Humanities » Institute for Language and Information » Computational Linguistics

Document created on:

19.03.2019

Files modified on:

19.03.2019

Doctoral application submitted on:

16.05.2018

Date of doctoral degree:

30.11.2018

Heinrich-Heine-Universität Düsseldorf
