Document: Detection of functional modules in genomic and metagenomic datasets

Title:Detection of functional modules in genomic and metagenomic datasets
Alternative title:Detektion funktioneller Module in genomischen und metagenomischen Datensätzen
Bookmark URL:https://docserv.uni-duesseldorf.de/servlets/DocumentServlet?id=38592
URN (NBN):urn:nbn:de:hbz:061-20160620-144052-7
Collection:Dissertations
Language:English
Document type:Academic theses » Dissertation
Media type:Text
Author:Dr. Konietzny, Sebastian Gil Anthony [Author]
Files:
OpenOffice text document, Adobe PDF, ZIP archive, Microsoft Word, unknown file type, PNG image
24.92 MB in 31 files
Files dated 12.06.2016 / modified 13.06.2016
Contributors:Prof. Dr. McHardy, Alice [Reviewer]
Prof. Dr. Lercher, Martin [Reviewer]
Keywords:latent dirichlet allocation, probabilistic topic models, metabolic pathways, metagenomes
Dewey Decimal Classification:000 Computer science, information science, general works » 004 Data processing; computer science
Description:Cellular processes typically correspond to one or more functional modules, which represent groups of functionally interacting proteins. Common examples of functional modules are metabolic pathways, protein complexes, and signal transduction chains. Studying the composition of functional modules is an important challenge because it paves the way to exploiting microbial proteins to improve biotechnological techniques. The problem here is to identify interacting proteins given only their gene sequences, and to understand the cross-effects between individual protein functions.

Proteins are encoded in the genes of organisms and are produced through gene expression. With modern DNA sequencing techniques, accessing the gene repertoires (‘genomes’) of organisms has become a highly automated and relatively cheap process. As a consequence, thousands of sequenced genomes have become available in public databases, and the numbers are rapidly increasing. Moreover, modern techniques have enabled metagenome studies of microbial communities, i.e. the sequencing of environmental DNA samples without the need to cultivate organisms in the laboratory. A so-called metagenome thus represents the mixed genetic material of the species in a microbial community.

A common approach for detecting interacting proteins is referred to as phylogenetic profiling. Its basic assumption is that functionally coupled genes tend to co-evolve, which suggests that protein-protein interactions (PPIs) are detectable from gene co-occurrence patterns across sets of genomes. This principle enables a computational identification of pairwise interactions of proteins and of groups of interacting proteins.
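The co-occurrence principle described above can be illustrated with a minimal sketch. It is not the method of the thesis: it assumes binary presence/absence profiles, uses the Jaccard index as one possible similarity measure, and all gene names, the toy data, and the threshold are hypothetical.

```python
from itertools import combinations


def jaccard(profile_a, profile_b):
    """Jaccard similarity of two binary presence/absence profiles."""
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


def co_occurring_pairs(profiles, threshold=0.75):
    """Return gene pairs whose profiles are at least `threshold` similar."""
    pairs = []
    for (g1, p1), (g2, p2) in combinations(sorted(profiles.items()), 2):
        score = jaccard(p1, p2)
        if score >= threshold:
            pairs.append((g1, g2, score))
    return pairs


# Hypothetical toy data: one profile per gene family, one column per genome
# (1 = gene family present in that genome, 0 = absent).
profiles = {
    "geneA": [1, 1, 0, 1, 0],
    "geneB": [1, 1, 0, 1, 0],
    "geneC": [0, 0, 1, 0, 1],
    "geneD": [1, 1, 0, 0, 0],
}

if __name__ == "__main__":
    for g1, g2, score in co_occurring_pairs(profiles):
        print(f"{g1} - {g2}: {score:.2f}")
```

In this toy input only geneA and geneB share an identical profile, so they form the single candidate pair. Real analyses typically also need to correct for the phylogenetic relatedness of the genomes, which this sketch ignores.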

The key challenge of this PhD project was to develop new machine-learning-based methods for the computational detection of functional modules based on the principles of phylogenetic profiling. Notably, only a few previous studies had analyzed the applicability of standard phylogenetic profiling methods to large collections of genomes, and the analysis of metagenomic datasets was largely unexplored.

The author’s main scientific contributions are the development and evaluation of two new methods for functional module inference for genomic and metagenomic input datasets (Konietzny et al., 2011, 2014). These methods are based on probabilistic topic models, which originally stem from the field of text mining, and the idea of applying such models to gene sets of (meta)genomes is new. Topic models are Bayesian graphical models which are known to be robust against noise in the input data. This property is important for the analysis of gene presence/absence patterns because currently available methods for DNA sequencing and gene prediction can produce erroneous outputs. Moreover, the newly developed methods discussed in this thesis enable the identification of genomic elements, that is, proteins and entire functional modules, that are linked to specific capabilities of cells (‘phenotypic traits’ of organisms, or ‘phenotype’ for short). Therefore, they represent valuable instruments for the identification of biocatalysts from microbes which might enable innovations in biotechnology and health care.
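To make the topic-model idea concrete, the following is a minimal sketch of latent Dirichlet allocation fitted with a collapsed Gibbs sampler. It is not the implementation used in the thesis. It assumes that each genome is treated as a document and each gene family as a word, and the function name `lda_gibbs`, the hyperparameter values, and the toy genomes are hypothetical.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
              n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of documents, each a list of integer word ids.
    Returns (phi, doc_topic): per-topic word distributions and
    per-document topic counts.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic (unnormalized).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed estimate of each topic's distribution over words.
    phi = [
        [(topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta)
         for w in range(vocab_size)]
        for k in range(n_topics)
    ]
    return phi, doc_topic


# Hypothetical toy input: four genomes, gene families encoded as ids 0..5.
genomes = [
    [0, 1, 2, 0, 1],
    [0, 1, 2, 2],
    [3, 4, 5, 3],
    [3, 4, 5, 5, 4],
]
phi, doc_topic = lda_gibbs(genomes, n_topics=2, vocab_size=6)

if __name__ == "__main__":
    for k, dist in enumerate(phi):
        top = sorted(range(len(dist)), key=lambda w: -dist[w])[:3]
        print(f"topic {k}: top gene families {top}")
```

Under this reading, each inferred topic is a probability distribution over gene families, and the highest-probability families of a topic are candidates for a functional module.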
License:In Copyright
Copyright protection
Faculty / Institution:Faculty of Mathematics and Natural Sciences » Department of Computer Science » Bioinformatics
Document created on:20.06.2016
Files modified on:20.06.2016
Doctoral application submitted on:26.11.2015
Date of doctorate:09.05.2016