Document: Combining variables in clinical data using statistical ensemble methods
Title: Combining variables in clinical data using statistical ensemble methods
Alternative title: Kombination klinischer Variablen unter Verwendung von statistischen Ensemble-Methoden
Bookmark URL: https://docserv.uni-duesseldorf.de/servlets/DocumentServlet?id=59330
URN (NBN): urn:nbn:de:hbz:061-20220427-081419-1
Collection: Dissertations
Language: English
Document type: Academic theses » Dissertation
Media type: Text
Author: Tietz, Tobias [Author]
Contributors: Prof. Dr. Schwender, Holger [Supervisor / doctoral advisor]; Prof. Dr. Schwender, Holger [Reviewer]; Prof. Dr. Ickstadt, Katja [Reviewer]
Keywords: structural MRI, spatial hierarchical clustering, ensemble clustering, voxel-based morphometry, brain parcellation, clustering stability, logic regression, variable selection, importance measure, logicFS, time-to-event data, ensemble prediction
Dewey Decimal Classification: 500 Natural sciences and mathematics » 510 Mathematics
Description: In many clinical studies, it is not the original variables but combinations of them that explain the outcome of interest. Finding these combined features with statistical ensemble methods not only improves prediction but also helps to better understand the underlying data-generating processes.
Two different types of clinical data are considered in the two parts of this thesis: genotype data relating binarized genetic variations to a time-to-event outcome in Part I, and neuroimaging data consisting of structural brain scans in Part II. In Part I, the combined features are complex interactions of binarized genetic variations, as these are often the actual explanatory features for predicting, e.g., the time to recurrence of a disease. survivalFS is an existing ensemble method that searches for such interactions and ranks them according to an importance measure based on the predictive partial log-likelihood. To improve the ranking of the identified interactions, further importance measures are proposed that are based on two other popular goodness-of-fit measures as well as on a newly introduced adaptation of Harrell's concordance index, referred to as the DPO-based C-index. Moreover, noise-adjusted importance measures are introduced that correct for noise variables falsely reducing the estimated importance of explanatory interactions. Part II builds upon the crucial and widely accepted concept that the human brain is organized into spatially contiguous, specialized brain regions, which are interconnected by large-scale networks. Such spatially contiguous brain regions, i.e., the combined features, are identified using existing spatial hierarchical agglomerative clustering methods as well as the newly proposed SPARTACUS (SPAtial hieRarchical agglomeraTive vAriable ClUStering) method for clustering variables. Subsampling-based clustering stability and clustering quality approaches are employed to identify interesting numbers of brain regions, and ensemble clustering methods are used to search for higher-quality brain regions.
The performance of the ensemble methods in finding combined features is evaluated on simulated and real data and compared with popular competing methods, i.e., an importance measure for bivariate variable interactions from random survival forests and spatial spectral clustering. These applications show that the ensemble methods are able to stably identify combined features and to outperform the competing methods.
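As a reader aid for the concordance index mentioned in the description, the following is a minimal sketch of the standard Harrell's C-index for right-censored data. It is not the DPO-based C-index proposed in the thesis, whose definition is not given in this record. The function name `harrell_c_index`, its parameters, and the toy data are assumptions made for illustration; pairs with tied event times are simply skipped, which is a simplification.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Standard Harrell's C for right-censored data (illustrative sketch).

    A pair (i, j) is comparable when subject i has an observed event
    (events[i] == 1) and a strictly shorter time than subject j.
    The pair is concordant when subject i also has the higher risk score;
    tied risk scores count as half concordant.
    """
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("times, events and risk_scores must have equal length")

    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5

    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Hypothetical toy data: four subjects, the third one is censored.
toy_times = [2.0, 4.0, 6.0, 8.0]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.7, 0.8, 0.1]
print(harrell_c_index(toy_times, toy_events, toy_risk))
```

A value of 1 means that higher risk scores always go with earlier observed events, and 0.5 corresponds to random ordering. In the toy data, four of the five comparable pairs are concordant.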
License: Copyright protection
Faculty / Institution: Faculty of Mathematics and Natural Sciences » Department of Mathematics » Mathematical Optimization
Document created on: 27.04.2022
Files modified on: 27.04.2022
Doctoral application submitted on: 09.11.2021
Date of doctorate: 21.03.2022