Dokument: Analysis of Complete-Linkage, Design of Algorithms for the Separated k-Clustering and Euclidean k-MSR Problems
Titel: | Analysis of Complete-Linkage, Design of Algorithms for the Separated k-Clustering and Euclidean k-MSR Problems | |||||||
URL für Lesezeichen: | https://docserv.uni-duesseldorf.de/servlets/DocumentServlet?id=66644 | |||||||
URN (NBN): | urn:nbn:de:hbz:061-20240917-111616-0 | |||||||
Kollektion: | Dissertationen | |||||||
Sprache: | Englisch | |||||||
Dokumententyp: | Wissenschaftliche Abschlussarbeiten » Dissertation | |||||||
Medientyp: | Text | |||||||
Autor: | Wargalla, Julian [Autor] | |||||||
Dateien: |
| |||||||
Beitragende: | Prof. Dr. Schmidt, Melanie [Gutachter] Prof. Dr. Röglin, Heiko [Gutachter] | |||||||
Dewey Dezimal-Klassifikation: | 000 Informatik, Informationswissenschaft, allgemeine Werke » 004 Datenverarbeitung; Informatik | |||||||
Beschreibung: | Cluster analysis is a major field within (exploratory) data analysis. Its goal is to classify objects according to degrees of proximity or (dis)similarity in such a way that the resulting clusters are relevant to whatever problem is at hand. In this dissertation, we establish new theoretical results for several clustering algorithms and the objectives that evaluate their output.
After a general introduction, in which we broadly outline the tasks of cluster analysis and introduce the main objective functions used to evaluate solutions, we turn to the Complete-Linkage algorithm in Chapter 2, a well-known algorithm for computing hierarchical clusterings. We improve the previously best-known level-by-level lower bounds for its approximation ratio on general metrics from Ω(log k), established by Dasgupta and Long, to Ω(k) and report the first non-trivial upper bounds. For CL{rad}, which greedily optimizes the k-center objective, we match the lower bound by showing that the algorithm always computes O(k)-approximations. For CL{diam}, which greedily optimizes the k-diameter objective and which is much harder to analyze, we prove an upper bound of O(k²). In Chapter 3, we analyze how well various important objectives, such as k-center, k-diameter, k-median, k-means, and k-diameter can be combined with the k-separation objective. The former ensure that objects within the same cluster are not too dissimilar, whereas the latter ensures that objects within different clusters are not too similar. Since no solutions have to exist that approximate two objectives simultaneously, we instead consider approximations of the Pareto sets of the corresponding bi-objective clustering problems. In Chapter 4, we study the Euclidean k-center PTAS for constant k established by Badoiu et al. and modify it in such a way that it yields a PTAS for the Euclidean k-MSR problem with constant k and logarithmically many outliers. Although similar results have recently been published, this is, to the best of our knowledge, the first algorithm to explicitly do this. | |||||||
Lizenz: | Dieses Werk ist lizenziert unter einer Creative Commons Namensnennung 4.0 International Lizenz | |||||||
Fachbereich / Einrichtung: | Mathematisch- Naturwissenschaftliche Fakultät » WE Informatik | |||||||
Dokument erstellt am: | 17.09.2024 | |||||||
Dateien geändert am: | 17.09.2024 | |||||||
Promotionsantrag am: | 19.06.2024 | |||||||
Datum der Promotion: | 23.07.2024 |