Dokument: Analysis of Complete-Linkage, Design of Algorithms for the Separated k-Clustering and Euclidean k-MSR Problems

Titel:Analysis of Complete-Linkage, Design of Algorithms for the Separated k-Clustering and Euclidean k-MSR Problems
URL für Lesezeichen:https://docserv.uni-duesseldorf.de/servlets/DocumentServlet?id=66644
URN (NBN):urn:nbn:de:hbz:061-20240917-111616-0
Kollektion:Dissertationen
Sprache:Englisch
Dokumententyp:Wissenschaftliche Abschlussarbeiten » Dissertation
Medientyp:Text
Autor: Wargalla, Julian [Autor]
Dateien:
[Dateien anzeigen]Adobe PDF
[Details]1,36 MB in einer Datei
[ZIP-Datei erzeugen]
Dateien vom 27.08.2024 / geändert 27.08.2024
Beitragende:Prof. Dr. Schmidt, Melanie [Gutachter]
Prof. Dr. Röglin, Heiko [Gutachter]
Dewey Dezimal-Klassifikation:000 Informatik, Informationswissenschaft, allgemeine Werke » 004 Datenverarbeitung; Informatik
Beschreibung:Cluster analysis is a major field within (exploratory) data analysis. Its goal is to classify objects according to degrees of proximity or (dis)similarity in such a way that the resulting clusters are relevant to whatever problem is at hand. In this dissertation, we establish new theoretical results for several clustering algorithms and the objectives that evaluate their output.

After a general introduction, in which we broadly outline the tasks of cluster analysis and introduce the main objective functions used to evaluate solutions, we turn to the Complete-Linkage algorithm in Chapter 2, a well-known algorithm for computing hierarchical clusterings. We improve the previously best-known level-by-level lower bounds for its approximation ratio on general metrics from Ω(log k), established by Dasgupta and Long, to Ω(k) and report the first non-trivial upper bounds. For CL{rad}, which greedily optimizes the k-center objective, we match the lower bound by showing that the algorithm always computes O(k)-approximations. For CL{diam}, which greedily optimizes the k-diameter objective and which is much harder to analyze, we prove an upper bound of O(k²).

In Chapter 3, we analyze how well various important objectives, such as k-center, k-diameter, k-median, k-means, and k-diameter can be combined with the k-separation objective. The former ensure that objects within the same cluster are not too dissimilar, whereas the latter ensures that objects within different clusters are not too similar. Since no solutions have to exist that approximate two objectives simultaneously, we instead consider approximations of the Pareto sets of the corresponding bi-objective clustering problems.

In Chapter 4, we study the Euclidean k-center PTAS for constant k established by Badoiu et al. and modify it in such a way that it yields a PTAS for the Euclidean k-MSR problem with constant k and logarithmically many outliers. Although similar results have recently been published, this is, to the best of our knowledge, the first algorithm to explicitly do this.
Lizenz:Creative Commons Lizenzvertrag
Dieses Werk ist lizenziert unter einer Creative Commons Namensnennung 4.0 International Lizenz
Fachbereich / Einrichtung:Mathematisch- Naturwissenschaftliche Fakultät » WE Informatik
Dokument erstellt am:17.09.2024
Dateien geändert am:17.09.2024
Promotionsantrag am:19.06.2024
Datum der Promotion:23.07.2024
english
Benutzer
Status: Gast
Aktionen