Document: Detection of Anomalous Sequences in Multivariate Time Series Data

Title:

Detection of Anomalous Sequences in Multivariate Time Series Data

Bookmark URL:

https://docserv.uni-duesseldorf.de/servlets/DocumentServlet?id=58199

URN (NBN):

urn:nbn:de:hbz:061-20211207-081240-4

Collection:

Dissertations

Language:

English

Document type:

Academic theses » Dissertation

Media type:

Text

Author:

Krakowski, Martha Natalia [Author]

Files:

Adobe PDF
24.74 MB in one file
Files dated 29.11.2021 / modified 29.11.2021

Contributors:

Prof. Dr. Conrad, Stefan [Reviewer]
Prof. Dr. Kröger, Peer [Reviewer]

Dewey Decimal Classification:

000 Computer science, information & general works » 004 Data processing & computer science

Description:

Due to the increasing amount of data collected in various domains, the field of data mining, which focuses on the automated extraction of information from data, is becoming increasingly important. Including temporal information can be very helpful and can lead to deeper insight into the data in many applications. Fields of application include the analysis of sequentially recorded data on the disease progression of patients, user behavior in online shops, or stock market data. Time series analysis deals with such sequentially recorded data, called time series, and covers a wide range of data mining methods such as classification, clustering and outlier detection. In this thesis, we concentrate on outlier detection in time series data sets. Apart from identifying errors and malfunctions, the recognition of outliers can help to find anomalies with other semantic meanings. For example, credit card fraud, conspicuous user behavior or rare diseases might be discovered.

In contrast to other approaches, which consider a single time series or the whole data set at once, we identify groups of time series in order to focus on a more informative scope of courses. We believe that typical group behavior can be extracted, leading to deeper insights into normal and anomalous developments of time series. Therefore, we cluster the data per timestamp and investigate the sequences' transitions between clusters over time. Our approach is applicable wherever the formation of groups of time series following a similar trend can be assumed. One example is the examination of annual financial reports of publicly listed companies. Companies that share a similar industry and corporate strategy will most probably exhibit a similar development of their balance sheet figures. If one company suddenly splits from its former group, it shows conspicuous behavior, which might indicate advantages or disadvantages caused by different circumstances, including fraud. This behavior can be detected based on the company's cluster transitions.
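
The idea of clustering per timestamp and watching cluster transitions can be illustrated with a minimal Python sketch. It is not the algorithm from the thesis: the single-linkage grouping with a distance threshold `eps`, the `min_kept` ratio, the function names and the toy data are all assumptions chosen for illustration. A series is flagged when it keeps less than `min_kept` of its former cluster peers from one timestamp to the next.

```python
from itertools import combinations
from math import dist


def cluster_at_timestamp(points, eps):
    """Group series ids whose points are chained by distances <= eps.

    Illustrative single-linkage grouping via union-find (an assumption,
    not the clustering method used in the thesis).
    """
    parent = {sid: sid for sid in points}

    def find(sid):
        while parent[sid] != sid:
            parent[sid] = parent[parent[sid]]  # path compression
            sid = parent[sid]
        return sid

    for a, b in combinations(points, 2):
        if dist(points[a], points[b]) <= eps:
            parent[find(a)] = find(b)

    groups = {}
    for sid in points:
        groups.setdefault(find(sid), set()).add(sid)
    return list(groups.values())


def peers(clusters, sid):
    """Return the other members of the cluster that contains sid."""
    for members in clusters:
        if sid in members:
            return members - {sid}
    return set()


def flag_transitions(series, eps, min_kept=0.5):
    """Return (series id, timestamp index) pairs where a series keeps
    less than min_kept of its peers from the previous timestamp."""
    n_steps = len(next(iter(series.values())))
    clusterings = [
        cluster_at_timestamp({sid: vals[t] for sid, vals in series.items()}, eps)
        for t in range(n_steps)
    ]
    flagged = []
    for t in range(1, n_steps):
        for sid in series:
            before = peers(clusterings[t - 1], sid)
            after = peers(clusterings[t], sid)
            if before and len(before & after) / len(before) < min_kept:
                flagged.append((sid, t))
    return flagged


# Hypothetical toy data: five "companies", two features, three timestamps.
series = {
    "A": [(1.0, 1.0), (1.1, 1.2), (1.2, 1.3)],
    "B": [(1.1, 0.9), (1.2, 1.1), (1.3, 1.2)],
    "C": [(0.9, 1.1), (1.0, 1.3), (5.0, 5.0)],  # leaves its group at t = 2
    "D": [(3.0, 3.0), (3.1, 3.1), (3.2, 3.0)],
    "E": [(3.1, 2.9), (3.0, 3.2), (3.1, 3.1)],
}

flagged = flag_transitions(series, eps=0.5)
print(flagged)
```

In this toy setup only series "C" is flagged, at the last timestamp, because it loses all of its former peers, while "A" and "B" each still keep half of theirs.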

Since this approach depends on an underlying clustering, we focus not only on outlier detection but also on the clustering of time series and appropriate evaluation measures. In this thesis, we introduce a new type of outlier based on group behavior and two novel approaches for its identification. Moreover, we introduce the term "over-time stability", which describes the stability of clusters' member compositions over time. We propose a novel clustering approach that produces clusterings per timestamp while taking temporal information into account and maximizing the clusters' over-time stability. Furthermore, we present two validity measures that evaluate the over-time stability of crisp and fuzzy clusterings. These measures enable, for the first time, the evaluation and quantitative comparison of different clusterings per timestamp. They are therefore helpful tools for discovering optimal parameter settings and the best-fitting algorithms for a given application.
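
To make the notion of over-time stability more concrete, the following Python sketch computes a simple proxy for crisp clusterings. It is not one of the validity measures proposed in the thesis: the averaged Jaccard overlap, the function names and the toy labelings are assumptions made for illustration only. For each object, the set of objects sharing its cluster at one timestamp is compared with the corresponding set at the next timestamp.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def members_by_object(labels):
    """Map each object to the set of objects sharing its cluster label."""
    groups = {}
    for obj, label in labels.items():
        groups.setdefault(label, set()).add(obj)
    return {obj: groups[label] for obj, label in labels.items()}


def stability_proxy(labelings):
    """Average Jaccard overlap of each object's cluster membership
    across consecutive timestamps (illustrative proxy, an assumption)."""
    scores = []
    for prev, curr in zip(labelings, labelings[1:]):
        prev_sets = members_by_object(prev)
        curr_sets = members_by_object(curr)
        for obj in prev.keys() & curr.keys():
            scores.append(jaccard(prev_sets[obj], curr_sets[obj]))
    return sum(scores) / len(scores) if scores else 1.0


# Hypothetical labelings: one dict (object -> cluster label) per timestamp.
stable = [
    {"A": 0, "B": 0, "C": 1, "D": 1},
    {"A": 1, "B": 1, "C": 0, "D": 0},  # labels swapped, memberships unchanged
]
unstable = [
    {"A": 0, "B": 0, "C": 1, "D": 1},
    {"A": 0, "B": 1, "C": 0, "D": 1},  # every pair is split up
]

stable_score = stability_proxy(stable)
unstable_score = stability_proxy(unstable)
print(stable_score, unstable_score)
```

Because the proxy compares member sets rather than label values, relabeling clusters between timestamps does not lower the score; only changes in who is grouped with whom do.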

Our experiments on various artificial and real-world data sets show the functionality and applicability of our approaches. All intended aims have been achieved. Several analyses of the data demonstrate the versatility of our evaluation measure for crisp environments and highlight the potential for further extensions. The outlier detection algorithm was evaluated quantitatively with regard to the detection of financial restatements. The results achieved are competitive with other state-of-the-art algorithms in the field of economics and demonstrate a meaningful field of application. One important advantage of the approach, which should not be underestimated, is the transparency of its decisions, which increases the willingness to use it in real-world environments.

License:

Protected by copyright

Department / Institution:

Faculty of Mathematics and Natural Sciences » Department of Computer Science » Databases and Information Systems

Document created on:

07.12.2021

Files modified on:

07.12.2021

Doctoral application submitted on:

27.04.2021

Date of doctorate:

05.11.2021

Heinrich-Heine-Universität Düsseldorf
