Outlier detection with the kernelized spatial depth function

Bibliographic details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 31(2009), 2, dated 13 Feb., pages 288-305
Main author: Chen, Yixin (author)
Other authors: Dang, Xin; Peng, Hanxiang; Bart, Henry L Jr
Format: Online article
Language: English
Published: 2009
Access to the parent work: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, Non-P.H.S.
Description
Summary: Statistical depth functions provide from the "deepest" point a "center-outward ordering" of multidimensional data. In this sense, depth functions can measure the "extremeness" or "outlyingness" of a data point with respect to a given data set. Hence, they can detect outliers, that is, observations that appear extreme relative to the rest of the observations. Of the various statistical depths, the spatial depth is especially appealing because of its computational efficiency and mathematical tractability. In this article, we propose a novel statistical depth, the kernelized spatial depth (KSD), which generalizes the spatial depth via positive definite kernels. By choosing a proper kernel, the KSD can capture the local structure of a data set while the spatial depth fails. We demonstrate this by the half-moon data and the ring-shaped data. Based on the KSD, we propose a novel outlier detection algorithm, by which an observation with a depth value less than a threshold is declared as an outlier. The proposed algorithm is simple in structure: the threshold is the only parameter for a given kernel. It applies to a one-class learning setting, in which "normal" observations are given as the training data, as well as to a missing label scenario, where the training set consists of a mixture of normal observations and outliers with unknown labels. We give upper bounds on the false alarm probability of a depth-based detector. These upper bounds can be used to determine the threshold. We perform extensive experiments on synthetic data and data sets from real applications. The proposed outlier detector is compared with existing methods. The KSD outlier detector demonstrates a competitive performance.
Description: Date Completed 17.03.2009
Date Revised 26.12.2008
published: Print
Citation Status MEDLINE
ISSN: 1939-3539
DOI: 10.1109/TPAMI.2008.72
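
Illustrative sketch (not part of the catalog record): the summary describes a depth that is computed from kernel evaluations and then compared against a threshold. The Python code below is a minimal sketch of that idea, not the authors' implementation. It assumes a Gaussian kernel with a bandwidth named sigma, writes the sample spatial depth as one minus the norm of the average unit vector pointing from x to the data, and replaces every inner product with a kernel evaluation. The function names, the choice of kernel, the normalization by the number of non-coincident points, and the toy ring data are all assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(a, b, sigma):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    diff = a[:, None, :] - b[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def kernelized_spatial_depth(x, data, sigma=1.0):
    """Depth of x relative to data, computed in the kernel feature space.

    Assumed form: 1 - || mean of unit vectors (phi(x) - phi(y_i)) ||,
    where all norms and inner products are expressed through the kernel.
    """
    x = np.asarray(x, dtype=float).reshape(1, -1)
    data = np.asarray(data, dtype=float)
    k_xx = gaussian_kernel(x, x, sigma)[0, 0]
    k_xy = gaussian_kernel(x, data, sigma)[0]   # shape (n,)
    k_yy = gaussian_kernel(data, data, sigma)   # shape (n, n)

    # Squared feature-space distances ||phi(x) - phi(y_i)||^2.
    d2 = k_xx + np.diag(k_yy) - 2.0 * k_xy
    keep = d2 > 1e-12                           # drop points coinciding with x
    m = int(np.count_nonzero(keep))
    if m == 0:
        return 1.0
    dist = np.sqrt(d2[keep])

    # Inner products <phi(x) - phi(y_i), phi(x) - phi(y_j)>.
    inner = k_xx + k_yy - k_xy[:, None] - k_xy[None, :]
    inner = inner[np.ix_(keep, keep)]

    # Squared norm of the sum of unit vectors, then normalize by m.
    total = np.sum(inner / np.outer(dist, dist))
    return float(1.0 - np.sqrt(max(total, 0.0)) / m)


def is_outlier(x, data, threshold, sigma=1.0):
    """Declare x an outlier when its depth falls below the threshold."""
    return kernelized_spatial_depth(x, data, sigma) < threshold


# Toy ring-shaped data (hypothetical example values).
angles = np.linspace(0.0, 2.0 * np.pi, 60, endpoint=False)
ring = np.column_stack([np.cos(angles), np.sin(angles)])
center = np.array([0.0, 0.0])

for sigma in (10.0, 0.3):
    print(
        "sigma =", sigma,
        "| depth at center:", kernelized_spatial_depth(center, ring, sigma),
        "| depth on ring:", kernelized_spatial_depth(ring[0], ring, sigma),
    )
```

With a very wide kernel the sketch behaves much like the ordinary spatial depth, so the empty center of the ring comes out as the deepest point. With a narrower kernel one would expect points on the ring to score higher than the center, which is the kind of local structure the summary refers to. The bandwidth and threshold values here are arbitrary and would need to be chosen for real data.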