Nearest-Neighbor and Clustering based Anomaly Detection Algorithms for RapidMiner

Mennatallah Amer, Markus Goldstein
In: Simon Fischer, Ingo Mierswa (eds.) Proceedings of the 3rd RapidMiner Community Meeting and Conference (RCOMM 2012), Pages 1-12, Budapest, Hungary, Shaker Verlag GmbH, Aachen, 8/2012

Abstract:

Unsupervised anomaly detection is the process of finding outlying records in a given dataset without the need for prior training. In this paper we introduce an anomaly detection extension for RapidMiner in order to assist non-experts with applying eight different nearest-neighbor and clustering-based algorithms to their data. A focus on efficient implementation and smart parallelization guarantees its practical applicability. In the context of clustering-based anomaly detection, two new algorithms are introduced: first, a global variant of the cluster-based local outlier factor (CBLOF), which tries to compensate for the shortcomings of the original method; second, the local density cluster-based outlier factor (LDCOF), which takes the local variances of clusters into account. The performance of all algorithms has been evaluated on real-world datasets from the UCI machine learning repository. The results reveal the strengths and weaknesses of the individual algorithms and show that our proposed clustering-based algorithms outperform CBLOF significantly.

Files:

  Anomaly_Detection_Algorithms_for_RapidMiner.pdf
  slides.pdf

BibTeX:

@inproceedings{ AMER2012,
	Title = {Nearest-Neighbor and Clustering based Anomaly Detection Algorithms for RapidMiner},
	Author = {Mennatallah Amer and Markus Goldstein},
	Editor = {Simon Fischer and Ingo Mierswa},
	BookTitle = {Proceedings of the 3rd RapidMiner Community Meeting and Conference (RCOMM 2012)},
	Month = {8},
	Year = {2012},
	Publisher = {Shaker Verlag GmbH},
	Pages = {1-12}
}

     
Last modified: 30.08.2016