We present NG-DBSCAN, an approximate density-based clustering algorithm that operates on arbitrary data and any symmetric distance measure. The distributed design of our algorithm makes it scalable to very large datasets; its approximate nature makes it fast, yet capable of producing high-quality clustering results. We provide a detailed overview of the steps of NG-DBSCAN, together with their analysis. Our results, obtained through an extensive experimental campaign with real and synthetic data, substantiate our claims about NG-DBSCAN's performance and scalability.
NG-DBSCAN: Scalable density-based clustering for arbitrary data
VLDB 2016, 42nd International Conference on Very Large Data Bases, September 5-9, 2016, New Delhi, India / Proceedings of the VLDB Endowment, 2016, Vol. 10, No. 3
PERMALINK: https://www.eurecom.fr/publication/5076