PaMPa-HD: a parallel MapReduce-based frequent pattern miner for high-dimensional data

Apiletti, Daniele; Baralis, Elena; Cerquitelli, Tania; Garza, Paolo; Pulvirenti, Fabio
HDM 2015, 3rd International Workshop on High Dimensional Data Mining, In conjunction with the IEEE International Conference on Data Mining (IEEE ICDM 2015), 14 November 2015, Atlantic City, NJ, USA

Frequent closed itemset mining is among the most complex exploratory techniques in data mining, and provides the ability to discover hidden correlations in transactional datasets.
The explosion of Big Data is leading to new parallel and distributed approaches. Unfortunately, most of them are designed to cope with low-dimensional datasets, whereas no distributed high-dimensional frequent closed itemset mining algorithm exists. This work introduces PaMPa-HD, a parallel MapReduce-based frequent closed itemset mining algorithm for high-dimensional datasets, based on Carpenter. Experiments performed on both real and synthetic datasets show the efficiency and scalability of PaMPa-HD.
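
To make the notion of a frequent closed itemset concrete, the following is a minimal illustrative sketch, not the PaMPa-HD or Carpenter implementation. It assumes a tiny in-memory dataset and relies on the fact that every closed itemset is the intersection of some group of transactions (rows), so it enumerates row groups by brute force. The function name `closed_frequent_itemsets` and the sample data are hypothetical, and the exhaustive enumeration is exponential in the number of rows, so it is only suitable for toy inputs.

```python
from itertools import combinations


def closed_frequent_itemsets(transactions, min_support):
    """Return {closed itemset: support} for itemsets with support >= min_support.

    Brute-force row enumeration (illustrative only): the intersection of any
    group of rows is a closed itemset, and every closed itemset with support s
    is the intersection of its s supporting rows.
    """
    rows = [frozenset(t) for t in transactions]
    closed = {}
    # Only groups of at least min_support rows can yield frequent itemsets.
    for k in range(min_support, len(rows) + 1):
        for group in combinations(rows, k):
            itemset = frozenset.intersection(*group)
            if itemset and itemset not in closed:
                # Support = number of rows that contain the whole itemset.
                closed[itemset] = sum(1 for r in rows if itemset <= r)
    return closed


# Hypothetical toy dataset: four transactions over items a, b, c, d.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"c", "d"}]
result = closed_frequent_itemsets(transactions, min_support=2)
for itemset, support in sorted(result.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), support)
```

In this toy dataset the itemsets {a} and {b} are frequent but not closed, because their superset {a, b} occurs in exactly the same three transactions; only {a, b}, {c}, and {d} are reported.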

Data Science
© 2015 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.