Engineering school and research center in telecommunications

Parallel bulk Insertion for large-scale analytics applications

Barbuzzi, Antonio; Michiardi, Pietro; Biersack, Ernst W; Boggia, Gennaro

LADIS 2010, 4th ACM SIGOPS/SIGACT Workshop on Large Scale Distributed Systems and Middleware, July 28-29, 2010, Zürich, Switzerland

Modern data analytics applications, such as Internet-scale indexing, system trace analysis, and recommender engines, operate on massive amounts of data and call for a parallel approach to data processing. In this work, we focus on the popular MapReduce framework to carry out such tasks and identify bulk data insert operations as a critical preliminary step to achieve reduced processing times, especially when new data is generated and processed at regular time intervals. We present a parallel approach to bulk data insertion in a system that uses horizontally range-partitioned data, and we evaluate several variants of the insertion operation, including legacy approaches. Our method exploits the parallel processing framework itself to insert data, which is stored in a semi-structured format, into the system. Our results indicate that a parallel approach to bulk insertion can substantially reduce the recurrent costs of inserting new data into the system.


Type: Conference
Language: English
City: Zürich
Country: Switzerland
Date:
Department: Networking and Security
Eurecom ref: 3177
Copyright: © ACM, 2010. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in LADIS 2010, 4th ACM SIGOPS/SIGACT Workshop on Large Scale Distributed Systems and Middleware, July 28-29, 2010, Zürich, Switzerland http://dx.doi.org/10.1145/1859184.1859192
Bibtex:
@inproceedings{EURECOM+3177,
  doi = {http://dx.doi.org/10.1145/1859184.1859192},
  year = {2010},
  title = {{P}arallel bulk {I}nsertion for large-scale analytics applications},
  author = {{B}arbuzzi, {A}ntonio and {M}ichiardi, {P}ietro and {B}iersack, {E}rnst {W} and {B}oggia, {G}ennaro},
  booktitle = {{LADIS} 2010, 4th {ACM} {SIGOPS}/{SIGACT} {W}orkshop on {L}arge {S}cale {D}istributed {S}ystems and {M}iddleware, {J}uly 28-29, 2010, {Z}{\"u}rich, {S}witzerland},
  address = {{Z}{\"u}rich, {SUISSE}},
  month = {07},
  url = {http://www.eurecom.fr/publication/3177}
}