Graduate engineering school and research center in digital science

Too big to eat: Boosting analytics data ingestion from object stores with Scoop

Moatti, Yosef; Rom, Eran; Gracia-Tinedo, Raul; Naor, Dalit; Chen, Doron; Sampe, Josep; Sanchez-Artigas, Marc; García-Lopez, Pedro; Gluszak, Filip; Deschdt, Eric; Pace, Francesco; Venzano, Daniele; Michiardi, Pietro

ICDE 2017, IEEE International Conference on Data Engineering, April 19-22, 2017, San Diego, USA

Extracting value from data stored in object stores, such as OpenStack Swift and Amazon S3, can be problematic in common scenarios where analytics frameworks and object stores run in physically disaggregated clusters. One of the main problems is that analytics frameworks must ingest large amounts of data from the object store before the actual computation; this incurs significant resource and performance overhead. To overcome this problem, we present Scoop. Scoop enables analytics frameworks to use the computational resources of object stores to optimize the execution of analytics jobs. It achieves this by allowing ETL-type actions to be added to the data upload path and by offloading querying functions to the object store through a rich and extensible active object storage layer. As a proof of concept, Scoop enables Apache Spark SQL selections and projections to be executed close to the data in OpenStack Swift, accelerating the analytics workloads of a smart energy grid company (GridPocket). Our experiments in a 63-machine cluster with real IoT data and SQL queries from GridPocket show that Scoop exhibits query execution times up to 30x faster than the traditional "ingest-then-compute" approach.
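The contrast between "ingest-then-compute" and store-side selection and projection can be illustrated with a minimal, self-contained sketch. It is not Scoop's implementation and uses neither the OpenStack Swift nor the Spark SQL API: the in-memory object, the column names, and the functions `get_object` and `get_object_filtered` are all hypothetical stand-ins chosen for illustration.

```python
import csv
import io

# Hypothetical meter readings standing in for an object in the store.
# Column names and values are illustrative assumptions, not GridPocket data.
RAW_OBJECT = (
    "meter_id,city,timestamp,kwh\n"
    "m1,Paris,2015-01-01T00:00,1.2\n"
    "m2,Nice,2015-01-01T00:00,0.7\n"
    "m1,Paris,2015-01-01T01:00,1.5\n"
    "m3,Nice,2015-01-01T01:00,0.9\n"
)


def get_object():
    """Plain GET (hypothetical): the store returns the whole object."""
    return RAW_OBJECT


def get_object_filtered(columns, predicate):
    """Store-side filter (hypothetical): apply the selection (predicate)
    and the projection (columns) before any bytes leave the store."""
    reader = csv.DictReader(io.StringIO(RAW_OBJECT))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if predicate(row):
            writer.writerow({c: row[c] for c in columns})
    return out.getvalue()


def total_kwh(csv_text, city=None):
    """Analytics-side computation: sum kwh, optionally filtering by city."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(float(r["kwh"]) for r in reader
               if city is None or r["city"] == city)


# Ingest-then-compute: transfer everything, then filter on the compute side.
full_payload = get_object()
ingested = total_kwh(full_payload, city="Nice")

# Pushdown: the store returns only the rows and columns the query needs.
small_payload = get_object_filtered(["kwh"], lambda r: r["city"] == "Nice")
pushed = total_kwh(small_payload)

print("bytes transferred:", len(full_payload), "vs", len(small_payload))
print("same result:", ingested == pushed)
```

Both paths compute the same aggregate, but the filtered path moves fewer bytes between the store and the analytics side, which is the effect the abstract attributes to executing selections and projections close to the data.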


Title: Too big to eat: Boosting analytics data ingestion from object stores with Scoop
City: San Diego
Department: Data Science
Eurecom ref: 5210
Copyright: © 2017 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Bibtex: @inproceedings{EURECOM+5210, doi = {}, year = {2017}, title = {{T}oo big to eat: {B}oosting analytics data ingestion from object stores with {S}coop}, author = {{M}oatti, {Y}osef and {R}om, {E}ran and {G}racia-{T}inedo, {R}aul and {N}aor, {D}alit and {C}hen, {D}oron and {S}ampe, {J}osep and {S}anchez-{A}rtigas, {M}arc and {G}arc{\'{\i}}a-{L}opez, {P}edro and {G}luszak, {F}ilip and {D}eschdt, {E}ric and {P}ace, {F}rancesco and {V}enzano, {D}aniele and {M}ichiardi, {P}ietro}, booktitle = {{ICDE} 2017, {IEEE} {I}nternational {C}onference on {D}ata {E}ngineering, {A}pril 19-22, 2017, {S}an {D}iego, {USA}}, address = {{S}an {D}iego, {\'{E}}{TATS}-{UNIS}}, month = {04}, url = {} }