
RHEEM: Enabling cross-platform data processing - May the Big data be with you!

Agrawal, Divy; Chawla, Sanjay; Contreras-Rojas, Bertty; Elmagarmid, Ahmed; Idris, Yasser; Kaoudi, Zoi; Kruse, Sebastian; Lucas, Ji; Mansour, Essam; Ouzzani, Mourad; Papotti, Paolo; Quiane-Ruiz, Jorge-Arnulfo; Tang, Nan; Thirumuruganathan, Saravanan; Troudi, Anis

VLDB 2018, 44th International Conference on Very Large Data Bases, 27-31 August 2018, Rio de Janeiro, Brazil / Proceedings of the VLDB Endowment, Vol.11, N°12, August 2018

Solving business problems increasingly requires going beyond the limits of a single data processing platform (platform for short), such as Hadoop or a DBMS. As a result, organizations typically perform tedious and costly tasks to juggle their code and data across different platforms. Relieving this pain through automatic cross-platform data processing is challenging: finding the most efficient platform for a given task requires deep expertise in all the available platforms. We present Rheem, a general-purpose cross-platform data processing system that decouples applications from the underlying platforms. It not only determines the best platform to run an incoming task, but also splits the task into subtasks and assigns each subtask to a specific platform to minimize the overall cost (e.g., runtime or monetary cost). It features (i) an interface to easily compose data analytic tasks; (ii) a novel cost-based optimizer able to find the most efficient platform in almost all cases; and (iii) an executor to efficiently orchestrate tasks over different platforms. As a result, it allows users to focus on the business logic of their applications rather than on the mechanics of how to compose and execute them. Using different real-world applications with Rheem, we demonstrate how cross-platform data processing can improve performance by more than one order of magnitude compared to single-platform data processing.


Title: RHEEM: Enabling cross-platform data processing - May the Big data be with you!
City: Rio de Janeiro
Department: Data Science
Eurecom ref:5642
Copyright: VLDB
Bibtex:
@inproceedings{EURECOM+5642,
  year = {2018},
  title = {{RHEEM}: {E}nabling cross-platform data processing - {M}ay the {B}ig data be with you!},
  author = {{A}grawal, {D}ivy and {C}hawla, {S}anjay and {C}ontreras-{R}ojas, {B}ertty and {E}lmagarmid, {A}hmed and {I}dris, {Y}asser and {K}aoudi, {Z}oi and {K}ruse, {S}ebastian and {L}ucas, {J}i and {M}ansour, {E}ssam and {O}uzzani, {M}ourad and {P}apotti, {P}aolo and {Q}uiane-{R}uiz, {J}orge-{A}rnulfo and {T}ang, {N}an and {T}hirumuruganathan, {S}aravanan and {T}roudi, {A}nis},
  booktitle = {{VLDB} 2018, 44th {I}nternational {C}onference on {V}ery {L}arge {D}ata {B}ases, 27-31 {A}ugust 2018, {R}io de {J}aneiro, {B}razil / {P}roceedings of the {VLDB} {E}ndowment, {V}ol.11, {N}°12, {A}ugust 2018},
  address = {{R}io de {J}aneiro, {B}razil},
  month = {08},
  url = {}
}