Graduate School and Research Center in Digital Sciences

Stocator: Providing high performance and fault tolerance for Apache Spark over object storage

Vernik, Gil; Factor, Michael; Kolodner, Elliot K.; Ofer, Effi; Michiardi, Pietro; Pace, Francesco

CCGRID 2018, 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, May 1-4, 2018, Washington DC, USA

Until now object storage has not been a firstclass citizen of the Apache Hadoop ecosystem including Apache Spark. Hadoop connectors to object storage have been based on file semantics, an impedance mismatch, which leads to low performance and the need for an additional consistent storage system to achieve fault tolerance. In particular, Hadoop depends on its underlying storage system and its associated connector for fault tolerance and allowing speculative execution. However, these characteristics are obtained through file operations that are not native for object storage, and are both costly and not atomic. As a result these connectors are not efficient and more importantly they cannot help with fault tolerance for object storage. We introduce Stocator, whose novel algorithm achieves both high performance and fault tolerance by taking advantage of object storage semantics. This greatly decreases the number of operations on object storage as well as enabling a much simpler approach to dealing with the eventually consistent semantics typical of object storage. We have implemented Stocator and shared it in open source. Performance testing with Apache Spark shows that it can be 18 times faster for write intensive workloads and can perform 30 times fewer operations on object storage than the legacy Hadoop connectors, reducing costs both for the client and the object storage service provider.

Document Doi Bibtex

Title:Stocator: Providing high performance and fault tolerance for Apache Spark over object storage
Type:Conference
Language:English
City:Washington
Country:UNITED STATES
Date:
Department:Data Science
Eurecom ref:5554
Copyright: © ACM, 2018. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in CCGRID 2018, 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, May 1-4, 2018, Washington DC, USA http://dx.doi.org/10.1109/CCGRID.2018.00073
Bibtex: @inproceedings{EURECOM+5554, doi = {http://dx.doi.org/10.1109/CCGRID.2018.00073}, year = {2018}, title = {{S}tocator: {P}roviding high performance and fault tolerance for {A}pache {S}park over object storage}, author = {{V}ernik, {G}il and {F}actor, {M}ichael and {K}olodner, {E}lliot {K}. and {O}fer, {E}ffi and {M}ichiardi, {P}ietro and {P}ace, {F}rancesco}, booktitle = {{CCGRID} 2018, 18th {IEEE}/{ACM} {I}nternational {S}ymposium on {C}luster, {C}loud and {G}rid {C}omputing, {M}ay 1-4, 2018, {W}ashington {DC}, {USA} }, address = {{W}ashington, {UNITED} {STATES}}, month = {05}, url = {http://www.eurecom.fr/publication/5554} }
See also: