Graduate School and Research Center in Digital Sciences

HFSP: size-based scheduling for Hadoop

Pastorelli, Mario; Barbuzzi, Antonio; Carra, Damiano; Dell'Amico, Matteo; Michiardi, Pietro

BIGDATA 2013, IEEE International Conference on BigData, October 6-9, 2013, Santa-Clara, CA, USA

Size-based scheduling with aging has long been recognized as an effective approach to guaranteeing fairness and near-optimal system response times. We present HFSP, a scheduler that brings this technique to a real, multi-server, complex, and widely used system: Hadoop. Size-based scheduling requires a priori job size information, which is not available in Hadoop; HFSP builds this knowledge by estimating job sizes online during job execution. Our experiments, based on realistic workloads generated with a standard benchmarking suite, show a significant decrease in system response times compared with the widely used Hadoop Fair Scheduler, and show that HFSP is largely tolerant of job size estimation errors.
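The intuition behind the response-time claim can be illustrated with a toy model. The sketch below is not HFSP's implementation: it assumes a single unit-rate server, jobs that all arrive at time 0 (so aging plays no role), and hypothetical job sizes and estimates chosen only for illustration. It compares equal sharing of the server against running jobs one at a time in order of estimated size.

```python
def fair_share_completions(sizes):
    """Completion times when all jobs start at t=0 and share one server equally."""
    order = sorted(range(len(sizes)), key=lambda i: sizes[i])
    done = [0.0] * len(sizes)
    clock, prev, active = 0.0, 0.0, len(sizes)
    for i in order:
        # While `active` jobs share the server, each progresses at rate 1/active,
        # so finishing the next (sizes[i] - prev) units of work takes that many
        # units times `active` of wall-clock time.
        clock += (sizes[i] - prev) * active
        done[i] = clock
        prev = sizes[i]
        active -= 1
    return done


def size_based_completions(sizes, estimates=None):
    """Completion times when jobs run one at a time, smallest estimate first."""
    estimates = sizes if estimates is None else estimates
    order = sorted(range(len(sizes)), key=lambda i: estimates[i])
    done = [0.0] * len(sizes)
    clock = 0.0
    for i in order:
        clock += sizes[i]  # the job runs alone for its true size
        done[i] = clock
    return done


def mean(values):
    return sum(values) / len(values)


# Hypothetical workload: true sizes and deliberately imperfect estimates
# that swap the order of the two small jobs.
sizes = [1.0, 2.0, 10.0]
noisy_estimates = [1.5, 1.2, 9.0]

print("fair share:          ", mean(fair_share_completions(sizes)))
print("size-based (exact):  ", mean(size_based_completions(sizes)))
print("size-based (noisy):  ", mean(size_based_completions(sizes, noisy_estimates)))
```

In this toy workload the mean response time is 7.0 under fair sharing, about 5.67 with exact sizes, and 6.0 when the estimates misorder the two small jobs. The example only shows why ordering by size can help and why moderate estimation errors need not erase the benefit; it says nothing about the magnitude of the gains measured in the paper.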

Title: HFSP: size-based scheduling for Hadoop
Department: Data Science
Eurecom ref: 4106
Copyright: © 2013 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Bibtex:
@inproceedings{EURECOM+4106,
  doi = {},
  year = {2013},
  title = {{HFSP}: size-based scheduling for {H}adoop},
  author = {{P}astorelli, {M}ario and {B}arbuzzi, {A}ntonio and {C}arra, {D}amiano and {D}ell'{A}mico, {M}atteo and {M}ichiardi, {P}ietro},
  booktitle = {{BIGDATA} 2013, {IEEE} {I}nternational {C}onference on {B}ig{D}ata, {O}ctober 6-9, 2013, {S}anta-{C}lara, {CA}, {USA}},
  address = {{S}anta-{C}lara, {UNITED} {STATES}},
  month = {10},
  url = {}
}