Practical size-based scheduling for MapReduce workloads

Pastorelli, Mario; Barbuzzi, Antonio; Carra, Damiano; Dell'Amico, Matteo; Michiardi, Pietro
arXiv:1302.2749, May 3rd, 2013

We present the Hadoop Fair Sojourn Protocol (HFSP) scheduler, which implements a size-based scheduling discipline for Hadoop. The benefits of size-based scheduling disciplines are well recognized in a variety of contexts (computer networks, operating systems, etc.), yet their practical implementation for a system such as Hadoop raises a number of important challenges. With HFSP, which is available as an open-source project, we address issues related to job size estimation and resource management, and we study the effects of a variety of preemption strategies. Although the architecture underlying HFSP is suitable for any size-based scheduling discipline, in this work we revisit and extend the Fair Sojourn Protocol, which solves problems related to job starvation that affect FIFO, Processor Sharing and a range of size-based disciplines. Our experiments, in which we compare HFSP to standard Hadoop schedulers, point to a significant decrease in average job sojourn time (a metric that accounts for the total time a job spends in the system, including waiting and service times) for realistic workloads that we generate according to production traces available in the literature.
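
To make the sojourn-time metric concrete, the following is a minimal sketch, not HFSP itself and not taken from the paper. It assumes a single server, exactly known job sizes, and all jobs arriving at time 0, and compares the mean sojourn time under FIFO, Processor Sharing, and shortest-job-first (used here only as a simple stand-in for a size-based discipline). The function names and the job sizes are hypothetical, chosen for illustration.

```python
from typing import List


def fifo_mean_sojourn(sizes: List[float]) -> float:
    """Mean sojourn time when jobs run to completion in the given order.

    Assumes all jobs arrive at time 0, so sojourn time equals completion time.
    """
    clock = 0.0
    total = 0.0
    for size in sizes:
        clock += size   # this job finishes once all earlier jobs and itself are served
        total += clock
    return total / len(sizes)


def sjf_mean_sojourn(sizes: List[float]) -> float:
    """Shortest-job-first: serve jobs serially, smallest size first."""
    return fifo_mean_sojourn(sorted(sizes))


def ps_mean_sojourn(sizes: List[float]) -> float:
    """Processor Sharing: every unfinished job gets an equal share of the server."""
    ordered = sorted(sizes)          # under PS, smaller jobs finish first
    n = len(ordered)
    clock = 0.0
    total = 0.0
    served_per_job = 0.0             # work each unfinished job has received so far
    for i, size in enumerate(ordered):
        active = n - i               # jobs still sharing the server
        clock += (size - served_per_job) * active
        served_per_job = size
        total += clock
    return total / n


if __name__ == "__main__":
    jobs = [10.0, 1.0, 2.0]          # hypothetical job sizes, in arbitrary time units
    print("FIFO:", fifo_mean_sojourn(jobs))
    print("PS:  ", ps_mean_sojourn(jobs))
    print("SJF: ", sjf_mean_sojourn(jobs))
```

In this toy setting the large job arriving first delays the two small ones under FIFO, while serving small jobs first lowers the mean. The sketch ignores the issues the abstract highlights for a real system: job sizes must be estimated, jobs arrive over time, and preemption has a cost.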

Data Science
© EURECOM. Personal use of this material is permitted. The definitive version of this paper was published in arXiv:1302.2749, May 3rd, 2013 and is available at :