On the impact of virtualization on the I/O performance of analytic workloads

Ha, Son-Hai; Venzano, Daniele; Brown, Patrick; Michiardi, Pietro
CLOUDTECH 2016, 2nd International Conference on Cloud Computing Technologies and Applications, May 24-26, 2016, Marrakesh, Morocco

In this work we study the I/O performance of long, sequential workloads that mimic those of Big Data applications, to understand the implications of system virtualization for data-intensive frameworks such as Apache Hadoop and Spark, which are frequently run in clusters of Virtual Machines (VMs). We do so through an experimental measurement campaign that collects low-level traces and metrics, to show the role played by important parameters such as the I/O schedulers and caching mechanisms involved in the I/O path, and the VM configuration in terms of dedicated resources. Our findings are especially relevant for determining appropriate deployment strategies for today's emerging Analytics Services hosted on both public and private clouds.


Type: Conference
City: Marrakesh
Date: 2016-05-24
Department: Data Science
Eurecom Ref: 5139
Copyright: © 2016 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

Permalink: https://www.eurecom.fr/publication/5139