Resource Allocation and Management

This line of work is inscribed in our long-term quest to understand how to achieve efficiency, fairness and ultimately, high resource utilization in large-scale computing clusters. 

Size-based scheduling

Our work focuses on a general approach to understand size-based scheduling and its properties, as a general policy for Big Data computing frameworks. Indeed, size-based schedulers have very desirable performance properties: optimal or near-optimal system response times can be coupled with strong fairness. Despite this, however, such policies are rarely implemented in practical settings, because they require knowing a priori the amount of work needed to complete analytical jobs: this assumption is difficult to satisfy in concrete systems. It is definitely more likely to inform the system with an estimate of the job sizes, but existing studies point to somewhat pessimistic results if size-based policies use imprecise job size estimations.

We take the goal of designing scheduling policies that explicitly deal with inexact job sizes, and came up with a variety of approaches to do so, both in theory and in practice. Our research has been applied to Hadoop MapReduce, and currently is also at the heart of a new system that we built to offer analytic services to data scientists, which we develop next.

Container-based Analytics-as-a-Service

This line of work represents a major research and development effort, toward the design, implementation and operation of a completely new cluster scheduler. The endeavor of this line of work is to fill the gap that exists in current approaches (Mesos, Kubernetes, and the like), and raise the level of abstraction at which scheduling works. We introduce a general and flexible definition of analytic applications, how they are composed, and how to execute them. For example, a user application addressing the training of a statistical model involves: a user-defined program implementing a learning algorithm, a framework (e.g., TensorFlow) to execute such a program together with information about its resource requirements, the location for input and output data and possibly hyper-parameters exposed as application arguments. Users should be able to express, in a simple way, how such an application must be packaged and executed, submit it, and expect results as soon as possible.
Our research shows that scheduling such applications represents a departure from what has been studied in the scheduling literature, and present the design of a new algorithm to address the problem. 

For a sneak preview of our latest open-source software developments in this area, visit http://zoe-analytics.eu/

This work is actively supported by our collaboration with KPMG Germany, and receives software contributions from AirFrance/KLM.

Scheduling and pricing of cloud resources

Cloud computing has now become the main paradigm for computing, storage and many other applications. The design of appropriate pricing schemes for cloud resources is therefore of great importance for cloud providers to maximize their revenue. Given the capacity of a cloud, an important objective is to increase the resource utilization so as to allow more tenants to be served, increasing its revenue. In other words, the pricing problem cannot be separated from the question of tasks scheduling in the cloud. Our work addresses questions in both scheduling and pricing in cloud computing.

In this context, we provide a quantitative characterization of a sufficient and necessary condition such that a set of malleable batch tasks with deadlines can be scheduled on multiple cloud machines, and a polynomial-time algorithm to produce such a feasible schedule.

In addition, cloud providers currently mostly use a fixed usage-based pricing, sometimes in association with a spot market to sell remaining resources and with reserved instances with long-term commitment. Given the current pricing models, tenants face difficulties to decide on which type of instance to purchase. Another contribution of our work is to extend the existing online learning algorithm of how to make decision to purchase on-demand and spot instances to the case with instances requiring long-term commitment.

 

 

Selected publications: 

  • X. Wu and P. Loiseau. Algorithms for scheduling deadline-sensitive malleable tasks. In Proceedings of the 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2015. [ bib | pdf | http ]

 

Syndicate

Syndicate content

Data Science