Cloud Computing Platform

The Data Science cloud computing platform 

Over the years, the Data Science department has built a powerful on-premises computing platform. The platform is used for research and experiments, and also to run teaching laboratories for Master's students.

The platform is flexible thanks to two deployments:

  1. Analytics on demand with Zoe, the open-source Analytics-as-a-Service solution developed internally
  2. OpenStack VM-based virtualization for long-running tasks

Zoe: analytics on demand

Zoe is software that was born in 2015 from an initiative inside the department. It provides a simple way to provision data analytics clusters using container-based virtualization. Zoe was created because the solution in use at the time, OpenStack Sahara (to which the department contributed Apache Spark support), was insufficient and difficult to extend.

Zoe is application-independent by design: users can create and submit new applications on their own.

A scheduler manages the queue of requests and keeps the cluster fully utilized. Developed as part of a PhD thesis, the Zoe scheduler uses size-estimation techniques and information about user priority to queue execution requests and assign available resources.
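
The section does not describe the Zoe scheduler's algorithm in detail. As a rough illustration of the idea only, the Python sketch below orders requests by user priority and then by estimated size, and lets smaller requests start ahead of a larger one that does not fit yet, so free cores are not left idle. The class and method names (`SizeBasedScheduler`, `submit`, `schedule`, `release`) and the convention that a lower number means a more important user are hypothetical assumptions, not Zoe's actual API or algorithm.

```python
from __future__ import annotations

import heapq
import itertools


class SizeBasedScheduler:
    """Toy size-based scheduler (hypothetical illustration, not Zoe's code)."""

    def __init__(self, total_cores: int) -> None:
        self.free_cores = total_cores
        # Heap entries: (user_priority, estimated_size, seq, name, cores)
        self._queue: list[tuple[int, float, int, str, int]] = []
        self._seq = itertools.count()  # FIFO tie-breaker for equal keys

    def submit(self, name: str, cores: int, estimated_size: float,
               user_priority: int) -> None:
        # Assumption: a lower user_priority value means a more important user.
        # Within the same priority, smaller estimated requests come first.
        heapq.heappush(self._queue,
                       (user_priority, estimated_size, next(self._seq), name, cores))

    def schedule(self) -> list[str]:
        """Start queued requests in (priority, size) order while cores allow."""
        started: list[str] = []
        deferred = []
        while self._queue:
            entry = heapq.heappop(self._queue)
            cores = entry[4]
            if cores <= self.free_cores:
                self.free_cores -= cores
                started.append(entry[3])
            else:
                # Does not fit right now: keep it queued and let smaller
                # requests behind it use the idle cores (backfilling).
                deferred.append(entry)
        for entry in deferred:
            heapq.heappush(self._queue, entry)
        return started

    def release(self, cores: int) -> None:
        """Return cores to the pool when a running request finishes."""
        self.free_cores += cores
```

With 64 cores, submitting a 48-core Spark job, an 8-core notebook (both priority 1) and a 32-core TensorFlow job (priority 0) starts the TensorFlow job and the notebook first; the Spark job waits until enough cores are released. A production size-based scheduler also has to keep large requests from starving behind a stream of small ones (for example through aging), which this sketch does not attempt.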

For example, Zoe is used for:

  • Single-node and distributed TensorFlow
  • Apache Spark batch jobs
  • Interactive PySpark with Jupyter notebooks

OpenStack

The OpenStack deployment provides traditional virtual machines to users who need a custom development environment for long periods of time. Several distributions are available, and the platform supports VMs with up to 32 cores and 128GB of RAM.

Support software

The platform is monitored with Zabbix. Because the hosts running Docker produce a large volume of monitoring metrics, a fully redundant pipeline consisting of Telegraf, Kafka, KairosDB and Cassandra has been put in place to collect and store them.
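
The text does not say how metrics travel from Kafka into KairosDB. Assuming Telegraf publishes records to Kafka in InfluxDB line protocol (one of its supported serialization formats) and a small bridge consumes them and writes to KairosDB's REST endpoint (`POST /api/v1/datapoints`), the conversion step could look like the Python sketch below. The function names are hypothetical, and the parser deliberately ignores line-protocol escaping, string fields and booleans.

```python
from __future__ import annotations

import json


def parse_value(raw: str) -> int | float | None:
    """Convert a line-protocol field value to a number, or None to skip it."""
    if raw.startswith('"'):
        return None  # string field: not stored as a numeric datapoint here
    if raw[-1] in "iu":
        return int(raw[:-1])  # integer ("42i") or unsigned ("42u") field
    try:
        return float(raw)
    except ValueError:
        return None  # booleans (t/true/f/false...) are skipped in this sketch


def line_to_kairosdb(line: str) -> list[dict]:
    """Turn one line-protocol record into KairosDB metric objects.

    Assumes no escaped spaces or commas and an explicit nanosecond timestamp.
    """
    head, fields, timestamp_ns = line.strip().split(" ")
    measurement, *tag_pairs = head.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    if not tags:
        tags = {"source": "telegraf"}  # KairosDB requires at least one tag
    timestamp_ms = int(timestamp_ns) // 1_000_000  # KairosDB uses milliseconds

    metrics = []
    for field in fields.split(","):
        key, raw = field.split("=", 1)
        value = parse_value(raw)
        if value is None:
            continue
        metrics.append({
            "name": f"{measurement}.{key}",  # e.g. "cpu.usage_idle"
            "timestamp": timestamp_ms,
            "value": value,
            "tags": dict(tags),
        })
    return metrics


def build_body(lines: list[str]) -> str:
    """JSON array usable as the body of POST /api/v1/datapoints."""
    return json.dumps([m for line in lines for m in line_to_kairosdb(line)])
```

Each line-protocol field becomes its own KairosDB metric named `measurement.field`, with the Telegraf tags (such as `host`) carried over so that queries can still filter per physical host.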

A 15TB Ceph cluster is also available for distributed, redundant storage of datasets and experimental results.

Datasets

We use the platform itself to generate a number of useful datasets. Zoe's access and workload logs are used in scheduling research, as is the monitoring data (CPU, memory and energy consumption) gathered from the physical hosts.

The Hardware

The server racks are hosted in Eurecom’s datacenter, which provides backup electrical power, air cooling and top-of-rack 10 Gbps fiber uplinks.

The platform currently consists of 25 servers:

  1. Two Dell PowerEdge 2950
    1. 2x Intel Xeon CPU L5320 @ 1.86GHz (8 cores total)
    2. 16GB of RAM
    3. 3TB of space on 5 disks
  2. Six Dell PowerEdge R620
    1. 2x Intel Xeon CPU E5-2650L 0 @ 1.80GHz (32 cores total with HT)
    2. 128GB of RAM
    3. 10TB of space on 10 disks
  3. Sixteen Dell PowerEdge R630
    1. 2x Intel Xeon CPU E5-2630 v3 @ 2.40GHz (32 cores total with HT)
    2. 128GB of RAM
    3. 10TB of space on 10 disks
  4. One Dell PowerEdge C4130
    1. 2x Intel Xeon CPU E5-2683 v4 @ 2.10GHz (64 cores total with HT)
    2. 256GB of RAM
    3. 1.6TB of disk space on 2 SSDs
    4. 4 NVIDIA Tesla P100 GPUs
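
The aggregate capacity implied by the list above can be checked with a few lines of Python. The figures are copied from the list (the `threads` field holds the "cores total" numbers given for each model); disk capacity is raw capacity, before any RAID or replication overhead, which the section does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerModel:
    name: str
    count: int
    threads: int    # "cores total" per server, as listed above
    ram_gb: int     # RAM per server
    disk_tb: float  # raw disk per server
    gpus: int = 0


INVENTORY = [
    ServerModel("PowerEdge 2950", 2, 8, 16, 3.0),
    ServerModel("PowerEdge R620", 6, 32, 128, 10.0),
    ServerModel("PowerEdge R630", 16, 32, 128, 10.0),
    ServerModel("PowerEdge C4130", 1, 64, 256, 1.6, gpus=4),
]


def totals(inventory: list) -> dict:
    """Multiply per-server figures by server count and sum over all models."""
    return {
        "servers": sum(m.count for m in inventory),
        "threads": sum(m.count * m.threads for m in inventory),
        "ram_gb": sum(m.count * m.ram_gb for m in inventory),
        "disk_tb": round(sum(m.count * m.disk_tb for m in inventory), 1),
        "gpus": sum(m.count * m.gpus for m in inventory),
    }


print(totals(INVENTORY))
# {'servers': 25, 'threads': 784, 'ram_gb': 3104, 'disk_tb': 227.6, 'gpus': 4}
```

The server count matches the 25 servers stated above; in aggregate the platform offers roughly 3TB of RAM.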
