Distributed Systems Group

The Distributed Systems Group blends theoretical and systems research on large-scale distributed systems (including data processing and data storage), distributed algorithms, and parallel algorithms for mining massive amounts of data. The main themes the group currently addresses are described below.

Scalable Algorithm Design

This research activity centers on the design, analysis, and performance evaluation of data mining algorithms, with a particular emphasis on machine learning.
Because the focus is on mining massive amounts of data, the activity also involves implementing scalable algorithms that run on parallel processing systems such as MapReduce / Hadoop.


Parallel Processing Systems

This theme focuses on the design, implementation, and experimental validation of parallel processing systems, and on their integration with distributed data stores (including distributed filesystems and distributed databases).
In particular, the group currently works on and contributes to the Apache Hadoop open-source project, focusing on the Hadoop implementation of MapReduce.


Distributed Data Stores

This research line covers distributed filesystems, distributed database systems, and key/value data stores. The group currently works on and contributes to the Apache Hadoop open-source project, focusing in particular on the Hadoop Distributed File System (HDFS) and on HBase, an open-source implementation of Google's Bigtable design.


Experimental Platforms

This research line covers the design, development, and deployment of experimental testbeds for the analysis and performance evaluation of scalable data processing algorithms and systems.
