Distributed Systems and Cloud Computing

Prof. Pietro Michiardi

Prof. Marko Vukolic

Course Description

The goal of this course is to provide a comprehensive view of recent topics and trends in distributed systems and cloud computing. We will discuss the software techniques employed to construct and program reliable, highly scalable systems. We will also cover the architecture of modern datacenters and the virtualization techniques that are central to the cloud computing paradigm. The course is complemented by a number of lab sessions that provide hands-on experience with Hadoop and with designing scalable MapReduce algorithms.

Prerequisites

Knowledge of data structures, algorithm design, and distributed algorithms. Fluency in Java and at least one other programming language is highly desirable; Python is strongly recommended.

Comment on the lecture notes


Lectures are heavily inspired by the following material, which is highly recommended:

Data-intensive Text Processing with MapReduce, Morgan & Claypool, by Jimmy Lin and Chris Dyer
This is a fantastic book, easy to read and very clear. Following Prof. Lin's work is highly recommended: [Link]
Hadoop, The Definitive Guide, O'Reilly / Yahoo Press, by Tom White
This is simply the "bible" for Hadoop. Use it.
HBase, The Definitive Guide, O'Reilly, by Lars George
Likewise, the "bible" for HBase. Use it.
Cloudera Hadoop Distribution: [Link]
CDHx is highly recommended when you don't want to spend too much time learning how to deploy Hadoop and its components. Tons of video lectures/presentations as well.
Hadoop Project: [Link]
The original source
NOTE: Laboratory sessions are mandatory. You are required to attend the lab sessions and to work in groups of two.

Lecture Notes

Topic: (Hadoop) MapReduce, HDFS
  • Lecture Notes [Theory and Practice of MapReduce]
  • Article Jeffrey Dean and Sanjay Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, In Proc. of USENIX OSDI, 2004
  • Article Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, The Google File System, In Proc. of ACM SOSP, 2003
Topic: Relational Algebra and MapReduce, Hadoop Pig
  • Lecture Notes [Relational Algebra and Hadoop Pig]
  • Article C. Olston, et al., Pig Latin: A Not-So-Foreign Language for Data Processing, In Proc. of ACM SIGMOD, 2008
  • Article A. Gates, et al., Building a High-Level Dataflow System on top of MapReduce: The Pig Experience, In Proc. of VLDB, 2009
  • Book A. Gates, Programming Pig, Dataflow Scripting with Hadoop, O'Reilly
  • Book Anand Rajaraman and Jeff Ullman, Mining of Massive Datasets, Cambridge University Press
  • On-line Jennifer Widom, Introduction to Databases, [Link]
Topic: BigTable and HBase
  • Lecture Notes [Theory and Practice of HBase]
  • Article F. Chang, et al., Bigtable: A Distributed Storage System for Structured Data, In Proc. of USENIX OSDI, 2006
  • Article P. O'Neil, et al., The Log-Structured Merge-Tree (LSM-Tree)
Topic: Selected Topics in Cloud Computing
Topic: Distributed Storage Systems
Topic: Coordinating Distributed Systems
Laboratory: Instructions for the Laboratory Sessions