Course Description
The goal of this course is to provide a comprehensive view of recent topics and trends in distributed systems and cloud computing. We will discuss the software techniques employed to construct and program reliable, highly scalable systems. We will also cover the architecture design of modern datacenters and the virtualization techniques that constitute a central topic of the cloud computing paradigm. The course is complemented by a number of lab sessions that give hands-on experience with Hadoop and the design of scalable algorithms with MapReduce.
Prerequisites
Knowledge of data structures, algorithm design, and distributed algorithms. Fluency in Java and at least one other programming language (Python is highly recommended) is highly desirable.
Comment on the lecture notes
Label legend:
- Label Indicates a topic addressed in one or more lectures
- Label Indicates a set of slides to be used in one or more lectures. Note that slides are highly verbose.
- Label Indicates recommended material, including research papers, books, and on-line resources.
- Label Indicates optional readings: these are advanced research papers, or background material.
- Label Indicates laboratory material: in this course we use [GitHub]
Lectures are heavily inspired by the following material, which is highly recommended:
- Data-intensive Text Processing with MapReduce, Morgan & Claypool, by Jimmy Lin and Chris Dyer
- This is a fantastic book, easy to read and very clear. Following Prof. Lin's work is highly recommended: [Link]
- Hadoop, The Definitive Guide, O'Reilly / Yahoo Press, by Tom White
- This is simply the "bible" for Hadoop. Use it.
- HBase, The Definitive Guide, O'Reilly, by Lars George
- Same as before!
- Cloudera Hadoop Distribution: [Link]
- CDHx is highly recommended when you don't want to spend too much time learning how to deploy Hadoop and its components. Tons of video lectures and presentations are available as well.
- Hadoop Project: [Link]
- The original source
NOTE: Laboratory sessions are mandatory. You are required to attend the lab sessions and to work in groups of two.