Graduate School and Research Center in Digital Sciences

Duy Hung PHAN

Eurecom - Data Science 
Doctoral student (2013 - 2016)



My current research is in the field of Data-Intensive Scalable Computing:

  • High-level language optimization for Data-Intensive Scalable Computing (Apache Pig and Hive): the aim is to provide both manually and automatically optimized algorithms for data-processing operators. This allows data scientists to focus on answering the high-level question, "what do these data mean?", rather than on the technical details of implementing their algorithms.
  • Opportunities for work sharing across workflows: as a simple example, if two different workflows start by reading the same data set, they can share the read stage and avoid the I/O cost of reading the whole data set twice. With the evolution of Big Data, many recurring enterprise workloads are specified using Apache Oozie, a workflow scheduler system to manage Apache Hadoop jobs. Examining these workflows to find common work blocks, and then sharing and scheduling those blocks, helps speed up their execution.
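
The shared-read example above can be illustrated with a small, self-contained Python sketch. It is only a toy model, not how Pig, Hive, or Oozie implement sharing: the names `CountingSource`, `run_separately`, and `run_shared` are hypothetical, and each workflow is assumed to be expressible as an incremental fold (an initial value plus a step function) so that several workflows can consume one pass over the data.

```python
class CountingSource:
    """Hypothetical in-memory data set that counts how often it is scanned."""

    def __init__(self, records):
        self._records = list(records)
        self.scans = 0  # number of full passes started over the data

    def scan(self):
        # The counter is incremented when iteration actually begins.
        self.scans += 1
        yield from self._records


def run_separately(source, workflows):
    """Baseline: each workflow performs its own full scan of the source."""
    results = []
    for init, step in workflows:
        acc = init
        for record in source.scan():
            acc = step(acc, record)
        results.append(acc)
    return results


def run_shared(source, workflows):
    """Shared scan: one pass over the source feeds every workflow."""
    accs = [init for init, _ in workflows]
    for record in source.scan():
        for i, (_, step) in enumerate(workflows):
            accs[i] = step(accs[i], record)
    return accs


# Two illustrative workflows (assumed for this sketch), each given as
# (initial value, step function) over log-like records.
count_errors = (0, lambda acc, r: acc + (1 if r["level"] == "ERROR" else 0))
total_bytes = (0, lambda acc, r: acc + r["bytes"])
```

Running both workflows through `run_separately` scans the source once per workflow, while `run_shared` produces the same results from a single scan. On a real cluster the saving would be in disk and network I/O rather than in a Python loop.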