Page 19 - EURECOM - RA2011GB

ers have developed an experimental cloud-based
platform with about 100 servers, each with 512
megabytes of memory and 20 gigabytes of storage
space. When an online store owner decides to
recommend products to customers, for example,
the owner must use a huge matrix of data that
connects each customer to each product bought
in the past. “With the algorithms we’ve developed,
we can reduce the size of the matrix so that each
server only deals with part of the problem”, says
Pietro Michiardi.
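
The idea of giving each server only part of the customer-product matrix can be illustrated with a minimal sketch. The text does not describe EURECOM's actual algorithms, so the purchase log, the function name `partition_by_customer`, and the round-robin assignment below are all assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical purchase log: (customer, product) pairs.
PURCHASES = [
    ("alice", "book"),
    ("alice", "lamp"),
    ("bob", "book"),
    ("carol", "desk"),
    ("carol", "lamp"),
    ("dave", "book"),
]


def partition_by_customer(purchases, num_servers):
    """Split the sparse customer-product matrix into row blocks.

    Every purchase of a given customer lands on the same server, so each
    server holds complete rows of the matrix but only a fraction of them.
    """
    customers = sorted({customer for customer, _ in purchases})
    # Round-robin assignment (an assumption) keeps the example deterministic.
    owner = {customer: i % num_servers for i, customer in enumerate(customers)}
    shards = [defaultdict(set) for _ in range(num_servers)]
    for customer, product in purchases:
        shards[owner[customer]][customer].add(product)
    return [dict(shard) for shard in shards]


shards = partition_by_customer(PURCHASES, num_servers=2)
for server_id, shard in enumerate(shards):
    print(server_id, shard)
```

With two servers, each one ends up holding the rows of two customers rather than the whole matrix; a real deployment would also have to balance the load and handle far sparser data.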
Parallel computing
Next, the various calculations must be integrated
to reach a concrete solution. This is where
parallelization techniques come into play to
process several parts of the problem simultaneously.
Thanks to open-source software to which EURECOM has
been contributing, the researchers can run their
algorithms at the same time on all one hundred
servers to optimize computation. Being able to
control all the servers also helps researchers
measure the performance of their algorithms, and
consequently improve them. Pietro Michiardi
explains: “We are working with industrial partners
on several ongoing projects: telecommunication
operators for whom we are developing tools for
service customization and transmission fault
detection, but also Internet service providers or
electricity suppliers”. The goal is to demonstrate
the reliability of EURECOM’s solutions and the
advantages to companies of implementing them.
In the meantime, the Big Data Analytics project
has clearly opened up a window, showing that it
is possible to use this explosion of data to gain
a competitive advantage. This will be even
more true in the near future.
Publications
Michiardi, Pietro; Barbuzzi, Antonio; Carra, Damiano.
Shared Cluster Scheduling: a Fair and Efficient Protocol.
EURECOM Research Report 11-259, 2011.

Baer, Arian; Barbuzzi, Antonio; Michiardi, Pietro; Ricciato, Fabio.
Two Parallel Approaches to Network Data Analysis.
In Proc. of ACM SIGOPS LADIS 2011, in conjunction with VLDB 2011.

Barbuzzi, Antonio; Michiardi, Pietro; Biersack, Ernst W; Boggia, Gennaro.
Parallel Bulk Insertion for Large-Scale Analytics Applications.
In Proc. of ACM SIGOPS/SIGACT LADIS 2010, in conjunction with PODC 2010.
Pietro Michiardi
Assistant Professor at the Networking and Security Dept.
Nationality: Italian
Contact: Pietro.Michiardi@eurecom.fr
Web: www.eurecom.fr/fr/people/michiardi-pietro
Apache Hadoop is a software
framework that supports data-intensive
distributed applications
under a free license. It enables
applications to work with thousands of
computationally independent computers
and petabytes of data. Hadoop was
derived from Google’s MapReduce
and Google File System (GFS) papers.
Hadoop is a top-level Apache project
being built and used by a global
community of contributors, written in
the Java programming language.
Developer(s): Apache Software Foundation
Written in: Java
Operating system: Cross-platform
Type: Distributed File System
License: Apache License 2.0
Website: hadoop.apache.org
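
The MapReduce model from which Hadoop is derived can be sketched in a few lines. The example below is plain Python that only mimics the map, shuffle, and reduce steps on one machine; it does not use the Hadoop API, and the shard contents and function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(shard):
    """Map step: emit a (product, 1) pair for every purchase in one shard."""
    return [(product, 1) for _customer, product in shard]


def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: sum the values collected for each product."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical input, already split into two shards (one per server).
SHARDS = [
    [("alice", "book"), ("alice", "lamp"), ("carol", "lamp")],
    [("bob", "book"), ("dave", "book"), ("carol", "desk")],
]

# Each shard is mapped independently; the results are then merged and reduced.
mapped = chain.from_iterable(map_phase(shard) for shard in SHARDS)
purchase_counts = reduce_phase(shuffle(mapped))
print(purchase_counts)
```

Because every shard is mapped independently, the map step is the part that a cluster can spread across many servers; only the grouped results need to be brought together for the reduce step.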
networking and security