Page 18 - EURECOM - RA2011GB


Data explosion:
How to capitalize
on its full potential
Today, the data managed by some local networks are no longer measured in gigabytes or even terabytes, but in petabytes, that is, millions of billions of bytes! We are not talking about the Internet here, but about local networks. Google’s servers, for instance, handle about 20 petabytes of data per day, while the information network of the Large Hadron Collider in Geneva manages several petabytes per second! And the number of such networks is growing: we now find them in national libraries, social networks, multinationals, mobile telephone networks and healthcare systems. How can all these organizations make the best use of their data? “It’s a huge challenge,” says Pietro Michiardi, a researcher at EURECOM who has taken on this giant task. “But that’s why it’s so exciting. The value of these data should not be underestimated.” According to recent studies, the potential value of these data to the American healthcare system alone amounts to no less than $300 billion. Making better use of these data could lead to applications as numerous as they are diverse. From statistical analysis to personalized travel services or improved detection of telecommunication faults, the list is almost endless.
The quantity of data travelling over networks has skyrocketed
over the last few years, and the phenomenon is not about to stop.
How can we tap the potential of this wealth of data?
This is the purpose of the “Big Data Analytics” project.
Segmenting the problem into smaller parts
How can we structure these raw data to extract
value-generating information?
Obviously, unlike Google, these organizations cannot afford tens of thousands of servers running around the clock, nor the energy and space such servers require. The solution devised by Pietro Michiardi and his team is to use a distributed system to transmit sets of data. The research-
EURECOM’s Cloud Platform reproduces a miniaturized version
of a legacy data center deployment, including a full-blown,
configurable network topology, worker nodes based on
plug computers, and master nodes (not shown in the picture)
that reside in a server bay and orchestrate hardware and
software components. With the Cloud Platform, Pietro and
his team benchmark and compare the performance of cloud
services, including a parallel processing service
and a distributed data store service.
EURECOM
Graduate school and research center in communication systems