Large-scale network simulation over heterogeneous computing architecture: Issues, opportunities and challenges

Nikaein, Navid; Ben Romdhanne, Bilel
SIMUTOOLS 2014, 7th International ICST Conference on Simulation Tools and Techniques, March 17-19, 2014, Lisbon, Portugal

Simulation is a primary step in the evaluation of modern networked systems. Given the increasing complexity of emerging networks, the scalability and efficiency of the simulation tool are key to deriving valuable results. Discrete event simulation (DES) is recognized as the most scalable model that copes with both parallel and distributed architectures. Emerging heterogeneous computing resources open new avenues for improving simulation performance. The main scope of this tutorial is to present new mechanisms and optimizations that could significantly improve the efficiency and scalability of parallel and distributed simulation on heterogeneous computing node architectures that include multicore CPUs and GPUs. To address efficiency, we present several challenges and techniques concerning how a parallel event should be represented and scheduled when targeting multiple CPUs and GPUs, so that the hardware usage rate is maximized while the event management cost is reduced. To address scalability, we present a new simulation model, called coordinator-master-worker, that jointly addresses the challenges of distributed and parallel simulation at different levels. The scalability of three simulation models, namely flat, master-worker, and coordinator-master-worker, is compared under different event rates and various conditions using the largest European GPU-based supercomputer, the TGCC Curie, with 1024 logical processes (LPs), each of which simulates up to 1 million nodes. We also show how these techniques can be applied to the popular network simulator NS-3 to improve the efficiency and scalability of the simulation, and present comparative results from a large-scale deployment including 288 GPUs and 1152 CPUs on the TGCC Curie infrastructure.


Type:
Tutorial
City:
Lisbon
Date:
2014-03-17
Department:
Systèmes de Communication
Eurecom Ref:
4264
Copyright:
© EURECOM. Personal use of this material is permitted. The definitive version of this paper was published in SIMUTOOLS 2014, 7th International ICST Conference on Simulation Tools and Techniques, March 17-19, 2014, Lisbon, Portugal and is available at :

PERMALINK : https://www.eurecom.fr/publication/4264