Traditionally, analytical database engines have used task parallelism provided by modern multisocket multicore CPUs for scaling query execution. Over the past few years, GPUs have started gaining traction as accelerators for processing analytical queries due to their massively data-parallel nature and high memory bandwidth. Recent work on designing join algorithms for CPUs has shown that carefully tuned join implementations that exploit underlying hardware can outperform naive, hardwareoblivious counterparts and provide excellent performance on modern multicore servers. However, there has been no such systematic analysis of hardware-conscious join algorithms for GPUs that systematically explores the dimensions of partitioning (partitioned versus non-partitioned joins), data location (data fitting and not fitting in GPU device memory), and access pattern (skewed versus uniform). In this paper, we present the design and implementation of a family of novel, partitioning-based GPU-join algorithms that are tuned to exploit various GPU hardware characteristics for working around the two main limitations of GPUs-limited memory capacity and slow PCIe interface. Using a thorough evaluation, we show that: i) hardware-consciousness plays a key role in GPU joins similar to CPU joins and our join algorithms can process 1 Billion tuples/second even if no data is GPU resident, ii) radix partitioning-based GPU joins that are tuned to exploit GPU hardware can substantially outperform non-partitioned hash joins, iii) hardware-conscious GPU joins can effectively overcome GPU limitations and match, or even outperform, state-of-the-art CPU joins.
Hardware-conscious hash-joins on GPUs
ICDE 2019, 35th IEEE International Conference on Data Engineering, 8-12 April 2019, Macau SAR, China
© 2019 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
PERMALINK : https://www.eurecom.fr/publication/5780