ADMS 2021, 12th International Workshop on Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architectures, in conjunction with VLDB 2021, 16 August 2021, Copenhagen, Denmark
Synthetic DNA has recently received much attention as an archival storage medium due to its high density and durability. However, retrieving data from DNA is computationally bottlenecked by a key read consensus stage that effectively performs an edit similarity join to identify millions of unique consensus strings from hundreds of millions of noisy copies. In this work, we present an end-to-end DNA data decoding pipeline based on OneJoin, a cross-architecture edit similarity join that can exploit multicore CPUs, integrated GPUs, and multi-vendor discrete GPUs using a single code base. Central to the effectiveness of OneJoin is the use of oneAPI, an open, standards-based unified programming model for achieving portable data parallelism. Based on a rigorous experimental evaluation using macrobenchmarks and real-world data from DNA storage experiments, we show that OneJoin can provide up to a 21× improvement in performance over other state-of-the-art joins and reduce the overall DNA data decoding time from several hours to just a few minutes.
© ACM, 2021. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ADMS 2021, 12th International Workshop on Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architectures, in conjunction with VLDB 2021, 16 August 2021, Copenhagen, Denmark