OneOligo: Use oneAPI to accelerate DNA data storage

Appuswamy, Raja; Marinelli, Eugenio
Invited talk

In the European Commission-funded Future and Emerging Technologies initiative OligoArchive, we are working on transforming DNA, the biological building block of life, into a digital building block for long-term data archival. One of the key steps in retrieving digital data stored in DNA involves clustering billions of strings with respect to edit distance. The computationally intensive nature of edit distance computation has made this step a critical bottleneck in the DNA data retrieval pipeline. In this talk, we present project OneOligo, our scalable, hardware-accelerated solution for DNA read clustering. We first provide an overview of the DNA data storage pipeline. Then, we present OneJoin, a string-similarity join algorithm that combines algorithmic advances in low-distortion embedding with the cross-architecture programming ability offered by DPC++ to scale up clustering across CPUs and GPUs.


Type:
Talk
Date:
2020-11-03
Department:
Data Science
Eurecom Ref:
6468
Copyright:
© EURECOM. Personal use of this material is permitted. The definitive version of this paper was published in Invited talk and is available at:

PERMALINK : https://www.eurecom.fr/publication/6468