OligoArchive-DSM: Columnar design for error-tolerant database archival using synthetic DNA

Marinelli, Eugenio; Yan, Yiqing; Magnone, Virginie; Dumargne, Marie-Charlotte; Barbry, Pascal; Heinis, Thomas; Appuswamy, Raja
Submitted to bioRxiv, 6 October 2022

The surge in demand for cost-effective, durable long-term archival media, coupled with the density limitations of contemporary magnetic media, has made synthetic DNA a promising new alternative. Today, the limiting factor for DNA-based data archival is the cost of writing (synthesis) and reading (sequencing) DNA. Newer techniques that reduce this cost often do so at the expense of reliability, as they introduce complex, technology-specific error patterns. To deal with such errors, it is important to design efficient pipelines that use redundancy carefully to mask errors without amplifying overall cost. In this paper, we present OligoArchive-DSM (OA-DSM), an end-to-end DNA archival pipeline that can provide error-tolerant data storage at low read/write costs. Central to OA-DSM is a database-inspired columnar encoding technique that improves efficiency by enabling integrated decoding and consensus calling during data restoration.
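The abstract does not describe OA-DSM's encoding in detail, so the following is only a minimal, generic sketch of the two restoration steps it names, decoding and consensus calling. It is not the paper's columnar encoding. It rests on simplifying assumptions: a hypothetical mapping of two bits per nucleotide (A=00, C=01, G=10, T=11), equal-length reads of the same strand, and substitution errors only (real sequencing also introduces insertions and deletions). The function names `encode`, `decode`, and `consensus` are illustrative.

```python
from collections import Counter

# Hypothetical 2-bit mapping (A=00, C=01, G=10, T=11); not OA-DSM's actual scheme.
BASES = "ACGT"


def encode(data: bytes) -> str:
    """Map each byte to four nucleotides, two bits per base, most significant first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def decode(strand: str) -> bytes:
    """Inverse of encode: pack every four nucleotides back into one byte."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def consensus(reads: list[str]) -> str:
    """Position-wise majority vote over equal-length reads (assumes substitutions only)."""
    if not reads or len({len(r) for r in reads}) != 1:
        raise ValueError("need at least one read, all of the same length")
    # zip(*reads) yields one column (tuple of bases) per position.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


if __name__ == "__main__":
    # encode(b"Hi") == "CAGACGGC"; each noisy read below carries one substitution.
    reads = ["CAGACGGC", "CTGACGGC", "CAGACGAC"]
    print(decode(consensus(reads)))  # prints b'Hi'
```

Majority voting masks an error only when most reads agree at each position, which is why redundant reads (sequencing coverage) trade cost for reliability; handling insertions and deletions would require aligning the reads before voting.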


Type: Journal
Date: 2022-10-06
Department: Data Science
Eurecom Ref: 7078
Copyright: © EURECOM. Personal use of this material is permitted. The definitive version of this paper was submitted to bioRxiv on 6 October 2022 and is available at: https://doi.org/10.1101/2022.10.06.511077

Permalink: https://www.eurecom.fr/publication/7078