VLDB 2023, 49th International Conference on Very Large Data Bases, 28 August-1 September 2023, Vancouver, Canada / Vol.16, N°8
Given the growing adoption of AI, cloud data lakes are facing the need to support cost-effective “just-in-case” data archival over long time periods to meet regulatory compliance requirements. Unfortunately, current media technologies suffer from fundamental issues that will soon, if not already, make cost-effective data archival infeasible. In this paper, we present a vision for redesigning the archival tier of cloud data lakes based on a novel, obsolescence-free storage medium: synthetic DNA. In doing so, we make two contributions: (i) we highlight the challenges in using DNA for data archival and list several open research problems, and (ii) we outline OligoArchive-DSM (OA-DSM), an end-to-end DNA storage pipeline that we are developing to demonstrate the feasibility of our vision.
© ACM, 2023. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in VLDB 2023, 49th International Conference on Very Large Data Bases, 28 August-1 September 2023, Vancouver, Canada / Vol.16, N°8 http://dx.doi.org/10.14778/3594512.3594522