Hierarchical encoding of JPEG2000-compressed images for DNA data storage

Pic, Xavier; Appuswamy, Raja
DBDS 2025, New Trends in DNA-Based Data Storage Conference, 3-6 June 2025, Prague, Czech Republic

The growth in storage demand and the short lifespan of conventional storage media have turned long-term archival and preservation into key bottlenecks for the data storage industry. Researchers are therefore investigating alternative storage media. DNA molecules, with their high density, long lifespan and low energy needs, are promising candidates for long-term data archival systems. However, current DNA data storage technologies face challenges in cost (reading and writing DNA is expensive) and reliability (reading and writing data is error-prone). Data compression and error correction are thus crucial to scale DNA storage and make it technologically and economically viable. Additionally, the DNA molecules encoding different files are very often stored in the same place, called an oligo pool. Without a random access solution, decoding a specific file from the pool is impractical, because all the oligos from all the files must first be sequenced, which greatly increases the read cost. This paper introduces a solution (Fig. 1) to efficiently encode and store images in DNA molecules, with the aim of reducing the read cost needed to retrieve a reduced-resolution version of an image. The image storage system is based on the progressive decoding functionality of the JPEG2000 codec and can be adapted to any other codec that supports progressive decoding. Each resolution layer is encoded into a set of oligos using the Raptor code [1] provided in the JPEG DNA VM software, with primers specific to the resolution layer attached to them. Depending on the desired resolution, the set of oligos to be sequenced and decoded is adjusted accordingly. These oligos are selected, amplified and sequenced through a PCR process run with the layer-specific primers. Alternatively, the ReadUntil functionality of the Nanopore sequencer can replace the PCR runs: it provides a way to reject, at sequencing time, the oligos that do not match a given template. This template can be modified dynamically during sequencing, allowing better automation of the whole layer access process.
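The layer access idea can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use the JPEG DNA VM software or any Nanopore API: the primer sequences, the function names and the mapping from a target resolution to layers 0 through r are all assumptions made for illustration, and the Raptor encoding step is replaced by ready-made payload strings.

```python
# Hypothetical primer pairs (forward, reverse), one per resolution layer.
# The sequences are illustrative only, not taken from the paper.
LAYER_PRIMERS = {
    0: ("ACGTACGTAC", "TGCATGCATG"),
    1: ("GATCGATCGA", "CTAGCTAGCT"),
    2: ("TTGACCGGTA", "AACCGGTTCA"),
}


def tag_oligos(payloads, layer):
    """Attach the layer-specific primers to each already-encoded payload."""
    fwd, rev = LAYER_PRIMERS[layer]
    return [fwd + payload + rev for payload in payloads]


def layers_needed(target_resolution):
    """Assumption: decoding resolution r requires layers 0..r (progressive decoding)."""
    return list(range(target_resolution + 1))


def select_oligos(pool, target_resolution):
    """Mimic primer-based selection: keep oligos flanked by a needed layer's primers."""
    wanted = [LAYER_PRIMERS[layer] for layer in layers_needed(target_resolution)]
    return [
        oligo
        for oligo in pool
        if any(oligo.startswith(fwd) and oligo.endswith(rev) for fwd, rev in wanted)
    ]


def accept_read(read_start, target_resolution):
    """Template-style check at sequencing time: accept a read only if its first
    bases match the forward primer of a needed layer. Changing target_resolution
    between calls models a template that is updated during sequencing."""
    wanted = [LAYER_PRIMERS[layer][0] for layer in layers_needed(target_resolution)]
    return any(read_start.startswith(fwd) for fwd in wanted)


# Toy pool: two oligos for layer 0, one for layer 1, two for layer 2.
pool = (
    tag_oligos(["AAAA", "CCCC"], 0)
    + tag_oligos(["GGGG"], 1)
    + tag_oligos(["TTTT", "ACAC"], 2)
)
low_res = select_oligos(pool, 0)
mid_res = select_oligos(pool, 1)
```

In this toy pool, requesting resolution 0 selects two of the five oligos and resolution 1 selects three, which is the read-cost saving the abstract targets: only the oligos of the layers actually needed are sequenced.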


Type: Conference
City: Prague
Date: 2025-06-03
Department: Data Science
Eurecom Ref: 8251

PERMALINK : https://www.eurecom.fr/publication/8251