Scalable and accurate algorithms for computational genomics and DNA-based digital storage

Yan, Yiqing

Thesis

Cost reductions and throughput improvements in sequencing technology have enabled advances in applications such as precision medicine and DNA-based storage. However, sequenced results contain errors. To measure the similarity between a sequenced result and a reference, edit distance is preferred in practice over Hamming distance because of insertions and deletions (indels). The basic edit distance computation has quadratic time complexity, so sequence similarity analysis is computationally intensive. In this thesis, we introduce two accurate and scalable sequence similarity analysis algorithms: i) Accel-Align, a fast sequence mapper and aligner based on the seed–embed–extend methodology, and ii) Motif-Search, an efficient structure-aware algorithm that recovers the information encoded by composite motifs from a DNA archive. We then use Accel-Align as an efficient tool to study random-access design in DNA-based storage.
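
The contrast between the two distances can be illustrated with a minimal sketch. It is not taken from the thesis and is not how Accel-Align or Motif-Search work; it is only the textbook dynamic-programming edit distance, whose nested loops give the quadratic cost mentioned above. The function names and the example sequences are assumptions chosen for illustration.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count mismatching positions; only defined for equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a: str, b: str) -> int:
    """Textbook Levenshtein distance, O(len(a) * len(b)) time."""
    # prev[j] is the distance between the previous prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty prefix of b
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


# Hypothetical reference and read: the read lost its first base and
# gained one at the end, so every position is shifted by one.
reference = "ACGTACGT"
read = "CGTACGTA"
print(hamming_distance(reference, read))  # 8: every position mismatches
print(edit_distance(reference, read))     # 2: one deletion, one insertion
```

A single indel shifts all following bases, so Hamming distance reports the read as completely different, while edit distance reflects the two actual errors.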

Type: Thesis

Date: 2023-04-26

Department: Data Science

Eurecom Ref: 7258