Scalable and accurate algorithms for computational genomics and DNA-based digital storage

Yan, Yiqing
Thesis

Cost reduction and throughput improvement in sequencing technology have resulted in new advances in applications such as precision medicine and DNA-based storage. However, sequenced results contain errors. To measure the similarity between a sequenced result and a reference, edit distance is preferred in practice over Hamming distance because the errors include insertions and deletions (indels). The basic edit distance computation has quadratic time complexity, so sequence similarity analysis is computationally intensive. In this thesis, we introduce two accurate and scalable sequence similarity analysis algorithms: i) Accel-Align, a fast sequence mapper and aligner based on the seed–embed–extend methodology, and ii) Motif-Search, an efficient structure-aware algorithm to recover the information encoded by composite motifs from a DNA archive. We then use Accel-Align as an efficient tool to study random-access design in DNA-based storage.
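
To illustrate why indels favor edit distance over Hamming distance, and where the quadratic cost comes from, here is a minimal sketch of the textbook dynamic-programming edit distance. It is not the method used by Accel-Align or Motif-Search; the function names and the example sequences are assumptions chosen for illustration only.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count mismatching positions; only defined for equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a: str, b: str) -> int:
    """Textbook Levenshtein distance with unit costs.

    Fills a (len(a) + 1) x (len(b) + 1) table row by row, so the running
    time is O(len(a) * len(b)), i.e. quadratic for sequences of similar length.
    Only two rows are kept in memory at any time.
    """
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # match or substitute
            ))
        prev = curr
    return prev[-1]


# Hypothetical example: the read is the reference with its first base
# deleted and one base appended, so every position is shifted by one.
reference = "ACGTACGT"
read = "CGTACGTA"

print(hamming_distance(reference, read))  # every shifted position mismatches
print(edit_distance(reference, read))     # one deletion plus one insertion
```

A single indel shifts every downstream base, so Hamming distance reports the whole read as mismatched, while edit distance charges only for the two indels. The nested loops are the source of the quadratic cost mentioned above.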

HAL
Type:
Thesis
Date:
2023-04-26
Department:
Data Science
Eurecom Ref:
7258
Copyright:
© EURECOM. Personal use of this material is permitted. The definitive version of this paper was published in Thesis and is available at :

PERMALINK : https://www.eurecom.fr/publication/7258