Thesis
Cost reductions and throughput improvements in sequencing technology have enabled advances in applications such as precision medicine and DNA-based storage. However, the sequencing output contains errors. To measure the similarity between a sequenced read and its reference, edit distance is preferred in practice over Hamming distance because sequencing errors include insertions and deletions (indels), not only substitutions. The basic dynamic-programming computation of edit distance takes time quadratic in the sequence length, which makes sequence similarity analysis computationally intensive. In this thesis, we introduce two accurate and scalable sequence similarity analysis algorithms: i) Accel-Align, a fast sequence mapper and aligner based on the seed–embed–extend methodology, and ii) Motif-Search, an efficient structure-aware algorithm that recovers the information encoded by composite motifs from a DNA archive. We then use Accel-Align as an efficient tool to study random-access design in DNA-based storage.
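As an illustration of why indels favor edit distance and why its basic computation is quadratic, the following minimal sketch compares the two measures. It is the textbook dynamic-programming (Wagner-Fischer) formulation, not the method used by Accel-Align or Motif-Search; the function names and the example sequences are assumptions chosen for illustration.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count mismatching positions; only defined for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a: str, b: str) -> int:
    """Textbook dynamic-programming edit distance.

    Runs in O(len(a) * len(b)) time, which is the quadratic cost referred
    to in the abstract. Only two rows of the table are kept, so memory is
    O(len(b)).
    """
    # prev[j] = distance between the first i-1 characters of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # match or substitute
            ))
        prev = curr
    return prev[-1]


# Hypothetical example: the read has lost the first base of the reference
# and gained one base at the end, so every position is shifted by one.
reference = "ACGTACGT"
read = "CGTACGTA"
print(hamming_distance(reference, read))  # every position mismatches
print(edit_distance(reference, read))     # one deletion plus one insertion
```

A single shift caused by an indel makes every position disagree under Hamming distance, whereas edit distance charges only for the two underlying operations. The nested loops visit every pair of positions, which is what makes the basic computation expensive for long sequences.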
Type:
Thesis
Date:
2023-04-26
Department:
Data Science
Eurecom Ref:
7258
Copyright:
© EURECOM. Personal use of this material is permitted. The definitive version of this paper was published in Thesis and is available at :