High-Fidelity (HiFi) sequencing technologies have revolutionized genomics by producing long, highly accurate fragments (or reads), which are essential for high-quality genome assembly and other downstream tasks. However, existing tools for HiFi read tasks, such as k-mer (subsequence of length k) counting, all-versus-all overlap detection between reads, and de novo assembly (assembly of reads into a genome without a reference sequence), usually take a long time to run or consume a large amount of memory because their computation is data-intensive. To improve the performance of these tasks, this thesis proposes new algorithms and systems that develop effective heuristics, parallelize the processing, and use memory efficiently. First, we introduce RapidGKC, a GPU-accelerated k-mer counting system that optimizes the encoding and partitioning of variable-length genomic data to enable highly parallel processing. With CPU-GPU co-processing, RapidGKC achieves significant speedups over state-of-the-art CPU-based and GPU-accelerated methods. Second, we develop RapidAVA, a CPU-based multi-threaded method for all-versus-all overlap detection in HiFi reads. It exploits the high accuracy of HiFi data to select minimizers (common subsequences with certain features among reads) efficiently, speeds up the chaining and extension of k-mers, and optimizes the alignment procedure with effective heuristics. As a result, RapidAVA yields up to 5.8× and 2.3× speedups over two representative tools, Minimap2 and Hifiasm, respectively, while reducing peak memory consumption to 13–88% of theirs. Finally, we present RapidAsm, a CPU-based multi-threaded de novo assembler for HiFi reads. 
It optimizes all three stages of de novo assembly (overlap detection, layout construction, and consensus generation) with novel heuristic strategies, and produces high-quality assembly results 1.5× to 10× faster than mainstream HiFi read assemblers while using 6–75% of the memory consumed by Hifiasm. Furthermore, all of our software tools are publicly available, with the aim of enhancing the efficiency of HiFi data analysis and enabling large-scale genomic studies.
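For readers unfamiliar with the two primitives named in the abstract, the following is a minimal Python sketch of k-mer counting and of window-based minimizer selection. It is a generic textbook illustration, not the method used by RapidGKC or RapidAVA: the function names `count_kmers` and `minimizers`, the window parameter `w`, and the choice of lexicographic order for ranking k-mers are all assumptions made here for illustration only.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        # A read of length n contains n - k + 1 overlapping k-mers.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def minimizers(read, k, w):
    """Select window minimizers from one read (illustrative scheme).

    For each window of w consecutive k-mers, keep the lexicographically
    smallest k-mer (leftmost on ties). Returns sorted (position, k-mer) pairs.
    """
    kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
    chosen = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # min() returns the first minimal index, so ties go to the leftmost k-mer.
        offset = min(range(w), key=lambda j: window[j])
        chosen.add((start + offset, window[offset]))
    return sorted(chosen)


if __name__ == "__main__":
    reads = ["ACGTACGT"]
    print(count_kmers(reads, 3))
    print(minimizers(reads[0], 3, 2))
```

Because adjacent windows often share the same smallest k-mer, the selected set is much smaller than the full k-mer list, which is why minimizers are commonly used as compact anchors when comparing reads.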
| Date of Award | 2025 |
|---|---|
| Original language | English |
| Awarding Institution | The Hong Kong University of Science and Technology |
| Supervisor | Qiong LUO (Supervisor) & Yangqiu SONG (Supervisor) |
Efficient HiFi Read De Novo Assembly
CHENG, Y. (Author). 2025
Student thesis: Doctoral thesis