A Secure Alignment Algorithm for Mapping Short Reads to Human Genome

Yongan Zhao; Xiaofeng Wang; Haixu Tang

doi:10.1089/cmb.2017.0094

MULTIPLE ALIGNMENT OF PROMOTER SEQUENCES FROM HUMAN GENOME

BIOTECHNOLOGY: STATE OF THE ART AND PERSPECTIVES; doi:10.37747/2312-640x-2020-18-256-257; 2020; pp. 256-257

Keyword(s): Human Genome; Multiple Alignment; Alignment Algorithm; Promoter Sequences; Multiple Alignments

A new multiple alignment algorithm has been developed and used to compute a multiple alignment of promoter sequences from the human genome. Based on the calculated multiple alignments, 17 classes of promoter sequences were created.

A Short Reads Alignment Algorithm Oriented to Massive Data

Current Trends in Computer Science and Mechanical Automation Vol. 1; doi:10.1515/9783110584974-008; 2017; pp. 49-57

Keyword(s): Massive Data; Alignment Algorithm; Short Reads

Resolving the Full Spectrum of Human Genome Variation using Linked-Reads

doi:10.1101/230946; 2017; Cited By ~ 8

Author(s): Patrick Marks; Sarah Garcia; Alvaro Martinez Barrio; Kamila Belhocine; Jorge Bernate; ...

Keyword(s): Human Genome; Large Scale; De Novo; Simultaneous Detection; Whole Genome; Structural Variations; Full Spectrum; Short Read; Short Reads; A Genome

Abstract Large-scale population based analyses coupled with advances in technology have demonstrated that the human genome is more diverse than originally thought. To date, this diversity has largely been uncovered using short read whole genome sequencing. However, standard short-read approaches, used primarily due to accuracy, throughput and costs, fail to give a complete picture of a genome. They struggle to identify large, balanced structural events, cannot access repetitive regions of the genome and fail to resolve the human genome into its two haplotypes. Here we describe an approach that retains long range information while harnessing the advantages of short reads. Starting from only ∼1ng of DNA, we produce barcoded short read libraries. The use of novel informatic approaches allows for the barcoded short reads to be associated with the long molecules of origin producing a novel datatype known as ‘Linked-Reads’. This approach allows for simultaneous detection of small and large variants from a single Linked-Read library. We have previously demonstrated the utility of whole genome Linked-Reads (lrWGS) for performing diploid, de novo assembly of individual genomes (Weisenfeld et al. 2017). In this manuscript, we show the advantages of Linked-Reads over standard short read approaches for reference based analysis. We demonstrate the ability of Linked-Reads to reconstruct megabase scale haplotypes and to recover parts of the genome that are typically inaccessible to short reads, including phenotypically important genes such as STRC, SMN1 and SMN2. We demonstrate the ability of both lrWGS and Linked-Read Whole Exome Sequencing (lrWES) to identify complex structural variations, including balanced events, single exon deletions, and single exon duplications. The data presented here show that Linked-Reads provide a scalable approach for comprehensive genome analysis that is not possible using short reads alone.
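
The abstract says barcoded short reads are associated with their long molecules of origin, but it does not describe how. The sketch below is only a toy illustration of the general idea, not the authors' method: it assumes each aligned read is a hypothetical `(barcode, chrom, pos)` tuple and that reads sharing a barcode and lying within an arbitrary `max_gap` of each other come from the same molecule. The function name, tuple layout, and the 50 kb threshold are all assumptions made for the example.

```python
from collections import defaultdict


def infer_molecules(reads, max_gap=50_000):
    """Group (barcode, chrom, pos) reads into putative source molecules.

    Reads sharing a barcode and chromosome are sorted by position and split
    wherever two consecutive reads are more than max_gap bases apart.
    max_gap is an illustrative threshold, not a value taken from the paper.
    Returns (barcode, chrom, start, end, read_count) tuples.
    """
    by_key = defaultdict(list)
    for barcode, chrom, pos in reads:
        by_key[(barcode, chrom)].append(pos)

    molecules = []
    for (barcode, chrom), positions in sorted(by_key.items()):
        positions.sort()
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev > max_gap:
                # Gap too large: close the current molecule, start a new one.
                molecules.append((barcode, chrom, start, prev, count))
                start, count = pos, 0
            prev = pos
            count += 1
        molecules.append((barcode, chrom, start, prev, count))
    return molecules


# Hypothetical input: three reads with one barcode, one read with another.
reads = [
    ("AAAC", "chr1", 1_000),
    ("AAAC", "chr1", 21_000),
    ("AAAC", "chr1", 900_000),
    ("GGTT", "chr1", 5_000),
]
molecules = infer_molecules(reads)
```

In this toy input the first two `AAAC` reads fall into one putative molecule, while the distant third read starts a second one, which is the kind of long-range grouping that haplotype reconstruction can build on.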

Rapid and Automated Building of a Human Genome from Aligned Short Reads

SciVee; doi:10.4016/10185.01; 2009

Keyword(s): Human Genome; Short Reads

AlignGraph2: similar genome-assisted reassembly pipeline for PacBio long reads

Briefings in Bioinformatics; doi:10.1093/bib/bbab022; 2021

Author(s): Shien Huang; Xinyu He; Guohua Wang; Ergude Bao

Keyword(s): De Novo; Consensus Algorithm; Alignment Algorithm; Short Reads; Third Generation Sequencing; Long Reads; Long Read; Stable Performance; Novel Algorithms; Generation Sequencing

Abstract Contigs assembled from the third-generation sequencing long reads are usually more complete than the second-generation short reads. However, the current algorithms still have difficulty in assembling the long reads into the ideal complete and accurate genome, or the theoretical best result [1]. To improve the long read contigs and with more and more fully sequenced genomes available, it could still be possible to use the similar genome-assisted reassembly method [2], which was initially proposed for the short reads making use of a closely related genome (similar genome) to the sequencing genome (target genome). The method aligns the contigs and reads to the similar genome, and then extends and refines the aligned contigs with the aligned reads. Here, we introduce AlignGraph2, a similar genome-assisted reassembly pipeline for the PacBio long reads. The AlignGraph2 pipeline is the second version of AlignGraph algorithm proposed by us but completely redesigned, can be inputted with either error-prone or HiFi long reads, and contains four novel algorithms: similarity-aware alignment algorithm and alignment filtration algorithm for alignment of the long reads and preassembled contigs to the similar genome, and reassembly algorithm and weight-adjusted consensus algorithm for extension and refinement of the preassembled contigs. In our performance tests on both error-prone and HiFi long reads, AlignGraph2 can align 5.7–27.2% more long reads and 7.3–56.0% more bases than some current alignment algorithm and is more efficient or comparable to the others. For contigs assembled with various de novo algorithms and aligned to similar genomes (aligned contigs), AlignGraph2 can extend 8.7–94.7% of them (extendable contigs), and obtain contigs of 7.0–249.6% larger N50 value and 5.2–87.7% smaller number of indels per 100 kbp (extended contigs). With genomes of decreased similarities, AlignGraph2 also has relatively stable performance. 
The AlignGraph2 software can be downloaded for free from this site: https://github.com/huangs001/AlignGraph2.
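
The abstract reports its results as changes in N50 and in the number of indels per 100 kbp. As a reminder of what those metrics measure, here is a minimal sketch using the standard definition of N50 (the length of the shortest contig in the smallest set of longest contigs that covers at least half of the total assembled bases). The indel normalization is an assumption about how such a rate is usually computed, and both function names are hypothetical; neither is taken from the AlignGraph2 code.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; N50 is the length of the
    contig at which the running sum first reaches half of the total.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


def indels_per_100kbp(indel_count, aligned_bases):
    """Normalize an indel count to a rate per 100,000 aligned bases.

    Assumed normalization for illustration only.
    """
    return indel_count * 100_000 / aligned_bases


# Hypothetical contig lengths: total is 32, so half is 16.
example_n50 = n50([2, 2, 2, 3, 3, 4, 8, 8])
example_rate = indels_per_100kbp(50, 1_000_000)
```

With these example lengths the two contigs of length 8 already cover half of the 32 assembled bases, so the N50 is 8; a larger N50 after reassembly therefore indicates that more of the assembly sits in long contigs.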

centroFlye: Assembling Centromeres with Long Error-Prone Reads

doi:10.1101/772103; 2019; Cited By ~ 8

Author(s): Andrey V. Bzikadze; Pavel A. Pevzner

Keyword(s): Dark Matter; Human Genome; Short Reads; Evolutionary Studies; Centromere Assembly; Automated Tool; Reference Human Genome

Abstract Although variations in centromeres have been linked to cancer and infertility, centromeres still represent the “dark matter of the human genome” and remain an enigma for both biomedical and evolutionary studies. Since centromeres have withstood all previous attempts to develop an automated tool for their assembly and since their assembly using short reads is viewed as intractable, recent efforts attempted to manually assemble centromeres using long error-prone reads. We describe the centroFlye algorithm for centromere assembly using long error-prone reads, apply it for assembling the human X centromere, and use the constructed assembly to gain insights into centromere evolution. Our analysis reveals putative breakpoints in the previous manual reconstruction of the human X centromere and opens a possibility to automatically close the remaining multi-megabase gaps in the reference human genome.

DNA methylation in satellite repeats disorders

Essays in Biochemistry; doi:10.1042/ebc20190028; 2019; Vol 63 (6); pp. 757-771; Cited By ~ 4

Author(s): Claire Francastel; Frédérique Magdinier

Keyword(s): DNA Methylation; Human Genome; Repetitive DNA; DNA Sequences; Satellite Repeats; Tremendous Progress; Genes Encoding; DNA Elements; Near Future

Abstract Despite the tremendous progress made in recent years in assembling the human genome, tandemly repeated DNA elements remain poorly characterized. These sequences account for the vast majority of methylated sites in the human genome and their methylated state is necessary for this repetitive DNA to function properly and to maintain genome integrity. Furthermore, recent advances highlight the emerging role of these sequences in regulating the functions of the human genome and its variability during evolution, among individuals, or in disease susceptibility. In addition, a number of inherited rare diseases are directly linked to the alteration of some of these repetitive DNA sequences, either through changes in the organization or size of the tandem repeat arrays or through mutations in genes encoding chromatin modifiers involved in the epigenetic regulation of these elements. Although largely overlooked so far in the functional annotation of the human genome, satellite elements play key roles in its architectural and topological organization. This includes functions as boundary elements delimitating functional domains or assembly of repressive nuclear compartments, with local or distal impact on gene expression. Thus, the consideration of satellite repeats organization and their associated epigenetic landmarks, including DNA methylation (DNAme), will become unavoidable in the near future to fully decipher human phenotypes and associated diseases.
