gapped alignment Latest Research Papers

Fast and memory efficient approach for mapping NGS reads to a reference genome

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720019500082 ◽

2019 ◽

Vol 17 (02) ◽

pp. 1950008 ◽

Cited By ~ 3

Author(s):

Sanjeev Kumar ◽

Suneeta Agarwal ◽

Ranvijay

Keyword(s):

Reference Genome ◽

Next Generation Sequencing Data ◽

Sequencing Data ◽

Efficient Approach ◽

Gapped Alignment ◽

New Generation Sequencing ◽

Genome Space ◽

New Generation ◽

Burrows Wheeler Transform ◽

Generation Sequencing

New-generation sequencing machines such as Illumina and Solexa can generate millions of short reads from a given genome sequence in a single run. Alignment of these reads to a reference genome is a core step in next-generation sequencing data analyses such as genetic variation detection and genome re-sequencing. An approach that is efficient in both memory and time is therefore needed to align these enormous numbers of reads to the reference genome. Existing techniques such as MAQ, Bowtie, BWA, BWBBLE, Subread, Kart, and Minimap2 require large amounts of memory for whole-reference-genome indexing and read alignment, and their gapped alignment versions are 20–40% slower than their respective standard versions. In this paper, an efficient approach, WIT, is proposed for reference genome indexing and read alignment using the Burrows–Wheeler Transform (BWT) and a Wavelet Tree (WT). It supports both exact and approximate alignment. Experimental work shows that WIT performs best for protein sequence indexing. For indexing, WIT requires 0.6N space for the reference genome (where N is the size of the reference genome), whereas the existing techniques BWA, Subread, Kart, and Minimap2 require between 1.25N and 5N. Experiments also show that, even with such a small index, the alignment time of WIT is comparable to that of BWA, Subread, Kart, and Minimap2. Two other alignment parameters, accuracy and confidentiality, are also experimentally shown to be better than those of Minimap2. The source code of WIT is available at http://www.algorithm-skg.com/wit/home.html .
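The abstract names the two data structures but not how they interact. Below is a minimal Python sketch of the textbook FM-index idea, assuming that WIT, like other BWT-based aligners, counts exact matches by backward search and answers the required rank queries with a wavelet tree. The names `bwt`, `WaveletTree`, and `FMIndex` are hypothetical, the naive suffix sorting and plain prefix-count lists are for readability only, and nothing here reproduces WIT's actual index layout or its 0.6N space figure.

```python
from typing import Dict, List, Sequence


def bwt(text: str) -> str:
    """Burrows-Wheeler transform of text + '$' by naive suffix sorting (demo only)."""
    s = text + "$"  # '$' is unique and sorts before the DNA/protein letters
    sa = sorted(range(len(s)), key=lambda i: s[i:])  # suffix array
    return "".join(s[i - 1] for i in sa)  # s[-1] == '$' for the suffix starting at 0


class WaveletTree:
    """Balanced wavelet tree answering rank(c, i) = occurrences of c in seq[:i]."""

    def __init__(self, seq: str, alphabet: Sequence[str]) -> None:
        self.alphabet = list(alphabet)
        self.is_leaf = len(self.alphabet) <= 1
        if self.is_leaf:
            return
        mid = len(self.alphabet) // 2
        self.left_set = set(self.alphabet[:mid])
        # Bit i is 1 when seq[i] belongs to the right half of the alphabet.
        # Plain prefix counts stand in for a succinct rank-supporting bit vector.
        self.ones: List[int] = [0]
        for ch in seq:
            self.ones.append(self.ones[-1] + (ch not in self.left_set))
        self.left = WaveletTree("".join(ch for ch in seq if ch in self.left_set), self.alphabet[:mid])
        self.right = WaveletTree("".join(ch for ch in seq if ch not in self.left_set), self.alphabet[mid:])

    def rank(self, c: str, i: int) -> int:
        if self.is_leaf:
            return i  # every symbol stored in this leaf is c
        ones = self.ones[i]
        if c in self.left_set:
            return self.left.rank(c, i - ones)
        return self.right.rank(c, ones)


class FMIndex:
    """Hypothetical minimal FM-index: BWT + wavelet tree + C table."""

    def __init__(self, text: str) -> None:
        self.last = bwt(text)
        alphabet = sorted(set(self.last))
        self.wt = WaveletTree(self.last, alphabet)
        # C[c] = number of symbols in the BWT that are lexicographically smaller than c
        self.C: Dict[str, int] = {}
        total = 0
        for c in alphabet:
            self.C[c] = total
            total += self.last.count(c)

    def count(self, pattern: str) -> int:
        """Number of exact occurrences of pattern, found by backward search."""
        lo, hi = 0, len(self.last)  # half-open suffix-array interval
        for c in reversed(pattern):
            if c not in self.C:
                return 0
            lo = self.C[c] + self.wt.rank(c, lo)
            hi = self.C[c] + self.wt.rank(c, hi)
            if lo >= hi:
                return 0
        return hi - lo
```

Approximate (gapped) matching would extend `count` with backtracking over substitutions and indels; that part is deliberately omitted.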

Maximal homology alignment: A new method based on two-dimensional homology

10.1101/593228 ◽

2019 ◽

Author(s):

Al Erives

Keyword(s):

Sequence Alignment ◽

Current Method ◽

Two Dimensions ◽

Nucleotide Position ◽

Two Dimensional ◽

Replication Slippage ◽

Biologically Relevant ◽

DNA Sequence Alignment ◽

Gapped Alignment ◽

Binary State

Maximal homology alignment (MHA) is a new, biologically relevant approach to DNA sequence alignment that maps the internal dispersed microhomology of individual sequences onto two dimensions. It departs from the current method of gapped alignment, which uses a simplified binary-state model of nucleotide position: in gapped alignment, nucleotide positions have either no relationship (1-to-None) or an orthological relationship (1-to-1) with nucleotides in other sequences. Maximal homology alignment, however, allows additional states such as 1-to-Many and Many-to-Many, thus modeling both orthological and paralogical relationships, which together comprise the main homology types. MHA collects dispersed microparalogy into the same alignment columns on multiple rows and thereby generates a two-dimensional representation of a single sequence. Sequence alignment then proceeds as the alignment of two-dimensional topological objects. The operations of producing and aligning two-dimensional auto-alignments motivate a need for tests of two-dimensional homological integrity. Here, I work out and implement basic principles for computationally testing the two dimensions of positional homology, which are inherent to biological sequences because of replication slippage and related errors. I then show that maximal homology alignment is more informative than gapped alignment in modeling the evolution of genetic sequences. In general, MHA is better suited when small insertions and deletions predominantly originate as local microparalogy. These results show that both conserved and non-conserved genomic sequences are enriched with a signature of replication slippage relative to their random permutations.
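To make the binary-state model concrete, the hypothetical helper below reads a pairwise gapped alignment and reports, for every residue of the first sequence, either the index of the residue it faces in the second sequence (1-to-1) or `None` when it faces a gap (1-to-None). It illustrates conventional gapped alignment only and is not part of MHA; the point is that its return type has no way to express a 1-to-Many relationship.

```python
from typing import List, Optional


def column_mapping(row_a: str, row_b: str, gap: str = "-") -> List[Optional[int]]:
    """Hypothetical helper: map each residue of A to the residue of B in the
    same alignment column (1-to-1), or to None when A faces a gap (1-to-None)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    mapping: List[Optional[int]] = []
    j = 0  # index into the ungapped sequence B
    for a, b in zip(row_a, row_b):
        if a != gap:
            mapping.append(j if b != gap else None)
        if b != gap:
            j += 1
    return mapping
```

For the rows `ACG-T` and `A-GCT`, the four residues of A map to `[0, None, 1, 3]`; the extra `C` of B simply has no partner, since each column can only be 1-to-1 or 1-to-None.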

Genetic sequences are two-dimensional

10.1101/299867 ◽

2018 ◽

Cited By ~ 1

Author(s):

Albert J Erives

Keyword(s):

Tandem Repeats ◽

Methodological Approach ◽

Regulatory Sequences ◽

Uniform Size ◽

Multiple Sequence ◽

Replication Slippage ◽

Protein Coding ◽

Gapped Alignment ◽

Genetic Sequences ◽

Reading Frames

In attempting to align divergent homologs of a conserved developmental enhancer, we discovered a flaw in the homology concept embedded in gapped alignment (GA). To correct this flaw, we developed a methodological approach called maximal homology alignment (MHA). The goal of MHA is to rescue the internal microparalogy of biological sequences rather than to insert a pattern of gaps (null characters) that transforms homologous sequences into strings of uniform size (1-dimensional lengths). The core operation in MHA is the “cinch”, whereby inferred tandem microparalogy is represented in multiple rows across the same span of alignment columns. MHAs thus have a second (vertical) paralogy dimension, which re-categorizes most indel mutations as replication slippage and attenuates the indel problem. Furthermore, internally cinched, inferred microparalogy in a self-MHA can later be relaxed to restore uniformity to 2-dimensional widths in a multiple sequence alignment. This de-cinching operation is used as a first resort before artificial null characters are introduced. We implement MHA in a program called maximal, which is composed of a series of modules for cinching and cyclelizing divergent tandem repeats. In conclusion, we find that the MHA approach is of higher utility than GA in non-protein-coding regulatory sequences, which are unconstrained by codon-based reading frames and are enriched in dense microparalogical content.
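As a rough intuition for the cinch, the toy Python below takes a run of exact tandem copies of a unit and stacks the copies as rows over the same columns, so the one-dimensional length shrinks while the copies stay aligned to each other. This is an assumption-laden simplification: `cinch_exact_tandem` and `render` are hypothetical names, the repeat start and period are supplied by the caller, and only exact copies are handled, whereas the program maximal is described as cinching divergent repeats.

```python
from typing import List, Tuple


def cinch_exact_tandem(seq: str, start: int, period: int) -> Tuple[str, List[str], str]:
    """Toy cinch (hypothetical): stack consecutive exact copies of the unit
    seq[start:start + period] as rows; return (prefix, rows, suffix)."""
    if period <= 0 or start + period > len(seq):
        raise ValueError("the repeat unit must fit inside the sequence")
    unit = seq[start:start + period]
    end = start + period
    while seq[end:end + period] == unit:  # an empty or short slice never equals unit
        end += period
    rows = [seq[i:i + period] for i in range(start, end, period)]
    return seq[:start], rows, seq[end:]


def render(prefix: str, rows: List[str], suffix: str) -> str:
    """Draw the first copy inline and the remaining copies beneath it."""
    lines = [prefix + rows[0] + suffix]
    lines += [" " * len(prefix) + row for row in rows[1:]]
    return "\n".join(lines)
```

For `TTACGACGACGAA` with the unit starting at index 2 and period 3, the three `ACG` copies occupy the same three columns, and the top row reads `TTACGAA`.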

Evaluation of tools for long read RNA-seq splice-aware alignment

10.1101/126656 ◽

2017 ◽

Cited By ~ 1

Author(s):

Krešimir Križanović ◽

Amina Echchiki ◽

Julien Roux ◽

Mile Šikić

Keyword(s):

High Throughput Sequencing ◽

Genetic Research ◽

Error Rates ◽

RNA-Seq ◽

Sequencing Technologies ◽

Third Generation Sequencing ◽

Long Reads ◽

Gapped Alignment ◽

Long Read ◽

Gene Expression Levels

Motivation: High-throughput sequencing has transformed the study of gene expression levels through RNA-seq, a technique now routinely used in fields such as genetic research and diagnostics. The advent of third-generation sequencing technologies, which provide significantly longer reads, opens up new possibilities. However, the high error rates common to these technologies pose new bioinformatics challenges for the gapped alignment of reads to their genomic origin. In this study, we explored how currently available RNA-seq splice-aware alignment tools cope with increased read lengths and error rates. All tested tools were initially developed for short NGS reads, but some claim support for long PacBio or even ONT MinION reads.

Results: The tools were tested on synthetic and real datasets from the PacBio and ONT MinION technologies, and both alignment quality and resource usage were compared across tools. The effect of error correction of long reads was explored, using both self-correction and correction with an external short-read dataset. A tool was developed for evaluating RNA-seq alignment results. It can be used to compare the alignment of simulated reads to their genomic origin, or to compare the alignment of real reads to a set of annotated transcripts. Our tests show that while some RNA-seq aligners were unable to cope with long, error-prone reads, others produced good overall results. We further show that alignment accuracy can be improved using error-corrected reads.

Availability: https://github.com/kkrizanovic/[email protected]
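The evaluation tool's exact criteria are not given here. Below is a minimal sketch of one common way to score simulated reads against their known origin, assuming each read has one true interval and at most one reported interval, and counting a read as correct when it lands on the right sequence and covers at least a chosen fraction (`min_frac`, a hypothetical parameter) of its true interval. All names are illustrative; a splice-aware evaluation would compare exon-level blocks rather than a single span.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def overlap_fraction(truth: Interval, hit: Interval) -> float:
    """Fraction of the true interval covered by the reported interval."""
    if truth.chrom != hit.chrom:
        return 0.0
    overlap = min(truth.end, hit.end) - max(truth.start, hit.start)
    return max(overlap, 0) / (truth.end - truth.start)


def evaluate(truth: Dict[str, Interval],
             hits: Dict[str, Optional[Interval]],
             min_frac: float = 0.5) -> Dict[str, int]:
    """Hypothetical evaluator: classify each simulated read by its reported hit."""
    counts = {"correct": 0, "misplaced": 0, "unaligned": 0}
    for read_id, origin in truth.items():
        hit = hits.get(read_id)
        if hit is None:
            counts["unaligned"] += 1
        elif overlap_fraction(origin, hit) >= min_frac:
            counts["correct"] += 1
        else:
            counts["misplaced"] += 1
    return counts
```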

Sequence Alignment on Directed Graphs

10.1101/124941 ◽

2017 ◽

Cited By ~ 2

Author(s):

Kavya Vaddadi ◽

Naveen Sivadasan ◽

Kshitij Tayal ◽

Rajgopal Srinivasan

Keyword(s):

Blow Up ◽

Directed Graphs ◽

Input Sequence ◽

Directed Acyclic Graphs ◽

Sequence Length ◽

Alignment Algorithm ◽

Input Graph ◽

Worst Case ◽

Gapped Alignment ◽

Vertex Set

Genomic variations in a reference collection are naturally represented as genome variation graphs. Such graphs encode common subsequences as vertices, and the variations are captured using additional vertices and directed edges. The resulting graphs are directed and may contain cycles. Existing algorithms for aligning sequences on such graphs use partial order alignment (POA) techniques, which work on directed acyclic graphs (DAGs). For this, acyclic extensions of the input graphs are first constructed through expensive loop-unrolling steps (DAGification). Moreover, such graph extensions can blow up considerably in size; in the worst case, the blow-up factor is proportional to the input sequence length. We provide a novel alignment algorithm, V-ALIGN, that aligns the input sequence directly on the input graph, avoiding such expensive DAGification steps. V-ALIGN is based on a novel dynamic programming formulation that allows gapped alignment directly on the input graph, and it supports both affine and linear gaps. We also propose refinements to V-ALIGN for better performance in practice; with these refinements, the time to fill the DP table depends linearly on the sizes of the sequence, the graph, and its feedback vertex set. We perform experiments comparing V-ALIGN against POA-based alignment. For aligning short sequences, standard approaches restrict the expensive gapped alignment to small filtered subgraphs with high ‘similarity’ to the input sequence; in such cases, the performance of V-ALIGN for gapped alignment on the filtered subgraph depends on the subgraph sizes.
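The obstacle V-ALIGN addresses is that dynamic programming over a cyclic graph has no topological order to follow. The sketch below is not V-ALIGN; it is a separate, simpler technique shown for intuition, assuming unit edit costs (no affine gaps) and single-character vertex labels. Within one row of the DP table, only graph-character deletions depend on other cells of the same row, and because each deletion costs 1, every row can be settled with Dijkstra's algorithm even when the graph has cycles, so no DAGification is needed. The function name `align_to_graph` and the adjacency-dict graph encoding are hypothetical.

```python
import heapq
from typing import Dict, List


def align_to_graph(query: str, labels: List[str], edges: Dict[int, List[int]]) -> int:
    """Unit-cost edit distance between the whole query and the best-matching
    path anywhere in a directed, possibly cyclic, character-labelled graph.
    (Hypothetical helper; this is not the V-ALIGN recurrence.)"""
    n = len(labels)
    preds: List[List[int]] = [[] for _ in range(n)]
    for u, targets in edges.items():
        for v in targets:
            preds[v].append(u)

    def settle(base: List[int]) -> List[int]:
        # Deletion moves go u -> v along an edge at cost 1 within the same row.
        # All costs are positive, so Dijkstra terminates even around cycles.
        dist = list(base)
        heap = [(d, v) for v, d in enumerate(dist)]
        heapq.heapify(heap)
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue  # stale heap entry
            for v in edges.get(u, []):
                if d + 1 < dist[v]:
                    dist[v] = d + 1
                    heapq.heappush(heap, (d + 1, v))
        return dist

    # prev[v] = cost of aligning query[:i] to a path that ends at vertex v.
    # Row 0: an empty query prefix against a path ending at v deletes at least v.
    prev = [1] * n
    for i in range(1, len(query) + 1):
        c = query[i - 1]
        start_prev, start_cur = i - 1, i  # "no vertex used yet": all chars inserted
        base = []
        for v in range(n):
            sub = 0 if labels[v] == c else 1  # match or mismatch
            enter = min([start_prev] + [prev[u] for u in preds[v]])
            base.append(min(
                enter + sub,    # consume the query char and vertex v together
                prev[v] + 1,    # insertion: query char against a gap
                start_cur + 1,  # path starts at v, and v is deleted
            ))
        prev = settle(base)
    # The path may also be empty (every query character inserted).
    return min([len(query)] + prev)
```

On a three-vertex cycle labelled `A`, `C`, `G`, the query `ACGACG` aligns at cost 0 by going around the loop, whereas the same vertices without the back edge force three insertions. The row-wise Dijkstra costs roughly O(|query| · (|V| + |E|) log |V|).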

ScanIndel: a hybrid framework for indel detection via gapped alignment, split reads and de novo assembly

Genome Medicine ◽

10.1186/s13073-015-0251-2 ◽

2015 ◽

Vol 7 (1) ◽

Cited By ~ 25

Author(s):

Rendong Yang ◽

Andrew C. Nelson ◽

Christine Henzler ◽

Bharat Thyagarajan ◽

Kevin A. T. Silverstein

Keyword(s):

De Novo Assembly ◽

De Novo ◽

Indel Detection ◽

Gapped Alignment ◽

Hybrid Framework

Some Applications to Computational Biology

Hidden Markov Processes ◽

10.23943/princeton/9780691133157.003.0008 ◽

2014 ◽

Author(s):

M. Vidyasagar

Keyword(s):

Amino Acids ◽

Computational Biology ◽

Markov Model ◽

Hidden Markov Model ◽

Markov Processes ◽

Sequence Alignment ◽

DNA Sequences ◽

Hidden Markov ◽

Gapped Alignment ◽

Hidden Markov Processes

This chapter considers some applications of Markov processes and hidden Markov processes to computational biology. It introduces three important problems, namely: sequence alignment, the gene-finding problem, and protein classification. After providing an overview of some relevant aspects of biology, the chapter examines the problem of optimal gapped alignment between two sequences. This is a way to detect similarity between two sequences over a common alphabet, such as the four-symbol alphabet of nucleotides, or the 20-symbol alphabet of amino acids. The chapter proceeds by discussing some widely used algorithms for finding genes from DNA sequences (genomes), including the GLIMMER algorithm and the GENSCAN algorithm. Finally, it describes a special type of hidden Markov model termed profile hidden Markov model, which is commonly used to classify proteins into a small number of groups.
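For reference, the classical dynamic program for optimal global gapped alignment (Needleman–Wunsch) with a linear gap penalty can be sketched as below. The scoring values are illustrative assumptions, not values from the chapter, which approaches alignment through Markov models; affine gaps would need the three-state Gotoh recurrence instead.

```python
from typing import Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[int, str, str]:
    """Optimal global alignment score and one optimal pair of gapped rows."""
    n, m = len(a), len(b)
    # F[i][j] = best score for aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                          F[i - 1][j] + gap,            # a[i-1] against a gap
                          F[i][j - 1] + gap)            # b[j-1] against a gap

    # Traceback from the bottom-right corner, preferring diagonal moves.
    i, j = n, m
    row_a, row_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(i, j):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))
```

With these defaults, `ACGT` against `AGT` scores 1: three matched pairs (+3) and one unavoidable gap (-2).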

TIGAR: transcript isoform abundance estimation method with gapped alignment of RNA-Seq data by variational Bayesian inference

Bioinformatics ◽

10.1093/bioinformatics/btt381 ◽

2013 ◽

Vol 29 (18) ◽

pp. 2292-2299 ◽

Cited By ~ 29

Author(s):

Naoki Nariai ◽

Osamu Hirose ◽

Kaname Kojima ◽

Masao Nagasaki

Keyword(s):

Bayesian Inference ◽

Estimation Method ◽

Abundance Estimation ◽

RNA-Seq ◽

Variational Bayesian ◽

Variational Bayesian Inference ◽

Gapped Alignment

Path reversal, islands, and the gapped alignment of random sequences

Journal of Applied Probability ◽

10.1239/jap/1101840544 ◽

2004 ◽

Vol 41 (4) ◽

pp. 975-983 ◽

Cited By ~ 2

Author(s):

John L. Spouge

Keyword(s):

Queueing Theory ◽

Point Processes ◽

Stochastic Ordering ◽

Higher Dimensions ◽

Systematic Investigation ◽

Local Alignment ◽

Spatial Averaging ◽

Efficient Simulation ◽

Gapped Alignment ◽

Stationary Point Processes

In bioinformatics, the notion of an ‘island’ enhances the efficient simulation of gapped local alignment statistics. This paper generalizes several results relevant to gapless local alignment statistics from one to higher dimensions, with a particular eye to applications in gapped alignment statistics. For example, reversal of paths (rather than of discrete time) generalizes a distributional equality from queueing theory between the Lindley (local sum) and maximum processes. Systematic investigation of an ‘ownership’ relationship among vertices in ℤ² formalizes the notion of an island as a set of vertices having a common owner. Predictably, islands possess some stochastic ordering and spatial averaging properties. Less predictably, however, the average number of vertices in a subcritical stationary island is 1, generalizing a theorem of Kac about stationary point processes. This generalization leads to alternative ways of simulating some island statistics.
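The one-dimensional, discrete-time identity that the paper generalizes can be checked directly. With W_0 = 0 and W_k = max(0, W_{k-1} + X_k) (the Lindley, or local-sum, process), W_n equals the maximum partial sum of the same increments taken in reverse order; for i.i.d. increments the reversed sequence has the same joint distribution, so W_n and the maximum process M_n = max(S_0, ..., S_n) have the same distribution. The sketch below uses integer steps (a hypothetical choice that avoids floating-point rounding and resembles integer alignment scores with negative drift) and hypothetical function names.

```python
import random
from itertools import accumulate
from typing import List


def lindley(xs: List[int]) -> int:
    """Local-sum process: W_0 = 0, W_k = max(0, W_{k-1} + X_k); returns W_n."""
    w = 0
    for x in xs:
        w = max(0, w + x)
    return w


def running_max_of_sums(xs: List[int]) -> int:
    """Maximum process: M_n = max(S_0, ..., S_n), with S_0 = 0, S_k = X_1 + ... + X_k."""
    return max([0] + list(accumulate(xs)))


def check_reversal_identity(trials: int = 1000, seed: int = 0) -> bool:
    """Check W_n(X_1..X_n) == M_n(X_n..X_1) on random integer step sequences."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-3, 2) for _ in range(rng.randint(0, 30))]
        if lindley(xs) != running_max_of_sums(xs[::-1]):
            return False
    return True
```

The identity holds path by path, not just in distribution: unrolling the Lindley recursion shows that W_n is the largest suffix sum of the increments (including the empty suffix), which is exactly the largest prefix sum of the reversed increments.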

Path reversal, islands, and the gapped alignment of random sequences

Journal of Applied Probability ◽

10.1017/s0021900200020751 ◽

2004 ◽

Vol 41 (04) ◽

pp. 975-983 ◽

Cited By ~ 1

Author(s):

John L. Spouge

