PIPI: PTM-Invariant Peptide Identification Using Coding Method

2016
Author(s):
Fengchao Yu
Ning Li
Weichuan Yu

Abstract: In computational proteomics, identifying peptides with an unlimited number of post-translational modification (PTM) types is a challenging task. The computational cost increases exponentially with the number of modifiable amino acids and linearly with the number of potential PTM types at each amino acid, so the problem becomes intractable very quickly if we enumerate all possible modification patterns. Existing tools (e.g., MS-Alignment, ProteinProspector, and MODa) avoid this enumeration by using an alignment-based approach to localize and characterize modified amino acids. However, due to the large search space and the PTM localization issue, the sensitivity of these tools is low. This paper proposes a novel method named PIPI to achieve PTM-invariant peptide identification. PIPI first codes peptide sequences into Boolean vectors and converts experimental spectra into real-valued vectors. Then, it finds the top 10 peptide-coded vectors for each spectrum-coded vector. After that, PIPI uses a dynamic programming algorithm to localize and characterize modified amino acids. Simulations and real-data experiments have shown that PIPI outperforms existing tools by identifying more peptide-spectrum matches (PSMs) and reporting fewer false positives. It also runs much faster than existing tools when the database is large.
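The coding idea can be sketched in a few lines. Everything below is an illustrative simplification, not PIPI's actual implementation: the Boolean code is taken over fixed-length 3-mer tags, the spectrum vector is assumed to live in the same tag space, and candidates are ranked by a plain dot product.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TAG_LEN = 3
# Index every possible 3-mer tag; a hypothetical stand-in for PIPI's code space.
TAG_INDEX = {''.join(t): i for i, t in enumerate(product(AMINO_ACIDS, repeat=TAG_LEN))}

def code_peptide(seq):
    """Boolean vector marking which 3-mer tags occur in the peptide sequence."""
    vec = [False] * len(TAG_INDEX)
    for i in range(len(seq) - TAG_LEN + 1):
        vec[TAG_INDEX[seq[i:i + TAG_LEN]]] = True
    return vec

def score(spectrum_vec, peptide_vec):
    """Dot product between a real-valued spectrum vector and a Boolean peptide vector."""
    return sum(s for s, p in zip(spectrum_vec, peptide_vec) if p)

def top_candidates(spectrum_vec, peptides, k=10):
    """Rank candidate peptides by coded-vector similarity and keep the top k."""
    return sorted(peptides, key=lambda p: score(spectrum_vec, code_peptide(p)),
                  reverse=True)[:k]
```

A modified residue changes only the tags that cover it, so most of a peptide's coded vector survives modification; this is what makes the candidate retrieval "PTM-invariant" before the dynamic programming step localizes the modifications.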

2016
Vol 14 (04)
pp. 1643001
Author(s):
Jin Li
Chengzhen Xu
Lei Wang
Hong Liang
Weixing Feng
...  

Prediction of RNA secondary structures is an important problem in computational biology and bioinformatics, since RNA secondary structures are fundamental for functional analysis of RNA molecules. However, known small-RNA secondary structures are scarce, and few algorithms have been specifically designed for predicting the secondary structures of small RNAs. Here we propose an algorithm named “PSRna” for predicting small-RNA secondary structures using reverse complementary folding and the characteristic hairpin loops of small RNAs. Unlike traditional algorithms, which often generate multi-branch loops and 5′-end self-folding, PSRna first estimates the maximum number of base pairs of the RNA secondary structure using a dynamic programming algorithm, constructing a path matrix at the same time. Second, backtracking paths are extracted from the path matrix with a backtracking algorithm, and each backtracking path represents a secondary structure. To improve accuracy, the predicted RNA secondary structures are filtered by their free energy, and only the structure with the minimum free energy is retained as the candidate secondary structure. Our experiments on real data show that the proposed algorithm is superior to two popular methods, RNAfold and RNAstructure, in terms of sensitivity, specificity and Matthews correlation coefficient (MCC).
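The base-pair-maximization step can be illustrated with the classical Nussinov recurrence, on which this style of dynamic program is based. The sketch below is the textbook version, not PSRna's path-matrix construction; the minimum hairpin-loop length of 3 and the simple traceback are assumptions.

```python
def can_pair(a, b):
    """Watson-Crick pairs plus the G-U wobble pair."""
    return {a, b} in ({'A', 'U'}, {'G', 'C'}, {'G', 'U'})

def nussinov(seq, min_loop=3):
    """M[i][j] = max base pairs in seq[i..j]; traceback recovers one optimal pairing."""
    n = len(seq)
    M = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = M[i][j - 1]                      # j left unpaired
            for k in range(i, j - min_loop):        # pair (k, j), loop >= min_loop
                if can_pair(seq[k], seq[j]):
                    left = M[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + M[k + 1][j - 1])
            M[i][j] = best
    # Traceback: re-check which choice produced each cell's value.
    pairs, stack = [], [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if M[i][j] == M[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if can_pair(seq[k], seq[j]):
                left = M[i][k - 1] if k > i else 0
                if left + 1 + M[k + 1][j - 1] == M[i][j]:
                    pairs.append((k, j))
                    stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return M[0][n - 1], sorted(pairs)
```

PSRna then filters the structures recovered from its path matrix by free energy; this sketch stops at the pair count and one traceback.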


2018
Vol 29 (01)
pp. 63-90
Author(s):
Safia Kedad-Sidhoum
Florence Monna
Grégory Mounié
Denis Trystram

More and more parallel computing platforms are built upon hybrid architectures combining multi-core processors (CPUs) and hardware accelerators such as General Purpose Graphics Processing Units (GPGPUs). We present in this paper a new method for efficiently scheduling parallel applications with [Formula: see text] CPUs and [Formula: see text] GPGPUs, where each task of the application can be processed either on a regular core (CPU) or on a GPGPU. We consider the problem of scheduling [Formula: see text] independent tasks with the objective of minimizing the time for completing the whole application (the makespan). This problem is NP-hard; we therefore present two families of approximation algorithms that achieve approximation ratios of [Formula: see text] or [Formula: see text] for any integer [Formula: see text] when only one GPGPU is considered, and [Formula: see text] or [Formula: see text] for [Formula: see text] GPGPUs, where [Formula: see text] is an arbitrarily small value corresponding to the target accuracy of a binary search. The proposed method is based on a dual approximation scheme that uses a dynamic programming algorithm. The associated computational cost per step of dual approximation is in [Formula: see text] for the first family and in [Formula: see text] for the second. The greater the value of the parameter [Formula: see text], the better the approximation, but the higher the computational cost. Finally, we propose a relaxed version of the algorithm that achieves a running time in [Formula: see text] with a constant approximation bound of [Formula: see text]. This last result is compared with the state-of-the-art algorithm HEFT. The proposed method is the first general-purpose algorithm for scheduling on hybrid machines with a theoretical performance guarantee that can be used in practice.
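To make the assignment dynamic program concrete, here is a minimal sketch for the special case of one CPU and one GPGPU: each task carries a CPU time and a GPU time, and the DP keeps, for every reachable CPU load, the minimal GPU load. This is only an illustration; the paper's actual algorithm wraps such a DP inside a dual approximation scheme driven by a binary search on the makespan guess.

```python
def min_makespan(tasks):
    """tasks: list of (cpu_time, gpu_time) pairs. One CPU, one GPU.
    states maps a reachable CPU load to the minimal GPU load achieving it."""
    states = {0: 0}
    for c, g in tasks:
        nxt = {}
        for cl, gl in states.items():
            # Option 1: run this task on the CPU.
            if nxt.get(cl + c, float('inf')) > gl:
                nxt[cl + c] = gl
            # Option 2: run this task on the GPU.
            if nxt.get(cl, float('inf')) > gl + g:
                nxt[cl] = gl + g
        states = nxt
    # The makespan of an assignment is the busier of the two machines.
    return min(max(cl, gl) for cl, gl in states.items())
```

With integer processing times the state space is bounded by the total CPU time, which is what makes the per-step cost of the dual approximation polynomial.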


2019
Vol 08 (04)
pp. 1950014
Author(s):
Yunlong Wang
Changliang Zou
Zhaojun Wang
Guosheng Yin

Change-point detection is an integral component of statistical modeling and estimation. For high-dimensional data, classical methods based on the Mahalanobis distance are typically inapplicable. We propose a novel test statistic that combines a modified Euclidean distance and an extreme statistic, and its null distribution is asymptotically normal. The new method naturally strikes a balance between the detection abilities for dense and sparse changes, giving it the potential to outperform existing methods. Furthermore, the number of change-points is determined by a new Schwarz information criterion together with a pre-screening procedure, and the locations of the change-points can be estimated via a dynamic programming algorithm in conjunction with the intrinsic order structure of the objective function. Under mild conditions, we show that the new method provides consistent estimation at an almost optimal rate. Simulation studies show that the proposed method identifies multiple change-points with satisfactory power and estimation accuracy, and two real-data examples are used for illustration.
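The localization step can be illustrated with a generic optimal-partitioning dynamic program. The sketch below uses a least-squares segment cost and a fixed per-change-point penalty, both stand-ins for the paper's test statistic and information criterion.

```python
def detect_changepoints(x, penalty):
    """Optimal partitioning: minimize within-segment squared error plus a
    penalty per change-point, in O(n^2) via the additive order structure."""
    n = len(x)
    # Prefix sums give O(1) segment costs.
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(x):
        s[i + 1] = s[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(i, j):
        """Squared error of segment x[i:j] around its mean."""
        m = (s[j] - s[i]) / (j - i)
        return s2[j] - s2[i] - m * (s[j] - s[i])

    F = [0.0] * (n + 1)      # F[j] = best objective for x[:j]
    back = [0] * (n + 1)     # back[j] = start of the last segment
    for j in range(1, n + 1):
        best, arg = float('inf'), 0
        for i in range(j):
            v = F[i] + cost(i, j) + (penalty if i > 0 else 0.0)
            if v < best:
                best, arg = v, i
        F[j], back[j] = best, arg
    # Trace back the change-point locations.
    cps, j = [], n
    while back[j] > 0:
        cps.append(back[j])
        j = back[j]
    return sorted(cps)
```

The pre-screening step in the paper restricts the candidate split points before such a DP runs, which is what keeps the procedure feasible for long sequences.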


2015
Vol 77 (20)
Author(s):
F. N. Muhamad
R. B. Ahmad
S. Mohd. Asi
M. N. Murad

The fundamental procedure for analyzing sequence content is sequence comparison: finding which parts of two sequences are similar and which parts differ. A typical approach to this problem is to find a good alignment between the two sequences. The main goal of this project is to align DNA sequences using the Needleman-Wunsch algorithm for global alignment and the Smith-Waterman algorithm for local alignment, both based on dynamic programming. The dynamic programming algorithm is guaranteed to find the optimal alignment, exploring all possible alignments and choosing the best through scoring and traceback. The proposed algorithms aim to reduce the number of gaps in the aligned sequences, as well as the length of the sequences aligned, without compromising the quality or correctness of the results. To verify the accuracy and consistency of the Needleman-Wunsch and Smith-Waterman results, they are compared with EMBOSS (global) and EMBOSS (local) on test data of 600 strands.
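A minimal Needleman-Wunsch implementation with traceback, using illustrative scoring parameters (match +1, mismatch -1, gap -2; the project's actual scoring scheme is not given in the abstract):

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment by dynamic programming; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        H[i][0] = i * gap          # leading gaps in b
    for j in range(1, m + 1):
        H[0][j] = j * gap          # leading gaps in a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
    # Traceback from the bottom-right corner.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and H[i][j] == H[i - 1][j - 1] + \
                (match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append('-'); i -= 1
        else:
            out_a.append('-'); out_b.append(b[j - 1]); j -= 1
    return H[n][m], ''.join(reversed(out_a)), ''.join(reversed(out_b))
```

Smith-Waterman differs only in clamping each cell at zero and tracing back from the maximum cell, which yields the best local rather than global alignment.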


2014
Vol 2014
pp. 1-13
Author(s):
Xiang Li
Mohammad Reza Bonyadi
Zbigniew Michalewicz
Luigi Barone

This paper presents a hybrid evolutionary algorithm for the wheat blending problem. The unique constraints of this problem make many existing algorithms fail: either they do not generate acceptable results or they cannot complete the optimization within the required time. The proposed algorithm starts with a filtering process that applies predefined rules to reduce the search space. Then the linear-relaxed version of the problem is solved using a standard linear programming algorithm. The result is combined with a solution generated by a heuristic method to produce an initial solution. After that, a hybrid of an evolutionary algorithm, a heuristic method, and a linear programming solver is used to improve the quality of the solution. A local-search-based post-tuning method is also incorporated into the algorithm. The proposed algorithm has been tested on artificial test cases as well as real data from past years. Results show that it finds high-quality solutions in all cases and outperforms the existing method in both quality and speed.


2019
Vol 47 (13)
pp. e77-e77
Author(s):
Xinzhou Ge
Haowen Zhang
Lingjue Xie
Wei Vivian Li
Soo Bin Kwon
...  

Abstract: The availability of genome-wide epigenomic datasets enables in-depth studies of epigenetic modifications and their relationships with chromatin structures and gene expression. Various alignment tools have been developed to align nucleotide or protein sequences in order to identify structurally similar regions. However, there are currently no alignment methods specifically designed for comparing multi-track epigenomic signals and detecting common patterns that may explain functional or evolutionary similarities. We propose a new local alignment algorithm, EpiAlign, designed to compare chromatin state sequences learned from multi-track epigenomic signals and to identify locally aligned chromatin regions. EpiAlign is a dynamic programming algorithm that incorporates, in a novel way, the varying lengths and frequencies of chromatin states. We demonstrate the efficacy of EpiAlign through extensive simulations and studies on real data from the NIH Roadmap Epigenomics project. EpiAlign is able to extract recurrent chromatin state patterns along a single epigenome, and many of these patterns carry cell-type-specific characteristics. EpiAlign can also detect common chromatin state patterns across multiple epigenomes, and it will serve as a useful tool for grouping and distinguishing epigenomic samples based on genome-wide or local chromatin state patterns.
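Since EpiAlign performs local alignment over chromatin state sequences, the core recurrence resembles Smith-Waterman applied to state labels. The sketch below shows only that plain variant: EpiAlign's actual scoring additionally weights state lengths and frequencies, which is not modeled here, and the state names and score values are made up for illustration.

```python
def local_align_states(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman-style local alignment over two chromatin-state
    sequences (lists of state labels, e.g. run-length-encoded segments).
    Returns (best_score, (i, j)) where the best local alignment ends."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Clamping at zero lets an alignment restart anywhere (local).
            H[i][j] = max(0, H[i - 1][j - 1] + sub,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, pos = H[i][j], (i, j)
    return best, pos
```

Working on state labels rather than nucleotides is what lets the method compare epigenomes at the scale of chromatin segments instead of base pairs.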


Author(s):  
Rui Wang
Xin Xin
Wei Chang
Kun Ming
Biao Li
...  

In this paper, we investigate how to improve Chinese named entity recognition (NER) by jointly modeling NER and constituent parsing in the framework of neural conditional random fields (CRF). We reformulate the parsing task as height-limited constituent parsing, which significantly reduces the computational complexity while retaining the majority of phrase-level grammars. Specifically, a unified model of a neural semi-CRF and a neural tree-CRF is proposed, which simultaneously conducts word segmentation, part-of-speech (POS) tagging, NER, and parsing. The challenge lies in training and inference for the joint model, which had not been solved previously. We design a dynamic programming algorithm for both training and inference, whose complexity is O(n·4^h), where n is the sentence length and h is the height limit. In addition, we derive a pruning algorithm for the joint model, which prunes 99.9% of the search space while losing only 2% of the ground-truth data. Experimental results on the OntoNotes 4.0 dataset demonstrate that the proposed model outperforms the state-of-the-art method by 2.79 points in F1-measure.


2021
Author(s):
Le Zhang
Geng Liu
Guixue Hou
Haitao Xiang
Xi Zhang
...  

Although database search tools originally developed for shotgun proteomics have been widely used for immunopeptidomic mass spectrometry identification, they have been reported to achieve undesirably low sensitivity and/or high false-positive rates as a result of the hugely inflated search space caused by the lack of specific enzymatic digestion in the immunopeptidome. To overcome this problem, we have developed a motif-guided immunopeptidome database building tool named IntroSpect, which first learns peptide motifs from high-confidence hits in an initial search and then builds a targeted database for a refined search. Evaluated on three representative HLA class I datasets, IntroSpect improves sensitivity by an average of 80% compared to conventional searches with unspecific digestion, while maintaining very high accuracy (~96%), as confirmed by synthetic validation experiments. A distinct advantage of IntroSpect is that it does not depend on any external HLA data, so it performs equally well on both well-studied and poorly studied HLA types, unlike the previously developed method SpectMHC. We have also designed IntroSpect to maintain a global FDR that can be conveniently controlled, similar to conventional database search engines. Finally, we demonstrate the practical value of IntroSpect by discovering neoantigens directly from MS data. IntroSpect is freely available at https://github.com/BGI2016/IntroSpect.
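The motif-learning step can be sketched as building a position weight matrix from high-confidence peptides of a fixed length and scoring database candidates against it. This is a crude stand-in for IntroSpect's motif model: the peptide length of 9, the pseudocount, and the product score are all illustrative assumptions.

```python
from collections import Counter

AA = "ACDEFGHIKLMNPQRSTVWY"

def position_weight_matrix(peptides, length=9, pseudo=0.5):
    """Per-position residue frequencies from high-confidence peptides of one
    length, with a pseudocount so unseen residues keep nonzero probability."""
    peps = [p for p in peptides if len(p) == length]
    pwm = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peps)
        total = len(peps) + pseudo * len(AA)
        pwm.append({a: (counts[a] + pseudo) / total for a in AA})
    return pwm

def motif_score(pwm, peptide):
    """Product of per-position frequencies; candidates scoring above a cutoff
    would go into the targeted database for the refined search."""
    s = 1.0
    for pos, a in enumerate(peptide):
        s *= pwm[pos][a]
    return s
```

Because the motif is learned from the dataset's own confident hits, the approach needs no external HLA binding data, which matches the advantage the abstract describes.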


2021
Author(s):
Yiling Elaine Chen
Kyla Woyshner
MeiLu McDermott
Antigoni Manousopoulou
Scott Ficarro
...  

Abstract: Advances in mass spectrometry (MS) have enabled high-throughput analysis of proteomes in biological systems. State-of-the-art MS data analysis relies on database search algorithms to quantify proteins by identifying peptide-spectrum matches (PSMs), which convert mass spectra to peptide sequences. Different database search algorithms use distinct search strategies and thus may identify unique PSMs. However, no existing approach can aggregate all user-specified database search algorithms with guaranteed control of the false discovery rate (FDR) and a guaranteed increase in the number of identified peptides. To fill this gap, we propose a statistical framework, Aggregation of Peptide Identification Results (APIR), that is universally compatible with all database search algorithms. Notably, under a target FDR threshold, APIR is guaranteed to identify at least as many, if not more, peptides as individual database search algorithms do. Evaluation of APIR on a complex protein standard shows that APIR outperforms individual database search algorithms and guarantees FDR control. Real-data studies show that APIR can identify disease-related proteins and post-translational modifications missed by some individual database search algorithms. Note that the APIR framework is easily extendable to aggregating discoveries made by multiple algorithms in other high-throughput biomedical data analyses, e.g., differential gene expression analysis on RNA sequencing data.
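For background, the basic target-decoy FDR estimate that individual search engines threshold against can be sketched as follows. This is the conventional estimate only, not APIR's aggregation statistics, and the (score, is_decoy) input format is hypothetical.

```python
def fdr_threshold(psms, alpha=0.05):
    """psms: list of (score, is_decoy) pairs. Walk down the score-ranked list
    and keep the longest prefix whose estimated FDR (#decoys / #targets)
    stays at or below alpha; return the accepted target scores."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    decoys = targets = 0
    accepted, best = [], []
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        accepted.append((score, is_decoy))
        if targets and decoys / targets <= alpha:
            best = list(accepted)   # this prefix satisfies the FDR estimate
    return [s for s, d in best if not d]
```

The difficulty APIR addresses is that naively pooling such thresholded lists from several engines does not preserve the FDR guarantee; its framework aggregates the engines' outputs while keeping the target FDR controlled.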


2012
Vol 198-199
pp. 1527-1530
Author(s):
Xue Min Zhang
Xiao Wen Chen
Jia Lin Jiao
Jia Lin Jiao

Drawing on the strengths of the exhaustive dynamic programming algorithm, and on the idea of deriving a global optimal solution from local optimal solutions, this paper proposes a new structure-based join-selection algorithm. The algorithm first joins within sub-trees and then joins the sub-tree results into the whole structure. Although it does not guarantee an optimal solution, the algorithm substantially improves time complexity, reduces the search space, and improves efficiency.
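The exhaustive dynamic program the proposed heuristic builds on is the classical subset DP for join ordering: the best plan for a set of relations is derived from the best plans of its subsets (a global optimum from local optima). A minimal sketch, with a hypothetical cost function supplied by the caller:

```python
from itertools import combinations

def best_join_order(relations, cost_fn):
    """Exhaustive subset DP: best[S] = (cost, plan) of the cheapest tree
    joining the relation subset S. A plan is a relation name or a pair
    of sub-plans; cost_fn(plan_a, plan_b) prices joining two sub-plans."""
    best = {frozenset([r]): (0, r) for r in relations}
    for size in range(2, len(relations) + 1):
        for subset in combinations(relations, size):
            S = frozenset(subset)
            for left_size in range(1, size):
                for left in combinations(subset, left_size):
                    L = frozenset(left)
                    R = S - L
                    cl, pl = best[L]
                    cr, pr = best[R]
                    c = cl + cr + cost_fn(pl, pr)
                    if S not in best or c < best[S][0]:
                        best[S] = (c, (pl, pr))
    return best[frozenset(relations)]
```

The DP enumerates every split of every subset, which is exponential in the number of relations; the paper's heuristic trades that guarantee away by fixing sub-tree joins first and then joining the sub-tree results, reducing the search space at the cost of optimality.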

