Exhaustively Identifying Cross-Linked Peptides with a Linear Computational Complexity

Mapping Intimacies ◽

10.1101/097089 ◽

2016 ◽

Author(s):

Fengchao Yu ◽

Ning Li ◽

Weichuan Yu

Keyword(s):

Mass Spectrum ◽

Computational Complexity ◽

Large Scale ◽

Tandem Mass Spectrum ◽

State Of The Art ◽

Score Function ◽

Identification Problem ◽

Cross Linking ◽

Tandem Mass ◽

Data Set

Chemical cross-linking coupled with mass spectrometry is a powerful tool for studying protein-protein interactions and protein conformations. Two linked peptides are ionized and fragmented to produce a tandem mass spectrum, so each spectrum contains ions from two peptides and the peptide identification problem becomes a peptide-peptide pair identification problem. Most existing tools do not search all possible pairs because doing so naively takes quadratic time. Consequently, a significant percentage of linked peptides are missed. In our earlier work, we developed a tool named ECL to search all pairs of peptides exhaustively. While ECL does not miss any linked peptides, it is very slow because of its quadratic computational complexity, especially when the database is large. Furthermore, ECL uses a score function without statistical calibration, while researchers [1, 2] have demonstrated that a statistically calibrated score function can achieve higher sensitivity than an uncalibrated one. Here, we propose an advanced version of ECL, named ECL 2.0. It achieves linear time and space complexity by taking advantage of the additive property of a score function. It can analyze a typical data set containing tens of thousands of spectra against a large-scale database containing thousands of proteins in a few hours. Comparison with five other state-of-the-art tools shows that ECL 2.0 is much faster than pLink, StavroX, ProteinProspector, and ECL. Kojak is the only tool that is faster than ECL 2.0, but Kojak does not exhaustively search all possible peptide pairs. We also adopt an e-value estimation method to calibrate the original score. Comparison shows that ECL 2.0 has the highest sensitivity among the state-of-the-art tools. An experiment using a large-scale in vivo cross-linking data set demonstrates that ECL 2.0 is the only tool that can find peptide-spectrum matches (PSMs) passing the false discovery rate threshold. The result illustrates that exhaustive search and a well-calibrated score function are useful for finding PSMs in a huge search space.
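One way to see how an additive score can turn the pair search from quadratic into linear time is sketched below. This is only an illustration under stated assumptions, not the ECL 2.0 algorithm itself: the abstract does not describe the implementation, so the function name `best_pair_linear`, the mass binning, and the input layout `(name, mass, chain_score)` are all hypothetical. The sketch assumes the pair score is the sum of two per-peptide scores and that the two peptide masses plus the linker mass must match the precursor mass (compared here in discretized mass bins).

```python
def best_pair_linear(peptides, precursor_mass, linker_mass, bin_width=1.0):
    """Return (total_score, name_a, name_b) for the best-scoring pair, or None.

    peptides: iterable of (name, mass, chain_score) tuples (hypothetical layout).
    Assumption: pair score = chain_score_a + chain_score_b (additive), and a
    pair is valid when its two mass bins sum to the bin of
    precursor_mass - linker_mass.
    """
    def to_bin(mass):
        return round(mass / bin_width)

    # Pass 1 (linear): keep only the best-scoring peptide in each mass bin.
    # Because the score is additive, a lower-scoring peptide in the same bin
    # can never be part of the best pair.
    best_in_bin = {}
    for name, mass, score in peptides:
        b = to_bin(mass)
        if b not in best_in_bin or score > best_in_bin[b][1]:
            best_in_bin[b] = (name, score)

    # Pass 2 (linear): for each occupied bin, look up the complementary bin
    # with a single dictionary access instead of scanning all partners.
    target = to_bin(precursor_mass - linker_mass)
    best = None
    for b, (name_a, score_a) in best_in_bin.items():
        partner = best_in_bin.get(target - b)
        if partner is None:
            continue
        total = score_a + partner[1]
        if best is None or total > best[0]:
            best = (total, name_a, partner[0])
    return best
```

Both passes touch each peptide or bin once, so the work grows linearly with the database size, whereas enumerating every pair grows quadratically. A real search would also need a mass tolerance window and a calibrated score such as an e-value, which this sketch leaves out.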

Large‐scale tandem mass spectrum clustering using fast nearest neighbor searching

Rapid Communications in Mass Spectrometry ◽

10.1002/rcm.9153 ◽

2021 ◽

Author(s):

Wout Bittremieux ◽

Kris Laukens ◽

William Stafford Noble ◽

Pieter C. Dorrestein

Keyword(s):

Mass Spectrum ◽

Large Scale ◽

Tandem Mass Spectrum ◽

Nearest Neighbor ◽

Tandem Mass ◽

Nearest Neighbor Searching

A simple method for characterizing a neutral species lost in a metastable decomposition using a tandem mass spectrum of a naturally abundant isotopic ion containing a 13C atom

Rapid Communications in Mass Spectrometry ◽

10.1002/(sici)1097-0231(19990915)13:17<1770::aid-rcm705>3.0.co;2-2 ◽

1999 ◽

Vol 13 (17) ◽

pp. 1770-1772 ◽

Cited By ~ 1

Author(s):

Susumu Tajima ◽

Osamu Sekiguchi ◽

Yukiyasu Kowase ◽

Satoshi Nakajima

Keyword(s):

Mass Spectrum ◽

Tandem Mass Spectrum ◽

Tandem Mass ◽

Neutral Species ◽

Simple Method ◽

Metastable Decomposition

Feature Selection for Tandem Mass Spectrum Quality Assessment via Sparse Logistical Regression

2009 3rd International Conference on Bioinformatics and Biomedical Engineering ◽

10.1109/icbbe.2009.5162855 ◽

2009 ◽

Author(s):

Jiarui Ding ◽

Fang-Xiang Wu

Keyword(s):

Feature Selection ◽

Mass Spectrum ◽

Quality Assessment ◽

Tandem Mass Spectrum ◽

Tandem Mass ◽

Selection For ◽

Logistical Regression

Probability-based pattern recognition and statistical framework for randomization: modeling tandem mass spectrum/peptide sequence false match frequencies

Bioinformatics ◽

10.1093/bioinformatics/btm267 ◽

2007 ◽

Vol 23 (17) ◽

pp. 2210-2217 ◽

Cited By ~ 29

Author(s):

Jian Feng ◽

Daniel Q. Naiman ◽

Bret Cooper

Keyword(s):

Pattern Recognition ◽

Mass Spectrum ◽

Tandem Mass Spectrum ◽

Peptide Sequence ◽

Tandem Mass ◽

False Match ◽

Statistical Framework

Memetic algorithm with route decomposing for periodic capacitated arc routing problem

10.26686/wgtn.14344067 ◽

2021 ◽

Author(s):

Yuzhou Zhang ◽

Yi Mei ◽

Ke Tang ◽

Keqin Jiang

Keyword(s):

Road Network ◽

Large Scale ◽

Memetic Algorithm ◽

State Of The Art ◽

Arc Routing ◽

Data Set ◽

Routing Problem ◽

Capacitated Arc Routing Problem ◽

Single Period ◽

Capacitated Arc Routing

In this paper, the Periodic Capacitated Arc Routing Problem (PCARP) is investigated. PCARP is an extension of the well-known CARP from a single period to a multi-period horizon. In PCARP, two objectives are to be minimized: the number of required vehicles (nv) and the total cost (tc). Due to its multi-period nature, given the same graph or road network, PCARP can have a much larger solution space than its single-period CARP counterpart. Furthermore, PCARP contains an additional allocation sub-problem (choosing the days on which to serve the arcs), which is interdependent with the routing sub-problem. Although some attempts have been made to solve PCARP, further investigation is needed to improve performance, especially on large-scale problem instances. It has been shown that optimizing nv and tc separately (hierarchically) is a good way of dealing with the two objectives. In this paper, we further improve this strategy and propose a new Route Decomposition (RD) operator. The RD operator is then integrated into a Memetic Algorithm (MA) framework for PCARP, in which novel crossover and local search operators are designed accordingly. In addition, to improve search efficiency, a hybridized initialization is employed to generate an initial population consisting of both heuristic and random individuals. The MA with RD (MARD) was evaluated and compared with state-of-the-art approaches on two benchmark sets of PCARP instances and a large data set based on a real-world road network. The experimental results suggest that MARD outperforms the compared state-of-the-art algorithms and improves most of the best-known solutions. The advantage of MARD becomes more pronounced as the problem size increases, so MARD is particularly effective in solving large-scale PCARP instances. Moreover, the efficacy of the proposed RD operator in MARD has been empirically verified.
© This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/

Tandem mass spectrum of a growth hormone secretagogue: Amide bond cleavage and resultant gas-phase rearrangement

Journal of the American Society for Mass Spectrometry ◽

10.1016/s1044-0305(02)00350-1 ◽

2002 ◽

Vol 13 (4) ◽

pp. 371-377 ◽

Cited By ~ 13

Author(s):

Xue-Zhi Qin

Keyword(s):

Growth Hormone ◽

Mass Spectrum ◽

Gas Phase ◽

Tandem Mass Spectrum ◽

Bond Cleavage ◽

Amide Bond ◽

Tandem Mass ◽

Growth Hormone Secretagogue ◽

Amide Bond Cleavage

Tandem mass spectrum of a farnesyl transferase inhibitor: gas-phase rearrangements involving imidazole

Journal of Mass Spectrometry ◽

10.1002/jms.192 ◽

2001 ◽

Vol 36 (8) ◽

pp. 911-917 ◽

Cited By ~ 28

Author(s):

Xue-Zhi Qin

Keyword(s):

Mass Spectrum ◽

Gas Phase ◽

Tandem Mass Spectrum ◽

Tandem Mass ◽

Farnesyl Transferase ◽

Farnesyl Transferase Inhibitor

Peptide Retention Time Prediction Yields Improved Tandem Mass Spectrum Identification for Diverse Chromatography Conditions

Lecture Notes in Computer Science - Research in Computational Molecular Biology ◽

10.1007/978-3-540-71681-5_32 ◽

2007 ◽

pp. 459-472 ◽

Cited By ~ 5

Author(s):

Aaron A. Klammer ◽

Xianhua Yi ◽

Michael J. MacCoss ◽

William Stafford Noble

Keyword(s):

Mass Spectrum ◽

Retention Time ◽

Tandem Mass Spectrum ◽

Tandem Mass ◽

Retention Time Prediction ◽

Time Prediction

The Linoleic Acid Content of the Stratum Corneum of Ichthyotic Golden Retriever Dogs Is Reduced as Compared to Healthy Dogs and a Significant Part Is Oxidized in Both Free and Esterified Forms

Metabolites ◽

10.3390/metabo11120803 ◽

2021 ◽

Vol 11 (12) ◽

pp. 803

Author(s):

Iuliana Popa ◽

Audrey Solgadi ◽

Didier Pin ◽

Adrian L. Watson ◽

Marek Haftek ◽

...

Keyword(s):

Mass Spectrometry ◽

Liquid Chromatography ◽

Linoleic Acid ◽

Mass Spectrum ◽

Stratum Corneum ◽

Electrospray Ionization ◽

Tandem Mass Spectrum ◽

Octadecadienoic Acid ◽

Free Form ◽

Tandem Mass

Golden Retrievers may suffer from Pnpl1-related inherited ichthyosis. Our study shows that in the stratum corneum (SC) of ichthyotic dogs, linoleic acid (LA) is also present in the form of 9-keto-octadecadienoic acid (9-KODE) instead of the acylacid form found in normal dogs. The fatty acids purified from SC strips (LA, acylacids) were characterized by liquid chromatography-tandem mass spectrometry (LC-MS) and atmospheric pressure chemical ionization (APCI). Electrospray ionization (ESI) and MS2 (MS/MS, tandem mass spectrum/spectra) and MS3 (MS/MS/MS, tandem mass spectrum/spectra) fragmentation indicated the positions of the double bonds in 9-KODE. We showed that ichthyotic dogs have a threefold lower LA content in the form of acylacids. The MS2 fragmentation of acylacids showed, in some peaks, the presence of an ion at m/z 279 instead of the ion at m/z 293 that is characteristic of LA. The detected variant was identified upon MS3 fragmentation as 9-KODE, and the level of this keto derivative was increased in ichthyotic dogs. We showed by APCI that such keto forms of LA are produced from hydroperoxy-octadecadienoic acids (HpODE) upon dehydration. In conclusion, the free form of 9-KODE was detected in ichthyotic SC at levels up to fivefold those of unaffected dogs, and analyses by HPLC (high-performance liquid chromatography) and ESI-MS (electrospray ionization-mass spectrometry) indicated its production via dehydration of native 9-HpODE.

Property valuation by machine learning for the Norwegian real estate market

10.14293/s2199-1006.1.sor-.pp0tp9i.v1 ◽

2021 ◽

Author(s):

Amandip Sangha

Keyword(s):

Machine Learning ◽

Real Estate ◽

Large Scale ◽

State Of The Art ◽

Regression Tree ◽

Real Estate Market ◽

Research Literature ◽

Data Set ◽

Boosted Regression Tree ◽

Market Data

We train a machine learning model on a large data set to predict property values in the Norwegian real estate market. Our model is a gradient boosted regression tree. The data set is the largest market data set of properties in Norway considered in the research literature. We achieve state-of-the-art accuracy. A large-scale market data set of real estate properties is collected from sales and rental ads on publicly accessible internet sites. The property advertisements show property features and appraisal values made by real estate brokers. We train a gradient boosted regression tree model on selected features of the data set. This is a multivariate regression model built with supervised learning. We use 5-fold cross-validation to assess the accuracy and robustness of the model. Gradient boosted regression tree models are already known to give the best prediction accuracy on real estate price valuations. We achieve state-of-the-art prediction accuracy using a minimal feature set and only publicly and freely available sales advertisement data. The novelty of our work lies in using a minimal feature set, in having the largest data set in the research literature, and in relying only on freely and publicly accessible data that are simple to obtain. This shows that useful estimation models with high accuracy can be built with quite simple resources.
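The combination of gradient boosting and k-fold cross-validation described in this abstract can be illustrated with a deliberately tiny pure-Python sketch. It is not the author's model: it boosts one-split regression stumps under squared loss, and the function names (`fit_stump`, `fit_gbrt`, `predict`, `cross_validate`), the hyperparameters, and the row layout (a list of numeric feature values per property) are all assumptions made for illustration. Practical work would normally rely on an established library rather than code like this.

```python
import random

def fit_stump(X, residuals):
    """Best single-feature threshold split (j, t, left_mean, right_mean) by squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:] if best else None

def fit_gbrt(X, y, n_trees=50, lr=0.1):
    """Gradient boosting for squared loss: each stump is fitted to the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        if stump is None:  # every feature is constant, nothing left to split
            break
        j, t, lm, rm = stump
        stumps.append(stump)
        pred = [p + lr * (lm if row[j] <= t else rm) for p, row in zip(pred, X)]
    return base, lr, stumps

def predict(model, row):
    base, lr, stumps = model
    return base + sum(lr * (lm if row[j] <= t else rm) for j, t, lm, rm in stumps)

def cross_validate(X, y, k=5, seed=0):
    """Return the mean absolute error on each of k held-out folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    maes = []
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        model = fit_gbrt([X[i] for i in train], [y[i] for i in train])
        maes.append(sum(abs(predict(model, X[i]) - y[i]) for i in fold) / len(fold))
    return maes
```

Each fold is held out once while the model is trained on the other four, so the spread of the five errors gives a rough sense of both accuracy and robustness, which is the role cross-validation plays in the abstract.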
