Improving prediction of compound function from chemical structure using chemical-genetic networks

2017 ◽  
Author(s):  
Hamid Safizadeh ◽  
Scott W. Simpkins ◽  
Justin Nelson ◽  
Chad L. Myers

ABSTRACT The drug discovery process can be significantly improved through understanding how the structure of chemical compounds relates to their function. A common paradigm for filtering and prioritizing compounds is ligand-based virtual screening, in which large libraries of compounds are queried for high structural similarity to a target molecule, on the assumption that structural similarity predicts similar biological activity. Although the chemical informatics community has already proposed a wide range of structure descriptors and similarity coefficients, a major challenge has been the lack of systematic and unbiased benchmarks for biological activity that cover a broad enough range of targets to definitively assess the performance of the alternative approaches.
We leveraged a large set of chemical-genetic interaction data from the yeast Saccharomyces cerevisiae that our labs have recently generated, covering more than 13,000 compounds from the RIKEN NPDepo and several NCI, NIH, and GlaxoSmithKline (GSK) compound collections. Supporting the idea that chemical-genetic interaction data provide an unbiased proxy for biological function, we found that many commonly used structural similarity measures were able to predict which compounds exhibited similar chemical-genetic interaction profiles, although these measures differed significantly in performance. Using the chemical-genetic interaction profiles as the basis for our evaluation, we systematically benchmarked 10 different structure descriptors, each combined with 12 different similarity coefficients. We found that the All-Shortest Path (ASP) structure descriptor paired with the Braun-Blanquet similarity coefficient provided superior performance that was robust across several different compound collections.
We further describe a machine learning approach that improves the ability of the ASP metric to capture biological activity. We used the ASP fingerprints as input for several supervised machine learning models and the chemical-genetic interaction profiles as the standard for learning. We found that the predictive power of the ASP fingerprints (as well as several other descriptors) could be substantially improved by using support vector machines. For example, on held-out data, we measured a 5-fold improvement in the recall of biologically similar compounds at a precision of 50% based upon the ASP fingerprints. Our results suggest that using high-dimensional chemical-genetic data as a basis for refining chemical structure descriptors can be a powerful approach to improving prediction of biological function from structure.
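The kind of evaluation the abstract describes can be illustrated with a minimal sketch. It is not the authors' pipeline: the fingerprints below are toy sets of on-bit indices rather than real ASP fingerprints, the compound names and the "gold standard" of functionally similar pairs are hypothetical, and `recall_at_precision` is an illustrative helper name. The Braun-Blanquet and Tanimoto formulas are the standard set-based definitions.

```python
from itertools import combinations


def braun_blanquet(a: set, b: set) -> float:
    """Braun-Blanquet coefficient: |A & B| / max(|A|, |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / max(len(a), len(b))


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) coefficient: |A & B| / |A | B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def recall_at_precision(scores, labels, min_precision=0.5):
    """Best recall over all rank cutoffs whose precision is >= min_precision.

    Pairs are ranked by descending score; ties keep their input order.
    """
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0
    best, true_pos = 0.0, 0
    for k, (_, is_pos) in enumerate(ranked, start=1):
        true_pos += is_pos
        if true_pos / k >= min_precision:
            best = max(best, true_pos / total_pos)
    return best


# Toy fingerprints (hypothetical): each set holds the indices of the "on" bits.
fingerprints = {
    "cmpd_A": {1, 2, 3, 4},
    "cmpd_B": {1, 2, 3, 5},
    "cmpd_C": {7, 8, 9},
    "cmpd_D": {1, 7, 8, 9},
}

# Hypothetical gold standard: pairs assumed to have similar
# chemical-genetic interaction profiles.
similar_pairs = {("cmpd_A", "cmpd_B"), ("cmpd_C", "cmpd_D")}

# Score every compound pair by structural similarity, then ask how many of
# the functionally similar pairs are recovered at >= 50% precision.
pairs = list(combinations(sorted(fingerprints), 2))
scores = [braun_blanquet(fingerprints[x], fingerprints[y]) for x, y in pairs]
labels = [int(pair in similar_pairs) for pair in pairs]
recall = recall_at_precision(scores, labels, min_precision=0.5)

print(f"Braun-Blanquet(A, B) = {scores[0]:.2f}")
print(f"recall at 50% precision = {recall:.2f}")
```

Swapping `braun_blanquet` for `tanimoto` in the scoring line changes only the coefficient, which is the sort of descriptor-by-coefficient comparison the benchmark performs at scale.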

2017 ◽  
Vol 34 (7) ◽  
pp. 1251-1252 ◽  
Author(s):  
Justin Nelson ◽  
Scott W Simpkins ◽  
Hamid Safizadeh ◽  
Sheena C Li ◽  
Jeff S Piotrowski ◽  
...  

2003 ◽  
Vol 22 (1) ◽  
pp. 62-69 ◽  
Author(s):  
Ainslie B Parsons ◽  
Renée L Brost ◽  
Huiming Ding ◽  
Zhijian Li ◽  
Chaoying Zhang ◽  
...  

2009 ◽  
Vol 5 (4) ◽  
pp. e1000347 ◽  
Author(s):  
Gregory W. Carter ◽  
David J. Galas ◽  
Timothy Galitski

2010 ◽  
Vol 6 (1) ◽  
pp. 379 ◽  
Author(s):  
Alexis Battle ◽  
Martin C Jonikas ◽  
Peter Walter ◽  
Jonathan S Weissman ◽  
Daphne Koller

2017 ◽  
Author(s):  
Raamesh Deshpande ◽  
Justin Nelson ◽  
Scott W. Simpkins ◽  
Michael Costanzo ◽  
Jeff S. Piotrowski ◽  
...  

Large-scale genetic interaction screening is a powerful approach for unbiased characterization of gene function and understanding systems-level cellular organization. While genome-wide screens are desirable because they provide the most comprehensive interaction profiles, they are resource- and time-intensive and sometimes infeasible, depending on the species and experimental platform. For these scenarios, methods that screen more efficiently while still extracting the maximal amount of information from the resulting profiles are of interest.
To address this problem, we developed an optimal algorithm, called COMPRESS-GI, which selects a small but informative set of genes that captures most of the functional information contained within genome-wide genetic interaction profiles. The utility of this algorithm is demonstrated by applying it to define a diagnostic mutant set for large-scale chemical-genetic screens, where more than 13,000 compound screens were achieved through the increased throughput enabled by the approach. COMPRESS-GI can be broadly applied for directing genetic interaction screens in other contexts, including in species with little or no prior genetic-interaction data.


Author(s):  
Hamid Safizadeh ◽  
Scott W. Simpkins ◽  
Justin Nelson ◽  
Sheena C. Li ◽  
Jeff S. Piotrowski ◽  
...  
