CRISPR-GNL: an improved model for predicting CRISPR activity by machine learning and featurization

Mapping Intimacies ◽

10.1101/605790 ◽

2019 ◽

Author(s):

Jun Wang ◽

Xi Xiang ◽

Lixin Cheng ◽

Xiuqing Zhang ◽

Yonglun Luo

Keyword(s):

Prediction Models ◽

Current Model ◽

Target Prediction ◽

Limiting Factors ◽

Supplementary Information ◽

Bioinformatic Tools ◽

Editing Activity ◽

Supplementary Material ◽

Improved Model ◽

Target Activity

Abstract Motivation: The CRISPR/Cas9 system has been broadly used in genetic engineering. However, the risk of off-target editing and the variability of on-target activity among different targets are two limiting factors. Several bioinformatic tools have been developed for predicting CRISPR on-target activity and off-target effects. However, the general application of the current prediction models is hampered by the great variation among different algorithms. Results: In this study, we thoroughly re-analyzed 13 published datasets with eight regression models. We showed that the current model gave very poor cross-dataset and cross-species prediction performance. To overcome these limitations, we have developed an improved model (a generalization score, GNL) based on normalized gene editing activity from 8,101 gRNAs and 2,488 features using a Bayesian Ridge Regression model. Our results demonstrated that the GNL model is a better general algorithm for CRISPR on-target activity prediction. Availability and implementation: The prediction scorer is available on GitHub (https://github.com/TerminatorJ/GNL_Scorer). Contact: J.W. ([email protected]) or Y.L. ([email protected]) Supplementary information: Supplementary data are available at Bioinformatics online.
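The abstract above mentions featurizing gRNAs and normalizing editing activity before regression, but does not list the features or the normalization used. As a rough illustration only, the sketch below assumes position-wise one-hot encoding plus GC fraction as features and a per-dataset rank scaling as the normalization; the function names `featurize_grna` and `rank_normalize` are hypothetical and are not taken from the GNL_Scorer code.

```python
from typing import List

NUCLEOTIDES = "ACGT"


def featurize_grna(seq: str) -> List[float]:
    """One-hot encode every position, then append the GC fraction.

    Assumed feature set for illustration; the 2,488 features of the
    published model are not described in the abstract.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    features: List[float] = []
    for base in seq:
        if base not in NUCLEOTIDES:
            raise ValueError(f"unexpected base {base!r}")
        # Four indicator values per position, in A, C, G, T order.
        features.extend(1.0 if base == n else 0.0 for n in NUCLEOTIDES)
    features.append(sum(base in "GC" for base in seq) / len(seq))
    return features


def rank_normalize(activities: List[float]) -> List[float]:
    """Scale activities within one dataset to ranks in [0, 1].

    Ties receive distinct ranks in input order; this is a simplification.
    """
    n = len(activities)
    if n < 2:
        return [0.0] * n
    order = sorted(range(n), key=lambda i: activities[i])
    ranks = [0.0] * n
    for rank, idx in enumerate(order):
        ranks[idx] = rank / (n - 1)
    return ranks
```

Feature vectors and normalized activities of this shape could then be passed to any regression model; putting activities from different datasets on a common scale is one plausible way to make cross-dataset training comparable.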

BloodGen3Module: Blood transcriptional module repertoire analysis and visualization using R

Bioinformatics ◽

10.1093/bioinformatics/btab121 ◽

2021 ◽

Author(s):

Darawan Rinchai ◽

Jessica Roelands ◽

Mohammed Toufiq ◽

Wouter Hendrickx ◽

Matthew C Altman ◽

...

Keyword(s):

Transcript Abundance ◽

R Package ◽

Supplementary Information ◽

Illustrative Case ◽

Bioinformatic Tools ◽

Transcriptional Module ◽

Wide Range ◽

Downstream Analysis ◽

Computing Module ◽

Parallel Workflow

Abstract Motivation: We previously described the construction and characterization of generic and reusable blood transcriptional module repertoires. More recently we released a third iteration (“BloodGen3” module repertoire) that comprises 382 functionally annotated gene sets (modules) and encompasses 14,168 transcripts. Custom bioinformatic tools are needed to support downstream analysis, visualization and interpretation relying on such fixed module repertoires. Results: We have developed and describe here an R package, BloodGen3Module. The functions of our package permit group comparison analyses to be performed at the module level and the results to be displayed as annotated fingerprint grid plots. A parallel workflow for computing module repertoire changes for individual samples rather than groups of samples is also available; these results are displayed as fingerprint heatmaps. An illustrative case is used to demonstrate the steps involved in generating blood transcriptome repertoire fingerprints of septic patients. Taken together, this resource could facilitate the analysis and interpretation of changes in blood transcript abundance observed across a wide range of pathological and physiological states. Availability: The BloodGen3Module package and documentation are freely available from GitHub: https://github.com/Drinchai/BloodGen3Module Supplementary information: Supplementary data are available at Bioinformatics online.

BioCommons: a robust java library for RNA structural bioinformatics

Bioinformatics ◽

10.1093/bioinformatics/btab069 ◽

2021 ◽

Author(s):

Tomasz Zok

Keyword(s):

Source Code ◽

Structural Bioinformatics ◽

Supplementary Information ◽

Supplementary Data ◽

Bioinformatic Tools ◽

Data Formats ◽

Central Repository ◽

Diverse Data ◽

2D And 3D ◽

Java Library

Abstract Motivation: Biomolecular structures come in multiple representations and diverse data formats. Their incompatibility with the requirements of data analysis programs significantly hinders the analytics and the creation of new structure-oriented bioinformatic tools. Therefore, the need for robust libraries of data processing functions is still growing. Results: BioCommons is an open-source Java library for structural bioinformatics. It contains many functions working with the 2D and 3D structures of biomolecules, with a particular emphasis on RNA. Availability and implementation: The library is available in Maven Central Repository and its source code is hosted on GitHub: https://github.com/tzok/BioCommons Supplementary information: Supplementary data are available at Bioinformatics online.

Numerical Prediction of Submesoscale Flow in the Nocturnal Stable Boundary Layer over Complex Terrain

Monthly Weather Review ◽

10.1175/mwr-d-11-00061.1 ◽

2012 ◽

Vol 140 (3) ◽

pp. 956-977 ◽

Cited By ~ 24

Author(s):

Nelson L. Seaman ◽

Brian J. Gaudet ◽

David R. Stauffer ◽

Larry Mahrt ◽

Scott J. Richardson ◽

...

Keyword(s):

Prediction Models ◽

Weather Prediction ◽

Weather Research And Forecasting ◽

Stable Boundary Layers ◽

Stable Boundary ◽

Central Pennsylvania ◽

Stable Conditions ◽

Improved Model ◽

Mean Square Errors ◽

Numerical Weather Prediction Models

Abstract Numerical weather prediction models often perform poorly for weakly forced, highly variable winds in nocturnal stable boundary layers (SBLs). When used as input to air-quality and dispersion models, these wind errors can lead to large errors in subsequent plume forecasts. Finer grid resolution and improved model numerics and physics can help reduce these errors. The Advanced Research Weather Research and Forecasting model (ARW-WRF) has higher-order numerics that may improve predictions of finescale winds (scales <~20 km) in nocturnal SBLs. However, better understanding of the physics controlling SBL flow is needed to take optimal advantage of advanced modeling capabilities. To facilitate ARW-WRF evaluations, a small network of instrumented towers was deployed in the ridge-and-valley topography of central Pennsylvania (PA). Time series of local observations and model forecasts on 1.333- and 0.444-km grids were filtered to isolate deterministic lower-frequency wind components. The time-filtered SBL winds have substantially reduced root-mean-square errors and biases, compared to raw data. Subkilometer horizontal and very fine vertical resolutions are found to be important for reducing model speed and direction errors. Nonturbulent fluctuations in unfiltered, very finescale winds, parts of which may be resolvable by ARW-WRF, are shown to generate horizontal meandering in stable weakly forced conditions. These submesoscale motions include gravity waves, primarily horizontal 2D motions, and other complex signatures. Vertical structure and low-level biases of SBL variables are shown to be sensitive to parameter settings defining minimum “background” mixing in very stable conditions in two representative turbulence schemes.
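The abstract above describes time-filtering observed and forecast winds to isolate lower-frequency components and then comparing root-mean-square errors and biases. The filter actually used is not specified there, so the sketch below assumes a simple centered running mean on wind speed; `running_mean`, `rmse` and `bias` are illustrative names, and wind direction (a circular quantity) would need separate handling.

```python
import math
from typing import List, Sequence


def running_mean(series: Sequence[float], window: int) -> List[float]:
    """Centered running mean; the window shrinks near the series ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    out: List[float] = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out


def rmse(forecast: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square error between paired forecast and observed values."""
    if len(forecast) != len(observed) or not forecast:
        raise ValueError("series must be non-empty and of equal length")
    squared = sum((f - o) ** 2 for f, o in zip(forecast, observed))
    return math.sqrt(squared / len(forecast))


def bias(forecast: Sequence[float], observed: Sequence[float]) -> float:
    """Mean error (forecast minus observed)."""
    if len(forecast) != len(observed) or not forecast:
        raise ValueError("series must be non-empty and of equal length")
    return sum(f - o for f, o in zip(forecast, observed)) / len(forecast)
```

Applying `running_mean` to both the forecast and the observed series before calling `rmse` removes high-frequency fluctuations from both, which is one way the filtered errors can come out smaller than the raw ones.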

Bioinformatic-Based Approaches for Disease-Resistance Gene Discovery in Plants

Agronomy ◽

10.3390/agronomy11112259 ◽

2021 ◽

Vol 11 (11) ◽

pp. 2259

Author(s):

Andrea Fernandez-Gutierrez ◽

Juan J. Gutierrez-Gonzalez

Keyword(s):

Disease Resistance ◽

Resistance Genes ◽

Gene Annotation ◽

Pathogen Resistance ◽

Limiting Factors ◽

Disease Resistance Gene ◽

R Genes ◽

Nucleotide Binding Sites ◽

Bioinformatic Tools ◽

Computer Based

Pathogens are among the most limiting factors for crop success and expansion. Thus, finding the underlying genetic cause of pathogen resistance is the main goal for plant geneticists. The activation of a plant’s immune system is mediated by the presence of specific receptors known as disease-resistance genes (R genes). Typical R genes encode functional immune receptors with nucleotide-binding sites (NBS) and leucine-rich repeat (LRR) domains, making the NBS-LRRs the largest family of plant resistance genes. Establishing host resistance is crucial for plant growth and crop yield but also for reducing pesticide use. In this regard, pyramiding R genes is thought to be the most ecologically friendly way to enhance the durability of resistance. To accomplish this, researchers must first identify the related genes, or linked markers, within the genomes. However, the duplicated nature, with the presence of frequent paralogues, and clustered characteristic of NLRs make them difficult to predict with the classic automatic gene annotation pipelines. In the last several years, efforts have been made to develop new methods leading to a proliferation of reports on cloned genes. Herein, we review the bioinformatic tools to assist the discovery of R genes in plants, focusing on well-established pipelines with an important computer-based component.

PathScore: a web tool for identifying altered pathways in cancer data

10.1101/067090 ◽

2016 ◽

Cited By ~ 2

Author(s):

Stephen G. Gaffney ◽

Jeffrey P. Townsend

Keyword(s):

Web Application ◽

Somatic Mutations ◽

Supplementary Information ◽

Web Tool ◽

Cancer Data ◽

Link Type ◽

Novel Approach ◽

Supplementary Material ◽

User Friendly ◽

Pathway Effect

Abstract Summary: PathScore quantifies the level of enrichment of somatic mutations within curated pathways, applying a novel approach that identifies pathways enriched across patients. The application provides several user-friendly, interactive graphic interfaces for data exploration, including tools for comparing pathway effect sizes, significance, gene-set overlap and enrichment differences between projects. Availability and implementation: Web application available at pathscore.publichealth.yale.edu. Site implemented in Python and MySQL, with all major browsers supported. Source code available at github.com/sggaffney/pathscore with a GPLv3 license. Contact: [email protected] Supplementary information: Additional documentation can be found at http://pathscore.publichealth.yale.edu/faq.

Supplementary material to "Limitations of WRF land surface models for simulating land use and land cover change in Sub-Saharan Africa and development of an improved model (CLM-AF v. 1.0)"

10.5194/gmd-2020-193-supplement ◽

2020 ◽

Author(s):

Timothy Glotfelty ◽

Diana Ramírez-Mejía ◽

Jared Bowden ◽

Adrián Ghilardi ◽

J. Jason West

Keyword(s):

Land Use ◽

Land Cover ◽

Land Cover Change ◽

Land Surface ◽

Sub Saharan Africa ◽

Land Surface Models ◽

Surface Models ◽

Supplementary Material ◽

Sub Saharan ◽

Improved Model

Palaeolatitudinal distribution of the Ediacaran macrobiota

Journal of the Geological Society ◽

10.1144/jgs2021-030 ◽

2021 ◽

pp. jgs2021-030

Author(s):

Catherine E. Boddy ◽

Emily G. Mitchell ◽

Andrew Merdith ◽

Alexander G. Liu

Keyword(s):

Taxonomic Composition ◽

Supplementary Information ◽

Cambrian Explosion ◽

Content Type ◽

Link Type ◽

Environmental Perturbations ◽

Significant Difference ◽

Evolutionary Trajectories ◽

Cambrian Radiation ◽

Supplementary Material

Macrofossils of the late Ediacaran Period (c. 579–539 Ma) document diverse, complex multicellular eukaryotes, including early animals, prior to the Cambrian radiation of metazoan phyla. To investigate the relationships between environmental perturbations, biotic responses and early metazoan evolutionary trajectories, it is vital to distinguish between evolutionary and ecological controls on the global distribution of Ediacaran macrofossils. The contributions of temporal, palaeoenvironmental and lithological factors in shaping the observed variations in assemblage taxonomic composition between Ediacaran macrofossil sites are widely discussed, but the role of palaeogeography remains ambiguous. Here we investigate the influence of palaeolatitude on the spatial distribution of Ediacaran macrobiota through the late Ediacaran Period using two leading palaeogeographical reconstructions. We find that overall generic diversity was distributed across all palaeolatitudes. Among specific groups, the distributions of candidate ‘Bilateral’ and Frondomorph taxa exhibit weakly statistically significant and statistically significant differences between low and high palaeolatitudes within our favoured palaeogeographical reconstruction, respectively, whereas Algal, Tubular, Soft-bodied and Biomineralizing taxa show no significant difference. The recognition of statistically significant palaeolatitudinal differences in the distribution of certain morphogroups highlights the importance of considering palaeolatitudinal influences when interrogating trends in Ediacaran taxon distributions. Supplementary material: Supplementary information, data and code are available at https://doi.org/10.6084/m9.figshare.c.5488945 Thematic collection: This article is part of the Advances in the Cambrian Explosion collection available at: https://www.lyellcollection.org/cc/advances-cambrian-explosion

DeepPurpose: a deep learning library for drug–target interaction prediction

Bioinformatics ◽

10.1093/bioinformatics/btaa1005 ◽

2020 ◽

Author(s):

Kexin Huang ◽

Tianfan Fu ◽

Lucas M Glass ◽

Marinka Zitnik ◽

Cao Xiao ◽

...

Keyword(s):

Deep Learning ◽

Drug Target ◽

Prediction Models ◽

State Of The Art ◽

Supplementary Information ◽

Target Interaction ◽

Interaction Prediction ◽

Computer Scientists ◽

Benchmark Datasets ◽

Biomedical Field

Abstract Summary: Accurate prediction of drug–target interactions (DTI) is crucial for drug discovery. Recently, deep learning (DL) models have shown promising performance for DTI prediction. However, these models can be difficult to use for both computer scientists entering the biomedical field and bioinformaticians with limited DL experience. We present DeepPurpose, a comprehensive and easy-to-use DL library for DTI prediction. DeepPurpose supports training of customized DTI prediction models by implementing 15 compound and protein encoders and over 50 neural architectures, along with providing many other useful features. We demonstrate state-of-the-art performance of DeepPurpose on several benchmark datasets. Availability and implementation: https://github.com/kexinhuang12345/DeepPurpose. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

Reduction of Prediction Errors for the Matrix Converter with an Improved Model Predictive Control

Energies ◽

10.3390/en12153029 ◽

2019 ◽

Vol 12 (15) ◽

pp. 3029 ◽

Cited By ~ 4

Author(s):

Shuang Feng ◽

Chaofan Wei ◽

Jiaxing Lei

Keyword(s):

Model Predictive Control ◽

Predictive Control ◽

Prediction Models ◽

Sampling Period ◽

Continuous Model ◽

Matrix Converter ◽

Prediction Errors ◽

Current Harmonics ◽

The Matrix ◽

Improved Model

In this paper, an improved model predictive control (MPC) is proposed for the matrix converter (MC). First, the conventional MPC, which adopts separately discretized prediction models, is discussed. It is shown that the conventional MPC ignores the input–output interaction in every sampling period. Consequently, additional prediction errors arise, resulting in more current harmonics. Second, the principle of the improved MPC is presented. With the interaction considered, the integral state-space equation of the whole MC system is constructed and discretized to obtain the precise model. The eigenvalue analysis shows that the proposed prediction model has the same eigenvalues as the continuous model, and thus describes the MC’s behavior in every sampling period more accurately than the conventional one. Finally, experimental results under various working conditions show that the proposed approach increases the control accuracy and reduces the harmonic distortions, which in turn allows smaller filter components to be used.

Generalized Born radii computation using linear models and neural networks

Bioinformatics ◽

10.1093/bioinformatics/btz818 ◽

2019 ◽

Vol 36 (6) ◽

pp. 1757-1764

Author(s):

Saida Saad Mohamed Mahmoud ◽

Gennaro Esposito ◽

Giuseppe Serra ◽

Federico Fogolari

Keyword(s):

Neural Network ◽

Neural Networks ◽

Linear Model ◽

Correlation Coefficient ◽

Linear Models ◽

Reference Method ◽

Supplementary Information ◽

Model Parameters ◽

Generalized Born ◽

Supplementary Material

Abstract Motivation: Implicit solvent models play an important role in describing the thermodynamics and the dynamics of biomolecular systems. Key to an efficient use of these models is the computation of generalized Born (GB) radii, which is accomplished by algorithms based on the electrostatics of inhomogeneous dielectric media. The speed and accuracy of such computations are still an issue especially for their intensive use in classical molecular dynamics. Here, we propose an alternative approach that encodes the physics of the phenomena and the chemical structure of the molecules in model parameters which are learned from examples. Results: GB radii have been computed using (i) a linear model and (ii) a neural network. The input is the element, the histogram of counts of neighbouring atoms, divided by atom element, within 16 Å. Linear models are ca. 8 times faster than the most widely used reference method and the accuracy is higher with correlation coefficient with the inverse of ‘perfect’ GB radii of 0.94 versus 0.80 of the reference method. Neural networks further improve the accuracy of the predictions with correlation coefficient with ‘perfect’ GB radii of 0.97 and ca. 20% smaller root mean square error. Availability and implementation: We provide a C program implementing the computation using the linear model, including the coefficients appropriate for the set of Bondi radii, as Supplementary Material. We also provide a Python implementation of the neural network model with parameter and example files in the Supplementary Material as well. Supplementary information: Supplementary data are available at Bioinformatics online.
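The input described in the abstract above, a per-element histogram of neighbouring atoms within 16 Å, can be sketched roughly as follows. The 1 Å bin width, the element list, and the weights passed to `linear_prediction` are assumptions made for illustration; the published coefficients are in the paper's Supplementary Material and are not reproduced here.

```python
import math
from typing import Dict, List, Sequence, Tuple

Atom = Tuple[str, float, float, float]  # (element, x, y, z), coordinates in Å


def neighbour_histogram(
    atoms: Sequence[Atom],
    index: int,
    cutoff: float = 16.0,      # cutoff distance quoted in the abstract
    bin_width: float = 1.0,    # assumed bin width, not from the source
    elements: Sequence[str] = ("C", "N", "O", "S", "H"),  # assumed element set
) -> Dict[str, List[int]]:
    """Count neighbours of atoms[index] per element and distance bin."""
    n_bins = math.ceil(cutoff / bin_width)
    hist: Dict[str, List[int]] = {el: [0] * n_bins for el in elements}
    _, x0, y0, z0 = atoms[index]
    for j, (el, x, y, z) in enumerate(atoms):
        if j == index or el not in hist:
            continue
        d = math.dist((x0, y0, z0), (x, y, z))
        if d < cutoff:
            hist[el][min(int(d // bin_width), n_bins - 1)] += 1
    return hist


def linear_prediction(
    hist: Dict[str, List[int]],
    weights: Dict[str, List[float]],
    intercept: float,
) -> float:
    """Intercept plus the weighted sum of all histogram counts."""
    total = intercept
    for el, counts in hist.items():
        total += sum(w * c for w, c in zip(weights[el], counts))
    return total
```

In a model of this kind the weights would be fitted against reference values (the abstract reports correlation with the inverse of the ‘perfect’ GB radii), so the output of `linear_prediction` here is only a placeholder quantity.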
