CRISPR-GNL: an improved model for predicting CRISPR activity by machine learning and featurization
ABSTRACT

Motivation: The CRISPR/Cas9 system has been broadly used in genetic engineering. However, the risk of off-target effects and the variability of on-target activity among different targets are two limiting factors. Several bioinformatic tools have been developed to predict CRISPR on-target activity and off-target effects. However, the general application of current prediction models is hampered by the great variation among different algorithms.

Results: In this study, we thoroughly re-analyzed 13 published datasets with eight regression models. We show that current models give very low cross-dataset and cross-species prediction performance. To overcome these limitations, we developed an improved model (a generalization score, GNL) based on normalized gene editing activity from 8,101 gRNAs and 2,488 features, using a Bayesian Ridge Regression model. Our results demonstrate that the GNL model is a better general algorithm for CRISPR on-target activity prediction.

Availability and implementation: The prediction scorer is available on GitHub (https://github.com/TerminatorJ/GNL_Scorer).

Contact: J.W. ([email protected]) or Y.L. ([email protected])

Supplementary Information: Supplementary data are available at Bioinformatics online.
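
To make the featurization and activity normalization mentioned in the abstract concrete, the following is a minimal sketch and not the GNL implementation. The function names `one_hot_features` and `min_max_normalize` are hypothetical. The feature set (position-specific one-hot nucleotides plus GC fraction) and the min-max scaling are assumptions chosen for illustration; the abstract does not specify the 2,488 features or the normalization actually used.

```python
from typing import List

NUCLEOTIDES = "ACGT"


def one_hot_features(grna: str) -> List[float]:
    """Encode a gRNA as position-specific one-hot values plus GC fraction.

    Illustrative features only (an assumption); the real GNL feature set
    is not described in the abstract.
    """
    grna = grna.upper()
    if not grna or any(base not in NUCLEOTIDES for base in grna):
        raise ValueError("gRNA must be a non-empty string over A, C, G, T")
    features: List[float] = []
    for base in grna:
        # Four indicator values per position, in the order A, C, G, T.
        features.extend(1.0 if base == n else 0.0 for n in NUCLEOTIDES)
    # Global sequence feature: fraction of G or C bases.
    features.append(sum(base in "GC" for base in grna) / len(grna))
    return features


def min_max_normalize(activities: List[float]) -> List[float]:
    """Rescale the activities of one dataset to the range [0, 1].

    Min-max scaling is an assumed stand-in for the paper's normalization.
    """
    lo, hi = min(activities), max(activities)
    if hi == lo:
        # A constant dataset carries no ranking information.
        return [0.5] * len(activities)
    return [(a - lo) / (hi - lo) for a in activities]
```

Normalizing each dataset separately before pooling puts activities measured with different assays on a comparable scale. The resulting feature vectors and normalized activities could then be passed to a Bayesian ridge regressor, for example scikit-learn's `BayesianRidge`.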