Prediction of sgRNA Off-Target Activity in CRISPR/Cas9 Gene Editing Using Graph Convolution Network

Entropy ◽  
2021 ◽  
Vol 23 (5) ◽  
pp. 608
Author(s):  
Prasoon Kumar Vinodkumar ◽  
Cagri Ozcinar ◽  
Gholamreza Anbarjafari

CRISPR/Cas9 is a powerful genome-editing technology that has been widely applied in targeted gene repair and gene expression regulation. One of the main challenges for the CRISPR/Cas9 system is unexpected cleavage at unintended sites (off-targets), and predicting these sites is necessary because of their relevance to gene editing research. Very few deep learning models have been developed so far to predict the off-target propensity of single guide RNA (sgRNA) at specific DNA fragments. Those that exist rely on artificial feature extraction operations and machine learning techniques, a convoluted process that is difficult for researchers to understand and implement. In this research work, we introduce a novel graph-based approach to predict the off-target efficacy of sgRNA in the CRISPR/Cas9 system that is easy for researchers to understand and replicate. This is achieved by creating a graph with sequences as nodes and by using a link prediction method to predict the presence of links between sgRNA and off-target-inducing target DNA sequences. Features for the sequences are extracted from within the sequences. We used HEK293 and K562 t datasets in our experiments. The graph convolution network (GCN) predicted the off-target gene knockouts by predicting the links between sgRNA and off-target sequences, with an auROC value of 0.987.
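The graph formulation described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' model: the toy sequences, the base-composition features, and the untrained dot-product scorer are all hypothetical choices. It shows one weight-free neighbourhood-aggregation step of the kind used inside graph convolution layers, followed by scoring a candidate link between two sequence nodes.

```python
import math

BASES = "ACGT"

def base_composition(seq):
    """Feature vector taken from the sequence itself: fraction of A, C, G, T."""
    seq = seq.upper()
    return [seq.count(b) / len(seq) for b in BASES]

def normalized_adjacency(n, edges):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected graph on n nodes."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def propagate(a_hat, features):
    """One aggregation step without learned weights: H = A_hat @ X."""
    n, d = len(features), len(features[0])
    return [[sum(a_hat[i][k] * features[k][c] for k in range(n))
             for c in range(d)] for i in range(n)]

def link_score(h, i, j):
    """Sigmoid of the dot product of two node embeddings, a value in (0, 1)."""
    dot = sum(x * y for x, y in zip(h[i], h[j]))
    return 1.0 / (1.0 + math.exp(-dot))

# Toy sequences (hypothetical, not taken from any dataset).
sequences = [
    "ACGTACGTAC",  # node 0: plays the role of an sgRNA
    "ACGTACGTAT",  # node 1: site already linked to node 0
    "ACGAACGTAC",  # node 2: candidate site
    "TTGGCCAATT",  # node 3: candidate site
]
known_links = [(0, 1)]  # assumed training edges

features = [base_composition(s) for s in sequences]
a_hat = normalized_adjacency(len(sequences), known_links)
embeddings = propagate(a_hat, features)

# Score the unobserved sgRNA-to-site links.
scores = {j: link_score(embeddings, 0, j) for j in (2, 3)}
```

In a trained model the aggregation step would be multiplied by learned weight matrices and the scorer fitted on known positive and negative links; the sketch only fixes the data flow from sequences to nodes to link scores.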

2020 ◽  
Author(s):  
Qiaoyue Liu ◽  
Xiang Cheng ◽  
Gan Liu ◽  
Bohao Li ◽  
Xiuqin Liu

Abstract Background The CRISPR/Cas9 system, as the third-generation genome editing technology, has been widely applied in target gene repair and gene expression regulation. Selecting an appropriate sgRNA can improve the on-target knockout efficacy of the CRISPR/Cas9 system with high sensitivity and specificity. However, when the CRISPR/Cas9 system is operating, unexpected cleavage may occur at some sites, known as off-targets. A number of methods have been developed to predict the off-target propensity of an sgRNA at specific DNA fragments. Most of them use artificial feature extraction operations and machine learning techniques to obtain off-target scores. With the rapid expansion of off-target data and the rapid development of deep learning theory, the existing prediction methods can no longer deliver the prediction accuracy required at the clinical level. Results Here, we propose a prediction method named CnnCrispr to predict the off-target propensity of sgRNA at specific DNA fragments. CnnCrispr automatically learns the sequence features of sgRNA-DNA pairs with the GloVe model and embeds the trained word vector matrix into a deep learning model that includes a biLSTM and a CNN with five hidden layers. We verified performance on the data set provided by DeepCrispr and found that the auROC and auPRC in the "leave-one-sgRNA-out" cross validation could reach 0.957 and 0.429, respectively (the Pearson and Spearman values could reach 0.495 and 0.151, respectively, under the same settings). Conclusion Our results show that CnnCrispr has better classification and regression performance than the existing state-of-the-art models. The code of CnnCrispr can be freely downloaded from https://github.com/LQYoLH/CnnCrispr.
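The "leave-one-sgRNA-out" evaluation mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the CnnCrispr code: the record layout `(sgrna_id, target_site, label)` and the toy records are assumptions. Each fold holds out every pair belonging to one sgRNA, and auROC is computed as the probability that a positive pair outscores a negative one.

```python
from collections import defaultdict

def leave_one_sgrna_out(records):
    """Yield (held_out_sgrna, train, test), holding out all pairs of one sgRNA.

    records: list of (sgrna_id, target_site, label) tuples (assumed layout).
    """
    by_guide = defaultdict(list)
    for rec in records:
        by_guide[rec[0]].append(rec)
    for guide in sorted(by_guide):
        test = by_guide[guide]
        train = [r for g in sorted(by_guide) if g != guide for r in by_guide[g]]
        yield guide, train, test

def auroc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("auROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy records: label 1 marks an observed off-target site.
records = [
    ("sg1", "siteA", 1), ("sg1", "siteB", 0),
    ("sg2", "siteC", 0), ("sg2", "siteD", 1),
    ("sg3", "siteE", 0),
]
folds = list(leave_one_sgrna_out(records))
```

Grouping by sgRNA keeps pairs from the held-out guide out of training, so the score reflects performance on a guide the model has not seen.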


2019 ◽  
Author(s):  
Qiaoyue Liu ◽  
Xiang Cheng ◽  
Gan Liu ◽  
Bohao Li ◽  
Xiuqin Liu

Abstract Background The CRISPR/Cas9 system, as the third-generation genome editing technology, has been widely applied in target gene repair and gene expression regulation. Selecting an appropriate sgRNA can improve the on-target knockout efficacy of the CRISPR/Cas9 system with high sensitivity and specificity. However, when the CRISPR/Cas9 system is operating, unexpected cleavage may occur at some sites, known as off-targets. A number of methods have been developed to predict the off-target propensity of an sgRNA at specific DNA fragments. Most of them use artificial feature extraction operations and machine learning techniques to obtain off-target scores. With the rapid expansion of off-target data and the rapid development of deep learning theory, the existing prediction methods can no longer deliver the prediction accuracy required at the clinical level. Results Here, we propose a prediction method named CnnCrispr to predict the off-target propensity of sgRNA at specific DNA fragments. CnnCrispr automatically learns the sequence features of sgRNA-DNA pairs with the GloVe model and embeds the trained word vector matrix into a deep learning model that includes a biLSTM and a CNN with five hidden layers. We verified performance on the data set provided by DeepCrispr and found that the auROC and auPRC in the "leave-one-sgRNA-out" cross validation could reach 0.957 and 0.429, respectively (the Pearson and Spearman values could reach 0.495 and 0.151, respectively, under the same settings). Conclusion Our results show that CnnCrispr has better classification and regression performance than the existing state-of-the-art models.


2020 ◽  
Author(s):  
Mohammad Alarifi ◽  
Somaieh Goudarzvand ◽  
Abdulrahman Jabour ◽  
Doreen Foy ◽  
Maryam Zolnoori

BACKGROUND The rate of antidepressant prescriptions is increasing globally. A large portion of patients stop taking their medications, which can lead to many side effects, including relapse and anxiety. OBJECTIVE The aim of this study was to develop a drug-continuity prediction model and to identify the factors associated with drug continuity using online patient forums. METHODS We retrieved 982 antidepressant drug reviews from the online patient forum AskaPatient.com. We followed the Analytical Framework Method to extract structured data from unstructured data. Using the structured data, we examined the factors associated with antidepressant discontinuity and developed a predictive model using multiple machine learning techniques. RESULTS We tested multiple machine learning techniques, whose accuracy ranged from 65% to 82%. The Random Forest algorithm performed best, with 82% accuracy, 78% precision, 88.03% recall, and an 84.2% F1-score. The factors most strongly associated with drug discontinuity were withdrawal symptoms, effectiveness-ineffectiveness, perceived-distress-adverse drug reaction, rating, and perceived distress related to withdrawal symptoms. CONCLUSIONS Although the nature of the data available in online forums differs from that of data collected through surveys, we found that online patient forums can be a valuable source of data for drug-continuity prediction and for understanding patients' experiences. The factors identified through our techniques were consistent with the findings of prior studies that used surveys.
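The four metrics reported above are all derived from a binary confusion matrix. The following sketch shows the standard definitions; the counts in the example are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=30, fp=10, fn=20, tn=40)
```

Because F1 ignores true negatives, it can differ noticeably from accuracy when the two classes are unbalanced.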


Genetics ◽  
2021 ◽  
Author(s):  
Marco Lopez-Cruz ◽  
Gustavo de los Campos

Abstract Genomic prediction uses DNA sequences and phenotypes to predict genetic values. In homogeneous populations, theory indicates that the accuracy of genomic prediction increases with sample size. However, differences in allele frequencies and in linkage disequilibrium patterns can lead to heterogeneity in SNP effects. In this context, calibrating genomic predictions using a large, potentially heterogeneous, training data set may not lead to optimal prediction accuracy. Some studies have tried to address this sample size/homogeneity trade-off using training set optimization algorithms; however, this approach assumes that a single training data set is optimal for all individuals in the prediction set. Here, we propose an approach that identifies, for each individual in the prediction set, a subset of the training data (i.e., a set of support points) from which predictions are derived. The methodology that we propose is a Sparse Selection Index (SSI) that integrates Selection Index methodology with sparsity-inducing techniques commonly used for high-dimensional regression. The sparsity of the resulting index is controlled by a regularization parameter (λ); the G-BLUP (the prediction method most commonly used in plant and animal breeding) appears as a special case when λ = 0. In this study, we present the methodology and demonstrate (using two wheat data sets with phenotypes collected in ten different environments) that the SSI can achieve significant gains (between 5% and 10%) in prediction accuracy relative to the G-BLUP.
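The role of λ can be illustrated with a small sketch. It assumes, as a simplification that may differ from the paper's exact objective and scaling, that the index weights for one prediction individual minimize ½ β′Cβ − g′β + λ‖β‖₁, where C is the training relationship matrix plus a ridge term and g holds the relationships between the training individuals and the individual being predicted. The matrices, the ridge value, and the phenotypes below are hypothetical. Coordinate descent with soft-thresholding solves the problem; with λ = 0 it reduces to the unpenalized linear system.

```python
def soft_threshold(x, lam):
    """Shrink x toward zero by lam; values within [-lam, lam] become 0."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def sparse_index_weights(C, g, lam, n_iter=500, tol=1e-12):
    """Minimize 0.5*b'Cb - g'b + lam*|b|_1 by cyclic coordinate descent.

    C must be symmetric positive definite (assumed: G_train + theta * I).
    """
    n = len(g)
    beta = [0.0] * n
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(n):
            # Partial residual for coordinate j, excluding its own term.
            r = g[j] - sum(C[j][k] * beta[k] for k in range(n) if k != j)
            new = soft_threshold(r, lam) / C[j][j]
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    return beta

# Hypothetical inputs: 3 training individuals, ridge term theta = 1.0.
G_train = [[1.0, 0.5, 0.1], [0.5, 1.0, 0.2], [0.1, 0.2, 1.0]]
theta = 1.0
C = [[G_train[i][j] + (theta if i == j else 0.0) for j in range(3)]
     for i in range(3)]
g = [0.6, 0.3, 0.05]    # relationships to the individual being predicted
y = [1.2, -0.4, 0.7]    # training phenotypes (centered, hypothetical)

beta_dense = sparse_index_weights(C, g, lam=0.0)    # unpenalized solution
beta_sparse = sparse_index_weights(C, g, lam=0.1)   # some weights become 0
support = [j for j, b in enumerate(beta_sparse) if b != 0.0]
u_hat = sum(b * yj for b, yj in zip(beta_sparse, y))
```

In this toy case the weakly related third individual receives a zero weight once λ is positive, so the prediction is built from the remaining support points only.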


Plant Methods ◽  
2021 ◽  
Vol 17 (1) ◽  
Author(s):  
Yan Zhang ◽  
Ping Zhou ◽  
Tohir A. Bozorov ◽  
Daoyuan Zhang

Abstract Background Xinjiang wild apple is an important tree of the Tianshan Mountains, and in recent years it has been damaged by many biotic and abiotic stresses and by human activities. New technologies are needed to study its genomic function and to support its molecular improvement. The clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein (Cas) system has been successfully applied to genetic improvement in many crops, but its editing capability varies depending on the combination of synthetic guide RNA (sgRNA) and Cas9 protein expression devices. Results In this study, we used two vector systems with paired sgRNAs targeting MsPDS. As expected, we successfully induced the albino phenotype in calli and buds with both systems. Conclusions We conclude that CRISPR/Cas9 is a powerful system for editing the wild apple genome and that it expands the range of plants available for gene editing.


2021 ◽  
Vol 22 (S3) ◽  
Author(s):  
Junyi Li ◽  
Huinian Li ◽  
Xiao Ye ◽  
Li Zhang ◽  
Qingzhe Xu ◽  
...  

Abstract Background The prediction of long non-coding RNA (lncRNA) has attracted great attention from researchers, as more and more evidence indicates that various complex human diseases are closely related to lncRNAs. In the era of biomedical big data, in addition to the prediction of lncRNAs by biological experimental methods, many computational methods based on machine learning have been proposed to make better use of the sequence resources of lncRNAs. Results We developed an lncRNA prediction method by integrating information-entropy-based features and machine learning algorithms. We calculate the generalized topological entropy and generate 6 novel features for lncRNA sequences. Using these 6 features and other features such as the open reading frame, we apply support vector machine, XGBoost, and random forest algorithms to distinguish human lncRNAs. We compare our method with one that uses more k-mer features, and the results show that our method achieves a higher area under the curve, up to 99.7905%. Conclusions We have developed an accurate and efficient method that uses novel information entropy features to analyze and classify lncRNAs. Our method can also be extended to research on other functional elements in DNA sequences.
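As a rough illustration of an entropy-based sequence feature, the sketch below computes the Shannon entropy of the k-mer frequency distribution of a sequence. This is a simpler stand-in chosen for clarity, not the generalized topological entropy used in the paper, and the function names and the choice of k values are assumptions.

```python
import math
from collections import Counter

def kmer_entropy(seq, k):
    """Shannon entropy (in bits) of the k-mer frequency distribution of seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        raise ValueError("sequence is shorter than k")
    total = len(kmers)
    counts = Counter(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def entropy_features(seq, ks=(1, 2, 3)):
    """One entropy value per k; such a vector could feed a classifier."""
    seq = seq.upper()
    return [kmer_entropy(seq, k) for k in ks]

# Toy sequences (hypothetical).
low_complexity = entropy_features("AAAAAAAAAAAA")
mixed = entropy_features("ACGTTGCAAGCT")
```

A homopolymer yields zero entropy at every k, while a sequence using all four bases evenly reaches 2 bits at k = 1, the maximum for a four-letter alphabet.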


Gene Therapy ◽  
2021 ◽  
Author(s):  
Jonathan O’Keeffe Ahern ◽  
Irene Lara-Sáez ◽  
Dezhong Zhou ◽  
Rodolfo Murillas ◽  
Jose Bonafont ◽  
...  

Abstract Recent advances in molecular biology have led to the CRISPR revolution, but the lack of an efficient and safe delivery system into cells and tissues continues to hinder clinical translation of CRISPR approaches. Polymeric vectors offer an attractive alternative to viruses as delivery vectors because of their large packaging capacity and safety profile. In this paper, we demonstrate the potential use of a highly branched poly(β-amino ester) polymer, HPAE-EB, to enable genomic editing via CRISPR–Cas9-targeted genomic excision of exon 80 in the COL7A1 gene, through a dual-guide RNA sequence system. The biophysical properties of HPAE-EB were screened in a human embryonic kidney 293 cell line (HEK293) to elucidate optimal conditions for efficient and cytocompatible delivery of a DNA construct encoding Cas9 along with two RNA guides, obtaining 15–20% target genomic excision. When translated to human recessive dystrophic epidermolysis bullosa (RDEB) keratinocytes, transfection efficiency and targeted genomic excision dropped. However, upon delivery of CRISPR–Cas9 as a ribonucleoprotein complex, targeted genomic deletion of exon 80 increased to over 40%. Our study provides a renewed perspective for the further development of polymer delivery systems for application in the gene editing field in general, and specifically for the treatment of RDEB.

