Web-based design and analysis tools for CRISPR base editing

2018 ◽  
Author(s):  
Gue-Ho Hwang ◽  
Jeongbin Park ◽  
Kayeong Lim ◽  
Sunghyun Kim ◽  
Jihyeon Yu ◽  
...  

Abstract Background As a result of its simplicity and high efficiency, the CRISPR-Cas system has been widely used as a genome editing tool. Recently, CRISPR base editors, which consist of deactivated Cas9 (dCas9) or Cas9 nickase (nCas9) linked with a cytidine or an adenine deaminase, have been developed. Base editing tools will be very useful for gene correction because they can produce highly specific DNA substitutions without the introduction of any donor DNA, but dedicated web-based tools to facilitate their use have not yet been developed. Results We present two web tools for base editors, named BE-Designer and BE-Analyzer. BE-Designer provides all possible base editor target sequences in a given input DNA sequence with useful information including potential off-target sites. BE-Analyzer, a tool for assessing base editing outcomes from next generation sequencing (NGS) data, provides information about mutations in a table and interactive graphs. Furthermore, because the tool runs client-side, large amounts of targeted deep sequencing data (>100 MB) do not need to be uploaded to a server, substantially reducing running time and increasing data security. BE-Designer and BE-Analyzer can be freely accessed at http://www.rgenome.net/bedesigner/ and http://www.rgenome.net/be-analyzer/, respectively. Conclusion We have developed two useful web tools to design target sequences (BE-Designer) and to analyze NGS data from experimental results (BE-Analyzer) for CRISPR base editors.
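As a rough illustration of what "providing all possible base editor target sequences in a given input DNA sequence" can involve, the sketch below scans a sequence for candidate sites. It is not BE-Designer's implementation. The 20-nt protospacer, the SpCas9-style NGG PAM, the editing window at protospacer positions 4 to 8, and the function name `find_be_targets` are all assumptions made for this example. Only the forward strand is scanned, and no off-target search is attempted.

```python
import re


def find_be_targets(seq, window=(4, 8), target_base="C"):
    """List candidate base-editing targets on the forward strand.

    Assumptions (not taken from the abstract): a 20-nt protospacer
    followed by an NGG PAM, and an editing window given as 1-based
    protospacer positions counted from the PAM-distal end.
    """
    seq = seq.upper()
    lo, hi = window
    hits = []
    # A lookahead is used so that overlapping candidates are all reported.
    for m in re.finditer(r"(?=([ACGT]{20})([ACGT]GG))", seq):
        protospacer, pam = m.group(1), m.group(2)
        # Keep the site only if a target base lies inside the window.
        editable = [i for i in range(lo, hi + 1)
                    if protospacer[i - 1] == target_base]
        if editable:
            hits.append({
                "start": m.start(),          # 0-based index in seq
                "protospacer": protospacer,
                "pam": pam,
                "editable_positions": editable,
            })
    return hits


if __name__ == "__main__":
    demo = "AAACCAAAAAAAAAAAAAAA" + "TGG"
    for hit in find_be_targets(demo):
        print(hit)
```

For an adenine base editor, the same sketch can be called with `target_base="A"`. A real design tool would also scan the reverse complement and use the window appropriate to the specific editor.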

2019 ◽  
Author(s):  
Ayman Yousif ◽  
Nizar Drou ◽  
Jillian Rowe ◽  
Mohammed Khalfan ◽  
Kristin C Gunsalus

Abstract Background As high-throughput sequencing applications continue to evolve, the rapid growth in quantity and variety of sequence-based data calls for the development of new software libraries and tools for data analysis and visualization. Often, effective use of these tools requires computational skills beyond those of many researchers. To ease this computational barrier, we have created a dynamic web-based platform, NASQAR (Nucleic Acid SeQuence Analysis Resource). Results NASQAR offers a collection of custom and publicly available open-source web applications that make extensive use of a variety of R packages to provide interactive data analysis and visualization. The platform is publicly accessible at http://nasqar.abudhabi.nyu.edu/. Open-source code is on GitHub at https://github.com/nasqar/NASQAR, and the system is also available as a Docker image at https://hub.docker.com/r/aymanm/nasqarall. NASQAR is a collaboration between the core bioinformatics teams of the NYU Abu Dhabi and NYU New York Centers for Genomics and Systems Biology. Conclusions NASQAR empowers non-programming experts with a versatile and intuitive toolbox to easily and efficiently explore, analyze, and visualize their Transcriptomics data interactively. Popular tools for a variety of applications are currently available, including Transcriptome Data Preprocessing, RNA-seq Analysis (including Single-cell RNA-seq), Metagenomics, and Gene Enrichment.


2017 ◽  
Author(s):  
Julian Garneau ◽  
Florence Depardieu ◽  
Louis-Charles Fortier ◽  
David Bikard ◽  
Marc Monot

ABSTRACT Bacteriophages are the most abundant viruses on earth and display an impressive genetic as well as morphologic diversity. Among those, the most common order of phages is the Caudovirales, whose viral particles package linear double-stranded DNA (dsDNA). In this study we investigated how the information gathered by high-throughput sequencing technologies can be used to determine the DNA termini and packaging mechanisms of dsDNA phages. The wet-lab procedures traditionally used for this purpose rely on the identification and cloning of restriction fragments, which can be delicate and cumbersome. Here, we developed a theoretical and statistical framework to analyze DNA termini and phage packaging mechanisms using next-generation sequencing data. Our methods, implemented in the PhageTerm software, work with sequencing reads in fastq format and the corresponding assembled phage genome. PhageTerm was validated on a set of phages with well-established packaging mechanisms representative of the termini diversity: 5’cos (lambda), 3’cos (HK97), pac (P1), headful without a pac site (T4), DTR (T7) and host fragment (Mu). In addition, we determined the termini of 9 Clostridium difficile phages and 6 phages whose sequences were retrieved from the sequence read archive (SRA). A direct graphical interface is available as a Galaxy wrapper version at https://galaxy.pasteur.fr and a standalone version is accessible at https://sourceforge.net/projects/phageterm/.


2016 ◽  
Author(s):  
Renato Augusto Corrêa dos Santos ◽  
Gustavo Henrique Goldman ◽  
Diego Mauricio Riaño-Pachón

Abstract Summary ploidyNGS is a model-free, open source tool to visualize and explore ploidy levels in a newly sequenced genome, exploiting short read data. We tested ploidyNGS using both simulated and real NGS data of the model yeast Saccharomyces cerevisiae. ploidyNGS allows the identification of the ploidy level of a newly sequenced genome in a visual way. Availability and implementation ploidyNGS is available under the GNU General Public License (GPL) at https://github.com/diriano/ploidyNGS. ploidyNGS is implemented in Python and [email protected]


2021 ◽  
Author(s):  
Ling Li ◽  
Mingming Niu ◽  
Alyssa Erickson ◽  
Jie Luo ◽  
Kincaid Rowbotham ◽  
...  

Abstract Integration of genomics and proteomics (proteogenomics) offers unprecedented promise for in-depth understanding of human diseases. However, sample mix-up is a pervasive, recurring problem, due to complex sample processing in proteogenomics. Here we present a pipeline for Sample Matching in Proteogenomics (SMAP) for verifying sample identity to ensure data integrity. SMAP infers sample-dependent protein-coding variants from quantitative mass spectrometry (MS), and aligns the MS-based proteomic samples with genomic samples by two discriminant scores. Theoretical analysis with simulation data indicates that SMAP is capable of uniquely matching proteomic and genomic samples when ≥20% of the genotypes of individual samples are available. When SMAP was applied to a large-scale proteomics dataset from 288 biological samples generated by the PsychENCODE BrainGVEX project, we identified and corrected 18.8% (54/288) mismatched samples. The correction was further confirmed by ribosome profiling and assay for transposase-accessible chromatin sequencing data from the same set of samples. Thus, our results demonstrate that SMAP is an effective tool for sample verification in a large-scale MS-based proteogenomics study. The source code, manual, and sample data of SMAP are publicly available at https://github.com/UND-Wanglab/SMAP, and a web-based SMAP can be accessed at https://smap.shinyapps.io/smap/.


2017 ◽  
Vol 1 (6) ◽  
pp. 381-386 ◽  
Author(s):  
Alistair Ward ◽  
Mary A. Karren ◽  
Tonya Di Sera ◽  
Chase Miller ◽  
Matt Velinder ◽  
...  

Introduction Computational analysis of genome or exome sequences may improve inherited disease diagnosis, but is costly and time-consuming. Methods We describe the use of iobio, a web-based tool suite for intuitive, real-time genome diagnostic analyses. Results We used iobio to identify the disease-causing variant in a patient with early infantile epileptic encephalopathy with prior nondiagnostic genetic testing. Conclusions Iobio tools can be used by clinicians to rapidly identify disease-causing variants from genomic patient sequencing data.


2020 ◽  
Author(s):  
José Afonso Guerra-Assunção ◽  
Lucia Conde ◽  
Ismail Moghul ◽  
Amy P. Webster ◽  
Simone Ecker ◽  
...  

Abstract In recent years, there has been a significant increase in whole genome sequencing data of individual genomes produced by research projects as well as direct-to-consumer service providers. While many of these sources provide their users with an interpretation of the data, there is a lack of free, open tools for generating reports exploring the data in an easy-to-understand manner. GenomeChronicler was developed as part of the Personal Genome Project UK (PGP-UK) to address this need. PGP-UK provides genomic, transcriptomic, epigenomic and self-reported phenotypic data under an open-access model with full ethical approval. As a result, the reports generated by GenomeChronicler are intended for research purposes only and include information relating to potentially beneficial and potentially harmful variants, but without clinical curation. GenomeChronicler can be used with data from whole genome or whole exome sequencing, producing a genome report containing information on variant statistics, ancestry and known associated phenotypic traits. Example reports are available from the PGP-UK data page (personalgenomes.org.uk/data). The objective of this method is to leverage existing resources to find known phenotypes associated with the genotypes detected in each sample. The provided trait data is based primarily upon information available in SNPedia, but also collates data from ClinVar, GETevidence and gnomAD to provide additional details on potential health implications, presence of genotype in other PGP participants and population frequency of each genotype. The analysis can be run in a self-contained environment without requiring internet access, making it a good choice for cases where privacy is essential or desired: any third party project can embed GenomeChronicler within their off-line safe-haven environments.
GenomeChronicler can be run for one sample at a time, or in parallel making use of the Nextflow workflow manager. The source code is available from GitHub (https://github.com/PGP-UK/GenomeChronicler), container recipes are available for Docker and Singularity, as well as a pre-built container from SingularityHub (https://singularity-hub.org/collections/3664) enabling easy deployment in a variety of settings. Users without access to computational resources to run GenomeChronicler can access the software from the Lifebit CloudOS platform (https://lifebit.ai/cloudos) enabling the production of reports and variant calls from raw sequencing data in a scalable fashion.


2020 ◽  
Vol 9 ◽  
Author(s):  
Cia-Hin Lau ◽  
Chung Tin ◽  
Yousin Suh

The last few years have seen tremendous advances in CRISPR-mediated genome editing. Great efforts have been made to improve the efficiency, specificity, editing window, and targeting scope of CRISPR/Cas9-mediated transgene knock-in and gene correction. In this article, we comprehensively review recent progress in CRISPR-based strategies for targeted transgene knock-in and gene correction in both homology-dependent and homology-independent approaches. We cover homology-directed repair (HDR), synthesis-dependent strand annealing (SDSA), microhomology-mediated end joining (MMEJ), and homology-mediated end joining (HMEJ) pathways for a homology-dependent strategy and alternative DNA repair pathways such as non-homologous end joining (NHEJ), base excision repair (BER), and mismatch repair (MMR) for a homology-independent strategy. We also discuss base editing and prime editing that enable direct conversion of nucleotides in genomic DNA without damaging the DNA or requiring donor DNA. Notably, we illustrate the key mechanisms and design principles for each strategy, providing design guidelines for multiplex, flexible, scarless gene insertion and replacement at high efficiency and specificity. In addition, we highlight next-generation base editors that provide higher editing efficiency, fewer undesired by-products, and broader targeting scope.


Forests ◽  
2021 ◽  
Vol 12 (6) ◽  
pp. 804
Author(s):  
Changhong Li ◽  
Yongqi Zheng ◽  
Yu Liu ◽  
Furong Lin ◽  
Ping Huang

Dalbergia hupeana Hance (D. hupeana) is a precious hardwood tree of the genus Dalbergia. It is one of the few species widely distributed within subtropical areas and is important for timber production and forest restoration. At present, there is little published genetic information on D. hupeana. Therefore, we performed a genome survey using next generation sequencing (NGS), developed a set of novel genomic SSR (gSSR) markers from the assembled data, and assessed the transferability of these markers to other Dalbergia species in Asia. The results of the genome survey show that the D. hupeana genome is about 664 Mb in size and highly heterozygous. The assembly of sequencing data produced 2,431,997 contigs, and the initial assembly of the NGS data alone resulted in contig N50 of 393 kb with a total of 720 Mb. A total of 127,742 perfect SSR markers were found in the assembled contigs. A total of 37 highly polymorphic and easily genotyped gSSR markers were developed in D. hupeana, and the majority of gSSR markers could be successfully transferred to nine other Dalbergia species in Asia. The transferability rate of gSSR markers was highest in D. balansae, which is more closely related to D. hupeana. Seven gSSR markers could be amplified in all tested species. In addition, a preliminary assessment of the genetic diversity of three tree species in the Dalbergia genus suggested a high level of genetic diversity within populations distributed in the subtropical area in China. However, the determination of the global status of their genetic variation still requires further and more comprehensive assessment. Our findings will enable further studies on the genetic diversity, phylogenetics, germplasm characterization, and taxonomy of various Dalbergia species.
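The contig N50 quoted above is a standard assembly statistic: the length of the contig at which, going from longest to shortest, the running total first reaches at least half of the total assembly length. A minimal sketch of that definition follows. The function name `n50` and the toy contig lengths are assumptions for illustration and are not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the
    length of the contig at which the cumulative length first
    reaches at least half of the total assembly length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the midpoint.
        if 2 * running >= total:
            return length
    raise ValueError("no contigs given")


if __name__ == "__main__":
    toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical lengths
    print(n50(toy_contigs))
```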


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Martin Lyčka ◽  
Vratislav Peska ◽  
Martin Demko ◽  
Ioannis Spyroglou ◽  
Agata Kilar ◽  
...  

Abstract Background Telomeres, nucleoprotein structures comprising short tandem repeats and delimiting the ends of linear eukaryotic chromosomes, play an important role in the maintenance of genome stability. Therefore, the determination of the length of telomeres is of high importance for many studies. In recent years, new methods for the analysis of the length of telomeres have been developed, including those based on PCR or analysis of NGS data. Despite that, the terminal restriction fragment (TRF) method remains the gold standard to this day. However, this method lacks a universally accepted and precise tool capable of analysing and statistically evaluating TRF results. Results To standardize the processing of TRF results, we have developed WALTER, an online toolset allowing rapid, reproducible, and user-friendly analysis including statistical evaluation of the data. Given its web-based nature, it provides an easily accessible way to analyse TRF data without any need to install additional software. Conclusions WALTER represents a major upgrade from currently available tools for the image processing of TRF scans. This toolset enables a rapid, highly reproducible, and user-friendly evaluation of almost any TRF scan including in-house statistical evaluation of the data. The WALTER platform, together with a user manual describing the evaluation of TRF scans in detail and presenting tips and troubleshooting, as well as test data to demo the software, is available at https://www.ceitec.eu/chromatin-molecular-complexes-jiri-fajkus/rg51/tab?tabId=125#WALTER and the source code at https://github.com/mlyc93/WALTER.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Alvaro Ras-Carmona ◽  
Marta Gomez-Perosanz ◽  
Pedro A. Reche

Abstract Motivation In eukaryotes, proteins targeted for secretion contain a signal peptide, which allows them to proceed through the conventional ER/Golgi-dependent pathway. However, a substantial number of proteins lacking a signal peptide can be secreted through unconventional routes, including that mediated by exosomes. Currently, no method is available to predict protein secretion via exosomes. Results Here, we first assembled a dataset including the sequences of 2992 proteins secreted by exosomes and 2961 proteins that are not secreted by exosomes. Subsequently, we trained different random forest models on feature vectors derived from the sequences in this dataset. In tenfold cross-validation, the best model was trained on dipeptide composition, reaching an accuracy of 69.88% ± 2.08 and an area under the curve (AUC) of 0.76 ± 0.03. In an independent dataset, this model reached an accuracy of 75.73% and an AUC of 0.840. Based on these results, we developed ExoPred, a web-based tool that uses random forests to predict protein secretion by exosomes. Conclusion ExoPred is available for free public use at http://imath.med.ucm.es/exopred/. Datasets are available at http://imath.med.ucm.es/exopred/datasets/.
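The dipeptide composition feature mentioned above can be sketched as follows. This uses the common definition (the frequency of each of the 400 ordered amino-acid pairs among the overlapping pairs of a sequence) and is not ExoPred's own code. The function name `dipeptide_composition`, the alphabetical ordering of the features, and the choice to skip pairs containing non-standard residues are assumptions for this example.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# 400 ordered pairs, in a fixed order so every vector is comparable.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-element list of dipeptide frequencies.

    Overlapping pairs are counted along the sequence. Pairs that
    contain a non-standard residue (e.g. X) are skipped, which is
    an assumption of this sketch.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Too short, or no valid pairs: return an all-zero vector.
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


if __name__ == "__main__":
    vec = dipeptide_composition("ACAC")
    print(len(vec), vec[DIPEPTIDES.index("AC")])
```

Vectors of this kind, one per protein, could then be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`. Which library the authors used is not stated in the abstract.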

