Web-based design and analysis tools for CRISPR base editing

Mapping Intimacies ◽

10.1101/373944 ◽

2018 ◽

Cited By ~ 2

Author(s):

Gue-Ho Hwang ◽

Jeongbin Park ◽

Kayeong Lim ◽

Sunghyun Kim ◽

Jihyeon Yu ◽

...

Keyword(s):

High Efficiency ◽

Target Sequence ◽

Sequencing Data ◽

Gene Correction ◽

Web Based ◽

Base Editing ◽

Link Type ◽

Web Tools ◽

A Genome ◽

Ngs Data

Abstract. Background: As a result of its simplicity and high efficiency, the CRISPR-Cas system has been widely used as a genome editing tool. Recently, CRISPR base editors, which consist of deactivated Cas9 (dCas9) or Cas9 nickase (nCas9) linked with a cytidine or a guanine deaminase, have been developed. Base editing tools will be very useful for gene correction because they can produce highly specific DNA substitutions without the introduction of any donor DNA, but dedicated web-based tools to facilitate the use of such tools have not yet been developed. Results: We present two web tools for base editors, named BE-Designer and BE-Analyzer. BE-Designer provides all possible base editor target sequences in a given input DNA sequence with useful information including potential off-target sites. BE-Analyzer, a tool for assessing base editing outcomes from next generation sequencing (NGS) data, provides information about mutations in a table and interactive graphs. Furthermore, because the tool runs client-side, large amounts of targeted deep sequencing data (>100 MB) do not need to be uploaded to a server, substantially reducing running time and increasing data security. BE-Designer and BE-Analyzer can be freely accessed at http://www.rgenome.net/bedesigner/ and http://www.rgenome.net/be-analyzer/, respectively. Conclusion: We develop two useful web tools to design target sequences (BE-Designer) and to analyze NGS data from experimental results (BE-Analyzer) for CRISPR base editors.
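To make the idea of "all possible base editor target sequences in a given input DNA sequence" concrete, here is a minimal Python sketch. It is not BE-Designer's implementation. It assumes an SpCas9-style NGG PAM, a 20-nt protospacer, a cytosine target, and an editing window at protospacer positions 4 to 8; none of these parameters are stated in the abstract. The function name `find_targets` is hypothetical, and off-target searching is not covered.

```python
import re


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_targets(seq, window=(4, 8), target_base="C"):
    """List candidate base editor targets on both strands.

    Assumptions (not taken from the abstract): 20-nt protospacer,
    NGG PAM, and an editing window given as 1-based protospacer
    positions counted from the PAM-distal end.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # A lookahead lets overlapping candidate sites all be reported.
        for m in re.finditer(r"(?=([ACGT]{20})([ACGT]GG))", s):
            protospacer, pam = m.group(1), m.group(2)
            win = protospacer[window[0] - 1:window[1]]
            if target_base in win:
                hits.append({
                    "strand": strand,
                    # For "-" hits, start is relative to the reverse complement.
                    "start": m.start(),
                    "protospacer": protospacer,
                    "pam": pam,
                    "window": win,
                })
    return hits


if __name__ == "__main__":
    example = "AAACC" + "A" * 15 + "TGG"  # hypothetical input sequence
    for hit in find_targets(example):
        print(hit)
```

A real design tool would also rank candidates and screen each one against a reference genome for potential off-target sites, which this sketch leaves out.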

NASQAR: A web-based platform for high-throughput sequencing data analysis and visualization

10.1101/709980 ◽

2019 ◽

Cited By ~ 1

Author(s):

Ayman Yousif ◽

Nizar Drou ◽

Jillian Rowe ◽

Mohammed Khalfan ◽

Kristin C Gunsalus

Keyword(s):

New York ◽

Data Analysis ◽

Open Source ◽

High Throughput ◽

High Throughput Sequencing ◽

Web Applications ◽

Rna Seq ◽

Sequencing Data ◽

Web Based ◽

Link Type

Abstract. Background: As high-throughput sequencing applications continue to evolve, the rapid growth in quantity and variety of sequence-based data calls for the development of new software libraries and tools for data analysis and visualization. Often, effective use of these tools requires computational skills beyond those of many researchers. To ease this computational barrier, we have created a dynamic web-based platform, NASQAR (Nucleic Acid SeQuence Analysis Resource). Results: NASQAR offers a collection of custom and publicly available open-source web applications that make extensive use of a variety of R packages to provide interactive data analysis and visualization. The platform is publicly accessible at http://nasqar.abudhabi.nyu.edu/. Open-source code is on GitHub at https://github.com/nasqar/NASQAR, and the system is also available as a Docker image at https://hub.docker.com/r/aymanm/nasqarall. NASQAR is a collaboration between the core bioinformatics teams of the NYU Abu Dhabi and NYU New York Centers for Genomics and Systems Biology. Conclusions: NASQAR empowers non-programming experts with a versatile and intuitive toolbox to easily and efficiently explore, analyze, and visualize their transcriptomics data interactively. Popular tools for a variety of applications are currently available, including Transcriptome Data Preprocessing, RNA-seq Analysis (including Single-cell RNA-seq), Metagenomics, and Gene Enrichment.

PhageTerm: a Fast and User-friendly Software to Determine Bacteriophage Termini and Packaging Mode using randomly fragmented NGS data

10.1101/108100 ◽

2017 ◽

Cited By ~ 2

Author(s):

Julian Garneau ◽

Florence Depardieu ◽

Louis-Charles Fortier ◽

David Bikard ◽

Marc Monot

Keyword(s):

High Throughput Sequencing ◽

Next Generation Sequencing Data ◽

Sequencing Data ◽

Link Type ◽

Sequencing Technologies ◽

Statistical Framework ◽

Fastq Format ◽

Viral Particles ◽

User Friendly ◽

Ngs Data

Abstract. Bacteriophages are the most abundant viruses on earth and display an impressive genetic as well as morphologic diversity. Among those, the most common order of phages is the Caudovirales, whose viral particles package linear double-stranded DNA (dsDNA). In this study we investigated how the information gathered by high-throughput sequencing technologies can be used to determine the DNA termini and packaging mechanisms of dsDNA phages. The wet-lab procedures traditionally used for this purpose rely on the identification and cloning of restriction fragments, which can be delicate and cumbersome. Here, we developed a theoretical and statistical framework to analyze DNA termini and phage packaging mechanisms using next-generation sequencing data. Our methods, implemented in the PhageTerm software, work with sequencing reads in fastq format and the corresponding assembled phage genome. PhageTerm was validated on a set of phages with well-established packaging mechanisms representative of the termini diversity: 5’cos (lambda), 3’cos (HK97), pac (P1), headful without a pac site (T4), DTR (T7) and host fragment (Mu). In addition, we determined the termini of 9 Clostridium difficile phages and 6 phages whose sequences were retrieved from the sequence read archive (SRA). A direct graphical interface is available as a Galaxy wrapper version at https://galaxy.pasteur.fr and a standalone version is accessible at https://sourceforge.net/projects/phageterm/.

ploidyNGS: Visually exploring ploidy with Next Generation Sequencing data

10.1101/086488 ◽

2016 ◽

Cited By ~ 2

Author(s):

Renato Augusto Corrêa dos Santos ◽

Gustavo Henrique Goldman ◽

Diego Mauricio Riaño-Pachón

Keyword(s):

Next Generation Sequencing Data ◽

Sequencing Data ◽

Ploidy Levels ◽

Short Read ◽

Link Type ◽

Model Free ◽

Ngs Data ◽

Free Open Source ◽

Generation Sequencing ◽

General Public License

Abstract. Summary: ploidyNGS is a model-free, open source tool to visualize and explore ploidy levels in a newly sequenced genome, exploiting short read data. We tested ploidyNGS using both simulated and real NGS data of the model yeast Saccharomyces cerevisiae. ploidyNGS allows the identification of the ploidy level of a newly sequenced genome in a visual way. Availability and implementation: ploidyNGS is available under the GNU General Public License (GPL) at https://github.com/diriano/ploidyNGS. ploidyNGS is implemented in Python and [email protected]

SMAP: A pipeline for sample matching in proteogenomics

10.1101/2021.09.17.460682 ◽

2021 ◽

Author(s):

Ling Li ◽

Mingming Niu ◽

Alyssa Erickson ◽

Jie Luo ◽

Kincaid Rowbotham ◽

...

Keyword(s):

Large Scale ◽

Ribosome Profiling ◽

Sequencing Data ◽

Protein Coding ◽

Web Based ◽

Link Type ◽

Genomics And Proteomics ◽

Sample Data ◽

Dependent Protein ◽

Coding Variants

Abstract. Integration of genomics and proteomics (proteogenomics) offers unprecedented promise for in-depth understanding of human diseases. However, sample mix-up is a pervasive, recurring problem, due to complex sample processing in proteogenomics. Here we present a pipeline for Sample Matching in Proteogenomics (SMAP) for verifying sample identity to ensure data integrity. SMAP infers sample-dependent protein-coding variants from quantitative mass spectrometry (MS), and aligns the MS-based proteomic samples with genomic samples by two discriminant scores. Theoretical analysis with simulation data indicates that SMAP is capable of uniquely matching proteomic and genomic samples when ≥20% of the genotypes of individual samples are available. When SMAP was applied to a large-scale proteomics dataset from 288 biological samples generated by the PsychENCODE BrainGVEX project, we identified and corrected 18.8% (54/288) mismatched samples. The correction was further confirmed by ribosome profiling and assay for transposase-accessible chromatin sequencing data from the same set of samples. Thus our results demonstrate that SMAP is an effective tool for sample verification in a large-scale MS-based proteogenomics study. The source code, manual, and sample data of SMAP are publicly available at https://github.com/UND-Wanglab/SMAP, and a web-based SMAP can be accessed at https://smap.shinyapps.io/smap/.

Rapid clinical diagnostic variant investigation of genomic patient sequencing data with iobio web tools

Journal of Clinical and Translational Science ◽

10.1017/cts.2017.311 ◽

2017 ◽

Vol 1 (6) ◽

pp. 381-386 ◽

Cited By ~ 5

Author(s):

Alistair Ward ◽

Mary A. Karren ◽

Tonya Di Sera ◽

Chase Miller ◽

Matt Velinder ◽

...

Keyword(s):

Genetic Testing ◽

Real Time ◽

Computational Analysis ◽

Disease Diagnosis ◽

Epileptic Encephalopathy ◽

Inherited Disease ◽

Sequencing Data ◽

Web Based ◽

Web Tools ◽

Clinical Diagnostic

Introduction: Computational analysis of genome or exome sequences may improve inherited disease diagnosis, but is costly and time-consuming. Methods: We describe the use of iobio, a web-based tool suite for intuitive, real-time genome diagnostic analyses. Results: We used iobio to identify the disease-causing variant in a patient with early infantile epileptic encephalopathy with prior nondiagnostic genetic testing. Conclusions: Iobio tools can be used by clinicians to rapidly identify disease-causing variants from genomic patient sequencing data.

GenomeChronicler: The Personal Genome Project UK Genomic Report Generator Pipeline

10.1101/2020.01.06.873026 ◽

2020 ◽

Author(s):

José Afonso Guerra-Assunção ◽

Lucia Conde ◽

Ismail Moghul ◽

Amy P. Webster ◽

Simone Ecker ◽

...

Keyword(s):

Service Providers ◽

Genome Project ◽

Phenotypic Traits ◽

Personal Genome ◽

Whole Genome ◽

Sequencing Data ◽

Potential Health ◽

Personal Genome Project ◽

Link Type ◽

A Genome

Abstract. In recent years, there has been a significant increase in whole genome sequencing data of individual genomes produced by research projects as well as direct-to-consumer service providers. While many of these sources provide their users with an interpretation of the data, there is a lack of free, open tools for generating reports exploring the data in an easy-to-understand manner. GenomeChronicler was developed as part of the Personal Genome Project UK (PGP-UK) to address this need. PGP-UK provides genomic, transcriptomic, epigenomic and self-reported phenotypic data under an open-access model with full ethical approval. As a result, the reports generated by GenomeChronicler are intended for research purposes only and include information relating to potentially beneficial and potentially harmful variants, but without clinical curation. GenomeChronicler can be used with data from whole genome or whole exome sequencing, producing a genome report containing information on variant statistics, ancestry and known associated phenotypic traits. Example reports are available from the PGP-UK data page (personalgenomes.org.uk/data). The objective of this method is to leverage existing resources to find known phenotypes associated with the genotypes detected in each sample. The provided trait data is based primarily upon information available in SNPedia, but also collates data from ClinVar, GETevidence and gnomAD to provide additional details on potential health implications, presence of genotype in other PGP participants and population frequency of each genotype. The analysis can be run in a self-contained environment without requiring internet access, making it a good choice for cases where privacy is essential or desired: any third party project can embed GenomeChronicler within their off-line safe-haven environments. GenomeChronicler can be run for one sample at a time, or in parallel making use of the Nextflow workflow manager. The source code is available from GitHub (https://github.com/PGP-UK/GenomeChronicler), container recipes are available for Docker and Singularity, as well as a pre-built container from SingularityHub (https://singularity-hub.org/collections/3664) enabling easy deployment in a variety of settings. Users without access to computational resources to run GenomeChronicler can access the software from the Lifebit CloudOS platform (https://lifebit.ai/cloudos) enabling the production of reports and variant calls from raw sequencing data in a scalable fashion.

CRISPR-based strategies for targeted transgene knock-in and gene correction

Faculty Reviews ◽

10.12703/r/9-20 ◽

2020 ◽

Vol 9 ◽

Author(s):

Cia-Hin Lau ◽

Chung Tin ◽

Yousin Suh

Keyword(s):

High Efficiency ◽

Excision Repair ◽

Direct Conversion ◽

Design Guidelines ◽

Gene Correction ◽

End Joining ◽

Base Editing ◽

Base Excision ◽

Non Homologous End Joining ◽

Strand Annealing

The last few years have seen tremendous advances in CRISPR-mediated genome editing. Great efforts have been made to improve the efficiency, specificity, editing window, and targeting scope of CRISPR/Cas9-mediated transgene knock-in and gene correction. In this article, we comprehensively review recent progress in CRISPR-based strategies for targeted transgene knock-in and gene correction in both homology-dependent and homology-independent approaches. We cover homology-directed repair (HDR), synthesis-dependent strand annealing (SDSA), microhomology-mediated end joining (MMEJ), and homology-mediated end joining (HMEJ) pathways for a homology-dependent strategy and alternative DNA repair pathways such as non-homologous end joining (NHEJ), base excision repair (BER), and mismatch repair (MMR) for a homology-independent strategy. We also discuss base editing and prime editing that enable direct conversion of nucleotides in genomic DNA without damaging the DNA or requiring donor DNA. Notably, we illustrate the key mechanisms and design principles for each strategy, providing design guidelines for multiplex, flexible, scarless gene insertion and replacement at high efficiency and specificity. In addition, we highlight next-generation base editors that provide higher editing efficiency, fewer undesired by-products, and broader targeting scope.

Development of Genomic SSR for the Subtropical Hardwood Tree Dalbergia hupeana and Assessment of Their Transferability to Other Related Species

Forests ◽

10.3390/f12060804 ◽

2021 ◽

Vol 12 (6) ◽

pp. 804

Author(s):

Changhong Li ◽

Yongqi Zheng ◽

Yu Liu ◽

Furong Lin ◽

Ping Huang

Keyword(s):

Genetic Diversity ◽

Forest Restoration ◽

Sequencing Data ◽

Genome Survey ◽

A Genome ◽

Genomic Ssr ◽

Global Status ◽

Hardwood Tree ◽

High Level ◽

Ngs Data

Dalbergia hupeana Hance (D. hupeana) is a precious hardwood tree of the genus Dalbergia. It is one of the few species widely distributed within subtropical areas and is important for timber production and forest restoration. At present, there is little published genetic information on D. hupeana. Therefore, we performed a genome survey using next generation sequencing (NGS) and developed a set of novel genomic SSR (gSSR) markers from the assembled data, and assessed the transferability of these markers to other Dalbergia species in Asia. The results of the genome survey show the genome size of D. hupeana to be about 664 Mb and highly heterozygous. The assembly of sequencing data produced 2,431,997 contigs, and the initial assembly of the NGS data alone resulted in contig N50 of 393 kb with a total of 720 Mb. A total of 127,742 perfect SSR markers were found in the assembled contigs. A total of 37 highly polymorphic and easily genotyped gSSR markers were developed in D. hupeana, while the majority of gSSR markers could be successfully transferred to nine other Dalbergia species in Asia. The transferability rate of gSSR markers was highest in D. balansae, which is more closely related to D. hupeana. Seven gSSR markers were able to be amplified in all tested species. In addition, a preliminary assessment of the genetic diversity of three tree species in the Dalbergia genus suggested a high level of genetic diversity within populations distributed in the subtropical area in China. However, the determination of the global status of their genetic variation still requires further and more comprehensive assessment. Our findings will enable further studies on the genetic diversity, phylogenetics, germplasm characterization, and taxonomy of various Dalbergia species.
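The contig N50 reported above is a standard assembly statistic. As a minimal sketch, assuming the common definition (the length L such that contigs of length L or longer together cover at least half of the total assembled bases), it can be computed as follows. The function name `n50` and the example lengths are hypothetical and are not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the
    length of the contig at which the running total first reaches
    half of the summed length of all contigs.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Hypothetical contig lengths in base pairs.
    print(n50([2, 3, 4, 5, 6]))
```

Because N50 is weighted by length, a few long contigs can raise it even when most contigs in the assembly are short.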

WALTER: an easy way to online evaluate telomere lengths from terminal restriction fragment analysis

BMC Bioinformatics ◽

10.1186/s12859-021-04064-0 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Martin Lyčka ◽

Vratislav Peska ◽

Martin Demko ◽

Ioannis Spyroglou ◽

Agata Kilar ◽

...

Keyword(s):

Restriction Fragment ◽

Genome Stability ◽

Tandem Repeats ◽

Statistical Evaluation ◽

Molecular Complexes ◽

Web Based ◽

Link Type ◽

Precise Tool ◽

User Friendly ◽

Ngs Data

Abstract. Background: Telomeres, nucleoprotein structures comprising short tandem repeats and delimiting the ends of linear eukaryotic chromosomes, play an important role in the maintenance of genome stability. Therefore, the determination of the length of telomeres is of high importance for many studies. Over the last years, new methods for the analysis of the length of telomeres have been developed, including those based on PCR or analysis of NGS data. Despite that, the terminal restriction fragment (TRF) method remains the gold standard to this day. However, this method lacks a universally accepted and precise tool capable of analysing and statistically evaluating TRF results. Results: To standardize the processing of TRF results, we have developed WALTER, an online toolset allowing rapid, reproducible, and user-friendly analysis including statistical evaluation of the data. Given its web-based nature, it provides an easily accessible way to analyse TRF data without any need to install additional software. Conclusions: WALTER represents a major upgrade from currently available tools for the image processing of TRF scans. This toolset enables a rapid, highly reproducible, and user-friendly evaluation of almost any TRF scan including in-house statistical evaluation of the data. The WALTER platform, together with a user manual describing the evaluation of TRF scans in detail and presenting tips and troubleshooting, as well as test data to demo the software, is available at https://www.ceitec.eu/chromatin-molecular-complexes-jiri-fajkus/rg51/tab?tabId=125#WALTER and the source code at https://github.com/mlyc93/WALTER.

Prediction of unconventional protein secretion by exosomes

BMC Bioinformatics ◽

10.1186/s12859-021-04219-z ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Alvaro Ras-Carmona ◽

Marta Gomez-Perosanz ◽

Pedro A. Reche

Keyword(s):

Protein Secretion ◽

Random Forests ◽

Signal Peptide ◽

Area Under The Curve ◽

Dipeptide Composition ◽

Web Based ◽

Independent Dataset ◽

Link Type ◽

Tenfold Cross Validation ◽

Dependent Pathway

Abstract. Motivation: In eukaryotes, proteins targeted for secretion contain a signal peptide, which allows them to proceed through the conventional ER/Golgi-dependent pathway. However, a substantial number of proteins lacking a signal peptide can be secreted through unconventional routes, including that mediated by exosomes. Currently, no method is available to predict protein secretion via exosomes. Results: Here, we first assembled a dataset including the sequences of 2992 proteins secreted by exosomes and 2961 proteins that are not secreted by exosomes. Subsequently, we trained different random forest models on feature vectors derived from the sequences in this dataset. In tenfold cross-validation, the best model was trained on dipeptide composition, reaching an accuracy of 69.88% ± 2.08 and an area under the curve (AUC) of 0.76 ± 0.03. In an independent dataset, this model reached an accuracy of 75.73% and an AUC of 0.840. Based on these results, we developed ExoPred, a web-based tool that uses random forests to predict protein secretion by exosomes. Conclusion: ExoPred is available for free public use at http://imath.med.ucm.es/exopred/. Datasets are available at http://imath.med.ucm.es/exopred/datasets/.
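The dipeptide composition features mentioned in the abstract can be illustrated with a short Python sketch. It uses the usual definition, the relative frequency of each of the 400 ordered pairs of the 20 standard amino acids, and is not ExoPred's code. The function name `dipeptide_composition` and the decision to skip pairs containing non-standard residues are assumptions made for illustration. Training the random forest itself is not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides, in a fixed order so vectors are comparable.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-element list of dipeptide relative frequencies.

    Overlapping adjacent residue pairs are counted. Pairs containing a
    non-standard residue (for example X) are skipped, which is an
    assumption of this sketch.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


if __name__ == "__main__":
    vector = dipeptide_composition("MKTAYIAKQR")  # hypothetical sequence
    print(len(vector), sum(vector))
```

Each protein, whatever its length, maps to a vector of the same size, which is what allows sequences to be fed to a classifier such as a random forest.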
