GUIDES: sgRNA design for loss-of-function screens

2017 ◽  
Author(s):  
Joshua A. Meier ◽  
Feng Zhang ◽  
Neville E. Sanjana

GUIDES (Graphical User Interface for DNA Editing Screens) is a web-based tool for the design of custom, large-scale CRISPR libraries for loss-of-function screens in human and mouse. GUIDES combines multi-tissue RNA-sequencing data to target expressed exons, protein annotation to target functional domains, sophisticated on-target and off-target guide RNA scoring and other optimizations to create CRISPR libraries directly from a list of genes without requiring any programming expertise.

2021 ◽  
Author(s):  
Haicang Zhang ◽  
Michelle S. Xu ◽  
Wendy K. Chung ◽  
Yufeng Shen

Abstract: Accurate prediction of damaging missense variants is critically important for interpreting genome sequence. While many methods have been developed, their performance has been limited. Recent progress in machine learning and availability of large-scale population genomic sequencing data provide new opportunities to significantly improve computational predictions. Here we describe gMVP, a new method based on graph attention neural networks. Its main component is a graph with nodes capturing predictive features of amino acids and edges weighted by coevolution strength, which enables effective pooling of information from local protein sequence context and functionally correlated distal positions. Evaluated by deep mutational scan data, gMVP outperforms published methods in identifying damaging variants in TP53, PTEN, BRCA1, and MSH2. Additionally, it achieves the best separation of de novo missense variants in neurodevelopmental disorder cases from the ones in controls. Finally, the model supports transfer learning to optimize gain- and loss-of-function predictions in sodium and calcium channels. In summary, we demonstrate that gMVP can improve interpretation of missense variants in clinical testing and genetic studies.


2019 ◽  
Author(s):  
Emanuel Gonçalves ◽  
Mark Thomas ◽  
Fiona M Behan ◽  
Gabriele Picco ◽  
Clare Pacini ◽  
...  

Abstract: CRISPR guide-RNA libraries have been iteratively optimised to provide increasingly efficient reagents, although their large size is a barrier for many applications. We designed an optimised minimal genome-wide human CRISPR-Cas9 library (MinLibCas9), by mining existing large-scale gene loss-of-function datasets, resulting in a greater than 42% reduction in size compared to other libraries while preserving assay sensitivity and specificity. MinLibCas9 increases the dynamic range of CRISPR-Cas9 loss-of-function screens and extends their application to complex models and assays.


2019 ◽  
Author(s):  
Joseph Park ◽  
Nathan Katz ◽  
Xinyuan Zhang ◽  
Anastasia M Lucas ◽  
Anurag Verma ◽  
...  

Abstract. Background: By coupling large-scale DNA sequencing with electronic health records (EHR), “genome-first” approaches can enhance our understanding of the contribution of rare genetic variants to disease. Aggregating rare, loss-of-function variants in a candidate gene into a “gene burden” to test for association with EHR phenotypes can identify both known and novel clinical implications for the gene in human disease. However, this methodology has not yet been applied on both an exome-wide and phenome-wide scale, and the clinical ontologies of rare loss-of-function variants in many genes have yet to be described. Methods: We leveraged whole exome sequencing (WES) data in participants (N=11,451) in the Penn Medicine Biobank (PMBB) to address on an exome-wide scale the association of a burden of rare loss-of-function variants in each gene with diverse EHR phenotypes using a phenome-wide association study (PheWAS) approach. For discovery, we collapsed rare (minor allele frequency (MAF) ≤ 0.1%) predicted loss-of-function (pLOF) variants (i.e. frameshift insertions/deletions, gain/loss of stop codon, or splice site disruption) per gene to perform a gene burden PheWAS. Subsequent evaluation of the significant gene burden associations was done by collapsing rare (MAF ≤ 0.1%) missense variants with Rare Exonic Variant Ensemble Learner (REVEL) scores ≥ 0.5 into corresponding yet distinct gene burdens, as well as interrogation of individual low-frequency to common (MAF > 0.1%) pLOF variants and missense variants with REVEL ≥ 0.5. We replicated our findings using the UK Biobank’s (UKBB) whole exome sequence dataset (N=49,960). Results: From the pLOF-based discovery phase, we identified 106 gene burdens with phenotype associations at p<10⁻⁶ from our exome-by-phenome-wide association studies. Positive-control associations included TTN (cardiomyopathy, p=7.83E-13), MYBPC3 (hypertrophic cardiomyopathy, p=3.48E-15), CFTR (cystic fibrosis, p=1.05E-15), CYP2D6 (adverse effects due to opiates/narcotics, p=1.50E-09), and BRCA2 (breast cancer, p=1.36E-07). Of the 106 genes, 12 gene-phenotype relationships were also detected by REVEL-informed missense-based gene burdens and 19 by single-variant analyses, demonstrating the robustness of these gene-phenotype relationships. Three genes showed evidence of association using both additional methods (BRCA1, CFTR, TGM6), leading to a total of 28 robust gene-phenotype associations within PMBB. Furthermore, replication studies in UKBB validated 30 of 106 gene burden associations, of which 12 demonstrated robustness in PMBB. Conclusion: Our study presents 12 exome-by-phenome-wide robust gene-phenotype associations, which include three proof-of-concept associations and nine novel findings. We show the value of aggregating rare pLOF variants into gene burdens on an exome-wide scale for unbiased association with EHR phenotypes to identify novel clinical ontologies of human genes. Furthermore, we show the significance of evaluating gene burden associations through complementary, yet non-overlapping genetic association studies from the same dataset. Our results suggest that this approach applied to even larger cohorts of individuals with WES or whole-genome sequencing data linked to EHR phenotype data will yield many new insights into the relationship of genetic variation and disease phenotypes.
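The collapsing step described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the record layout, the consequence labels, and the function name `gene_burden` are assumptions made for illustration, and only the MAF ≤ 0.1% cutoff and the pLOF classes come from the abstract.

```python
from collections import defaultdict

# Hypothetical consequence labels standing in for the pLOF classes named in
# the abstract (frameshift indels, stop gain/loss, splice site disruption).
PLOF_CONSEQUENCES = {"frameshift", "stop_gained", "stop_lost", "splice_site"}
MAF_THRESHOLD = 0.001  # 0.1%, the rare-variant cutoff stated in the abstract


def gene_burden(variants):
    """Map each gene to the set of samples carrying >= 1 rare pLOF variant."""
    carriers = defaultdict(set)
    for gene, sample, maf, consequence in variants:
        if maf <= MAF_THRESHOLD and consequence in PLOF_CONSEQUENCES:
            carriers[gene].add(sample)
    return dict(carriers)


# Toy records: (gene, sample ID, minor allele frequency, consequence).
variants = [
    ("TTN", "s1", 0.0002, "frameshift"),
    ("TTN", "s2", 0.0005, "stop_gained"),
    ("TTN", "s1", 0.0004, "splice_site"),  # same carrier, counted once
    ("TTN", "s3", 0.0200, "stop_gained"),  # too common, excluded
    ("CFTR", "s4", 0.0003, "missense"),    # not pLOF, excluded
]
burden = gene_burden(variants)
```

Each gene's carrier set would then serve as the binary exposure tested against every EHR phenotype; the association test itself is outside this sketch.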


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Emanuel Gonçalves ◽  
Mark Thomas ◽  
Fiona M. Behan ◽  
Gabriele Picco ◽  
Clare Pacini ◽  
...  

Abstract: CRISPR guide RNA libraries have been iteratively improved to provide increasingly efficient reagents, although their large size is a barrier for many applications. We design an optimised minimal genome-wide human CRISPR-Cas9 library (MinLibCas9) by mining existing large-scale gene loss-of-function datasets, resulting in a greater than 42% reduction in size compared to other CRISPR-Cas9 libraries while preserving assay sensitivity and specificity. MinLibCas9 provides backward compatibility with existing datasets, increases the dynamic range of CRISPR-Cas9 screens and extends their application to complex models and assays.


2019 ◽  
Author(s):  
Adrienne Hoarfrost ◽  
Nick Brown ◽  
C. Titus Brown ◽  
Carol Arnosti

Sequencing data resources have increased exponentially in recent years, as has interest in large-scale meta-analyses of integrated next-generation sequencing datasets. However, curation of integrated datasets that match a user’s particular research priorities is currently a time-intensive and imprecise task. MetaSeek is a sequencing data discovery tool that enables users to flexibly search and filter on any metadata field to quickly find the sequencing datasets that meet their needs. MetaSeek automatically scrapes metadata from all publicly available datasets in the Sequence Read Archive, cleans and parses messy, user-provided metadata into a structured, standard-compliant database, and predicts missing fields where possible. MetaSeek provides a web-based graphical user interface and interactive visualization dashboard, as well as a programmatic API to rapidly search, filter, visualize, save, share, and download matching sequencing metadata. The MetaSeek online interface is available at https://www.metaseek.cloud/. The MetaSeek database can also be accessed via API to programmatically search, filter, and download all metadata. MetaSeek source code, metadata scrapers, and documents are available at https://github.com/MetaSeek-Sequencing-Data-Discovery/metaseek/. Additional guides, tutorials, and documents are available at https://github.com/MetaSeek-Sequencing-Data-Discovery/metaseek, and on the MetaSeek website, https://www.metaseek.cloud/. MetaSeek is distributed under an MIT license.


2021 ◽  
Author(s):  
Ling Li ◽  
Mingming Niu ◽  
Alyssa Erickson ◽  
Jie Luo ◽  
Kincaid Rowbotham ◽  
...  

Abstract: Integration of genomics and proteomics (proteogenomics) offers unprecedented promise for in-depth understanding of human diseases. However, sample mix-up is a pervasive, recurring problem, due to complex sample processing in proteogenomics. Here we present a pipeline for Sample Matching in Proteogenomics (SMAP) for verifying sample identity to ensure data integrity. SMAP infers sample-dependent protein-coding variants from quantitative mass spectrometry (MS), and aligns the MS-based proteomic samples with genomic samples by two discriminant scores. Theoretical analysis with simulation data indicates that SMAP is capable of uniquely matching proteomic and genomic samples when ≥20% of the genotypes of individual samples are available. When SMAP was applied to a large-scale proteomics dataset from 288 biological samples generated by the PsychENCODE BrainGVEX project, we identified and corrected 18.8% (54/288) mismatched samples. The correction was further confirmed by ribosome profiling and assay for transposase-accessible chromatin sequencing data from the same set of samples. Thus our results demonstrate that SMAP is an effective tool for sample verification in a large-scale MS-based proteogenomics study. The source code, manual, and sample data of SMAP are publicly available at https://github.com/UND-Wanglab/SMAP, and a web-based SMAP can be accessed at https://smap.shinyapps.io/smap/.
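The general idea of matching samples by genotype agreement can be sketched in Python as follows. This is not SMAP's method: the abstract does not define its two discriminant scores, so a plain concordance fraction is swapped in as a stand-in, and the function names, sample IDs, and genotype encoding (0/1/2 alternate-allele counts) are all hypothetical.

```python
def concordance(a, b):
    """Fraction of shared variant sites with identical genotype calls."""
    shared = [site for site in a if site in b]
    if not shared:
        return 0.0
    return sum(a[s] == b[s] for s in shared) / len(shared)


def best_match(proteomic, genomic):
    """For each proteomic sample, pick the genomic sample with highest concordance."""
    return {
        p_id: max(genomic, key=lambda g_id: concordance(p_calls, genomic[g_id]))
        for p_id, p_calls in proteomic.items()
    }


# Toy genotype calls per sample: variant site -> alternate-allele count.
genomic = {
    "G1": {"v1": 0, "v2": 1, "v3": 2},
    "G2": {"v1": 1, "v2": 1, "v3": 0},
}
# Proteomics typically recovers only a subset of sites per sample.
proteomic = {
    "P1": {"v1": 1, "v3": 0},
    "P2": {"v1": 0, "v2": 1},
}
matches = best_match(proteomic, genomic)
```

A sample whose best-matching genomic ID differs from its recorded label would be a candidate mix-up; a real pipeline would also need a threshold to reject weak or ambiguous matches.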


Author(s):  
Francisco Barcelo-Arroyo ◽  
Israel Martin-Escalona

Air pollution is an important environmental issue that has a direct effect on human health and ecological balance. Factories, power plants, vehicles, windblown dust and wildfires are some of the contributors to pollution. Reasonable simulation tools exist for evaluating large-scale sensor networks; however, they fail to capture significant details of node operation or practical aspects of wireless communication. Real-life testbeds capture the realism and bring out important aspects for further research. In this paper, we present an implementation of a wireless sensor network testbed for automatic and real-time monitoring of environmental pollution for the protection of public spaces. The paper describes the physical setup and the sensor node hardware and software architecture for “anytime, anywhere” monitoring and management of pollution data through a single, Web-based graphical user interface. The paper presents practical issues in the integration of sensors, reports actual power consumption rates, and develops a practical hierarchical routing methodology.


2019 ◽  
Vol 35 (22) ◽  
pp. 4857-4859 ◽  
Author(s):  
Adrienne Hoarfrost ◽  
Nick Brown ◽  
C Titus Brown ◽  
Carol Arnosti

Abstract. Summary: Sequencing data resources have increased exponentially in recent years, as has interest in large-scale meta-analyses of integrated next-generation sequencing datasets. However, curation of integrated datasets that match a user’s particular research priorities is currently a time-intensive and imprecise task. MetaSeek is a sequencing data discovery tool that enables users to flexibly search and filter on any metadata field to quickly find the sequencing datasets that meet their needs. MetaSeek automatically scrapes metadata from all publicly available datasets in the Sequence Read Archive, cleans and parses messy, user-provided metadata into a structured, standard-compliant database and predicts missing fields where possible. MetaSeek provides a web-based graphical user interface and interactive visualization dashboard, as well as a programmatic API to rapidly search, filter, visualize, save, share and download matching sequencing metadata. Availability and implementation: The MetaSeek online interface is available at https://www.metaseek.cloud/. The MetaSeek database can also be accessed via API to programmatically search, filter and download all metadata. MetaSeek source code, metadata scrapers and documents are available at https://github.com/MetaSeek-Sequencing-Data-Discovery/metaseek/.


2020 ◽  
Vol 49 (D1) ◽  
pp. D792-D802
Author(s):  
Alise J Ponsero ◽  
Matthew Bomhoff ◽  
Kai Blumberg ◽  
Ken Youens-Clark ◽  
Nina M Herz ◽  
...  

Abstract: In recent years, large-scale oceanic sequencing efforts have provided a deeper understanding of marine microbial communities and their dynamics. These research endeavors require the acquisition of complex and varied datasets through large, interdisciplinary and collaborative efforts. However, no unifying framework currently exists for the marine science community to integrate sequencing data with physical, geological, and geochemical datasets. Planet Microbe is a web-based platform that enables data discovery from curated historical and on-going oceanographic sequencing efforts. In Planet Microbe, each ’omics sample is linked with other biological and physiochemical measurements collected for the same water samples or during the same sample collection event, to provide a broader environmental context. This work highlights the need for curated aggregation efforts that can enable new insights into high-quality metagenomic datasets. Planet Microbe is freely accessible from https://www.planetmicrobe.org/.



