PICS2: Next-generation fine mapping via probabilistic identification of causal SNPs

Bioinformatics · 10.1093/bioinformatics/btab122 · 2021

Author(s): Kimberly E Taylor, K Mark Ansel, Alexander Marson, Lindsey A Criswell, Kyle Kai-How Farh

Keyword(s): Fine Mapping, Web Application, Underlying Disease, Causal SNPs, Supplementary Information, Summary Statistics, Next Generation, Mapping Tool, Causal Polymorphism, Biological Studies

Abstract The Probabilistic Identification of Causal SNPs (PICS) algorithm and web application were developed as a fine-mapping tool to determine the likelihood that each single nucleotide polymorphism (SNP) in linkage disequilibrium (LD) with a reported index SNP is a true causal polymorphism. PICS is notable for its ability to identify candidate causal SNPs within a locus using only the index SNPs, which are widely available from published GWAS, whereas other methods require full summary statistics or full genotype data. However, the original PICS web application operates on a single SNP at a time, with slow performance, severely limiting its usability. We have developed a next-generation PICS tool, PICS2, which runs PICS analyses on large batches of index SNPs with much faster performance. Additional updates and extensions include use of LD reference data generated from 1000 Genomes phase 3; annotation of variant consequences; annotation of GTEx eQTL genes and downloadable PICS SNPs from GTEx eQTLs; the option of generating PICS probabilities from experimental summary statistics; and generation of PICS SNPs from all SNPs of the GWAS catalog, automatically updated weekly. These free and easy-to-use resources will enable efficient determination of candidate loci for biological studies to investigate the true causal variants underlying disease processes. Availability PICS2 is available at https://pics2.ucsf.edu. Supplementary information Supplementary data are available at Bioinformatics online.

E-MAGMA: an eQTL-informed method to identify risk genes using genome-wide association study summary statistics

Bioinformatics · 10.1093/bioinformatics/btab115 · 2021

Author(s): Zachary F Gerring, Angela Mina-Vargas, Eric R Gamazon, Eske M Derks

Keyword(s): Genome Wide Association, Chromosome 1, Supplementary Information, Genome Wide Association Studies, Summary Statistics, Tissue Specific, Complex Disorders, Risk Variants, Genome Wide, Causal Genes

Abstract Motivation Genome-wide association studies have successfully identified multiple independent genetic loci that harbour variants associated with human traits and diseases, but the exact causal genes are largely unknown. Common genetic risk variants are enriched in non-protein-coding regions of the genome and often affect gene expression (expression quantitative trait loci, eQTL) in a tissue-specific manner. To address this challenge, we developed a methodological framework, E-MAGMA, which converts genome-wide association summary statistics into gene-level statistics by assigning risk variants to their putative genes based on tissue-specific eQTL information. Results We compared E-MAGMA to three eQTL informed gene-based approaches using simulated phenotype data. Phenotypes were simulated based on eQTL reference data using GCTA for all genes with at least one eQTL at chromosome 1. We performed 10 simulations per gene. The eQTL-h2 (i.e., the proportion of variation explained by the eQTLs) was set at 1%, 2%, and 5%. We found E-MAGMA outperforms other gene-based approaches across a range of simulated parameters (e.g. the number of identified causal genes). When applied to genome-wide association summary statistics for five neuropsychiatric disorders, E-MAGMA identified more putative candidate causal genes compared to other eQTL-based approaches. By integrating tissue-specific eQTL information, these results show E-MAGMA will help to identify novel candidate causal genes from genome-wide association summary statistics and thereby improve the understanding of the biological basis of complex disorders. Availability A tutorial and input files are made available in a github repository: https://github.com/eskederks/eMAGMA-tutorial. Supplementary information Supplementary data are available at Bioinformatics online.

CAUSALdb: a database for disease/trait causal variants identified using summary statistics of genome-wide association studies

Nucleic Acids Research · 10.1093/nar/gkz1026 · 2019 · Cited By ~ 2

Author(s): Jianhua Wang, Dandan Huang, Yao Zhou, Hongcheng Yao, Huanhuan Liu, ...

Keyword(s): Fine Mapping, Genetic Variants, Association Studies, Complex Trait, Genome Wide Association, Genome Wide Association Studies, Summary Statistics, Genome Wide, Credible Sets, Causal Variants

Abstract Genome-wide association studies (GWASs) have revolutionized the field of complex trait genetics over the past decade, yet for most of the significant genotype-phenotype associations the true causal variants remain unknown. Identifying and interpreting how causal genetic variants confer disease susceptibility is still a big challenge. Herein we introduce a new database, CAUSALdb, to integrate the most comprehensive GWAS summary statistics to date and identify credible sets of potential causal variants using uniformly processed fine-mapping. The database has six major features: it (i) curates 3052 high-quality, fine-mappable GWAS summary statistics across five human super-populations and 2629 unique traits; (ii) estimates causal probabilities of all genetic variants in GWAS significant loci using three state-of-the-art fine-mapping tools; (iii) maps the reported traits to a powerful ontology MeSH, making it simple for users to browse studies on the trait tree; (iv) incorporates highly interactive Manhattan and LocusZoom-like plots to allow visualization of credible sets in a single web page more efficiently; (v) enables online comparison of causal relations on variant-, gene- and trait-levels among studies with different sample sizes or populations and (vi) offers comprehensive variant annotations by integrating massive base-wise and allele-specific functional annotations. CAUSALdb is freely available at http://mulinlab.org/causaldb.

SurfaceGenie: a web-based application for prioritizing cell-type-specific marker candidates

Bioinformatics · 10.1093/bioinformatics/btaa092 · 2020 · Vol 36 (11) · pp. 3447-3456 · Cited By ~ 2

Author(s): Matthew Waas, Shana T Snarrenberg, Jack Littrell, Rachel A Jones Lipinski, Polly A Hansen, ...

Keyword(s): Cell Surface, Specific Surface, Web Application, Rank Order, Surface Proteins, Supplementary Information, Live Cells, Specific Marker, Cell Type, Cell Type Specific

Abstract Motivation Cell-type-specific surface proteins can be exploited as valuable markers for a range of applications including immunophenotyping live cells, targeted drug delivery and in vivo imaging. Despite their utility and relevance, the unique combination of molecules present at the cell surface is not yet described for most cell types. A significant challenge in analyzing ‘omic’ discovery datasets is the selection of candidate markers that are most applicable for downstream applications. Results Here, we developed GenieScore, a prioritization metric that integrates a consensus-based prediction of cell surface localization with user-input data to rank-order candidate cell-type-specific surface markers. In this report, we demonstrate the utility of GenieScore for analyzing human and rodent data from proteomic and transcriptomic experiments in the areas of cancer, stem cell and islet biology. We also demonstrate that permutations of GenieScore, termed IsoGenieScore and OmniGenieScore, can efficiently prioritize co-expressed and intracellular cell-type-specific markers, respectively. Availability and implementation Calculation of GenieScores and lookup of SPC scores are made freely accessible via the SurfaceGenie web application: www.cellsurfer.net/surfacegenie. Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.

PathScore: a web tool for identifying altered pathways in cancer data

10.1101/067090 · 2016 · Cited By ~ 2

Author(s): Stephen G. Gaffney, Jeffrey P. Townsend

Keyword(s): Web Application, Somatic Mutations, Supplementary Information, Web Tool, Cancer Data, Link Type, Novel Approach, Supplementary Material, User Friendly, Pathway Effect

Abstract Summary PathScore quantifies the level of enrichment of somatic mutations within curated pathways, applying a novel approach that identifies pathways enriched across patients. The application provides several user-friendly, interactive graphic interfaces for data exploration, including tools for comparing pathway effect sizes, significance, gene-set overlap and enrichment differences between projects. Availability and Implementation Web application available at pathscore.publichealth.yale.edu. Site implemented in Python and MySQL, with all major browsers supported. Source code available at github.com/sggaffney/pathscore with a GPLv3 license. Contact [email protected] Supplementary Information Additional documentation can be found at http://pathscore.publichealth.yale.edu/faq.

COVID-Align: Accurate online alignment of hCoV-19 genomes using a profile HMM

10.1101/2020.05.25.114884 · 2020 · Cited By ~ 2

Author(s): Frédéric Lemoine, Luc Blassel, Jakub Voznica, Olivier Gascuel

Keyword(s): Daily Basis, Supplementary Information, Summary Statistics, Evolutionary Novelty, Bioinformatics Analyses, Link Type, Sequencing Quality, User Friendly, Profile HMM, New Mutations

Abstract Motivation The first cases of the COVID-19 pandemic emerged in December 2019. Until the end of February 2020, the number of available genomes was below 1,000, and their multiple alignment was easily achieved using standard approaches. Subsequently, the availability of genomes has grown dramatically. Moreover, some genomes are of low quality with sequencing/assembly errors, making accurate re-alignment of all genomes nearly impossible on a daily basis. A more efficient, yet accurate approach was clearly required to pursue all subsequent bioinformatics analyses of this crucial data. Results hCoV-19 genomes are highly conserved, with very few indels and no recombination. This makes the profile HMM approach particularly well suited to align new genomes, add them to an existing alignment and filter problematic ones. Using a core of ∼2,500 high quality genomes, we estimated a profile using HMMER, and implemented this profile in COVID-Align, a user-friendly interface to be used online or as standalone via Docker. The alignment of 1,000 genomes requires less than 20 min on our cluster. Moreover, COVID-Align provides summary statistics, which can be used to determine the sequencing quality and evolutionary novelty of input genomes (e.g. number of new mutations and indels). Availability https://covalign.pasteur.cloud, hub.docker.com/r/evolbioinfo/covid-align Contact [email protected], [email protected] Supplementary information Supplementary information is available at Bioinformatics online.

YeastSpotter: accurate and parameter-free web segmentation for microscopy images of yeast cells

Bioinformatics · 10.1093/bioinformatics/btz402 · 2019 · Vol 35 (21) · pp. 4525-4527 · Cited By ~ 10

Author(s): Alex X Lu, Taraneh Zarin, Ian S Hsu, Alan M Moses

Keyword(s): Image Analysis, Web Application, Single Cells, Yeast Cells, Supplementary Information, Supplementary Data, User Friendly, Microscopy Images

Abstract Summary We introduce YeastSpotter, a web application for the segmentation of yeast microscopy images into single cells. YeastSpotter is user-friendly and generalizable, reducing the computational expertise required for this critical preprocessing step in many image analysis pipelines. Availability and implementation YeastSpotter is available at http://yeastspotter.csb.utoronto.ca/. Code is available at https://github.com/alexxijielu/yeast_segmentation. Supplementary information Supplementary data are available at Bioinformatics online.

VikNGS: A C++ Variant Integration Kit for Next Generation Sequencing Association Analysis

Bioinformatics · 10.1093/bioinformatics/btz716 · 2019 · Cited By ~ 1

Author(s): Zeynep Baskurt, Scott Mastromatteo, Jiafen Gong, Richard F Wintle, Stephen W Scherer, ...

Keyword(s): Next Generation Sequencing, Genetic Association, Association Analysis, Supplementary Information, Next Generation Sequencing Data, Data Sets, Next Generation, Sequencing Data, Combining Data, Generation Sequencing

Abstract Integration of next generation sequencing data (NGS) across different research studies can improve the power of genetic association testing by increasing sample size and can obviate the need for sequencing controls. If differential genotype uncertainty across studies is not accounted for, combining data sets can produce spurious association results. We developed the Variant Integration Kit for NGS (VikNGS), a fast cross-platform software package, to enable aggregation of several data sets for rare and common variant genetic association analysis of quantitative and binary traits with covariate adjustment. VikNGS also includes a graphical user interface, power simulation functionality and data visualization tools. Availability The VikNGS package can be downloaded at http://www.tcag.ca/tools/index.html. Supplementary information Supplementary data are available at Bioinformatics online.

DiscoRhythm: an easy-to-use web application and R package for discovering rhythmicity

Bioinformatics · 10.1093/bioinformatics/btz834 · 2019 · Cited By ~ 2

Author(s): Matthew Carlucci, Algimantas Kriščiūnas, Haohan Li, Povilas Gibas, Karolis Koncevičius, ...

Keyword(s): Web Application, Statistical Significance, R Package, Biological Data, Supplementary Information, Statistical Knowledge, Health And Disease, Phase Amplitude, Almost All, User Friendly

Abstract Motivation Biological rhythmicity is fundamental to almost all organisms on Earth and plays a key role in health and disease. Identification of oscillating signals could lead to novel biological insights, yet its investigation is impeded by the extensive computational and statistical knowledge required to perform such analysis. Results To address this issue, we present DiscoRhythm (Discovering Rhythmicity), a user-friendly application for characterizing rhythmicity in temporal biological data. DiscoRhythm is available as a web application or an R/Bioconductor package for estimating phase, amplitude, and statistical significance using four popular approaches to rhythm detection (Cosinor, JTK Cycle, ARSER, and Lomb-Scargle). We optimized these algorithms for speed, improving their execution times up to 30-fold to enable rapid analysis of -omic-scale datasets in real-time. Informative visualizations, interactive modules for quality control, dimensionality reduction, periodicity profiling, and incorporation of experimental replicates make DiscoRhythm a thorough toolkit for analyzing rhythmicity. Availability and Implementation The DiscoRhythm R package is available on Bioconductor (https://bioconductor.org/packages/DiscoRhythm), with source code available on GitHub (https://github.com/matthewcarlucci/DiscoRhythm) under a GPL-3 license. The web application is securely deployed over HTTPS (https://disco.camh.ca) and is freely available for use worldwide. Local instances of the DiscoRhythm web application can be created using the R package or by deploying the publicly available Docker container (https://hub.docker.com/r/mcarlucci/discorhythm). Supplementary information Supplementary data are available at Bioinformatics online.

COVID-Align: Accurate online alignment of hCoV-19 genomes using a profile HMM

Bioinformatics · 10.1093/bioinformatics/btaa871 · 2020

Author(s): Frédéric Lemoine, Luc Blassel, Jakub Voznica, Olivier Gascuel

Keyword(s): Daily Basis, Multiple Alignment, Supplementary Information, Summary Statistics, Evolutionary Novelty, Bioinformatics Analyses, Sequencing Quality, User Friendly, Profile HMM, New Mutations

Abstract Motivation The first cases of the COVID-19 pandemic emerged in December 2019. Until the end of February 2020, the number of available genomes was below 1,000, and their multiple alignment was easily achieved using standard approaches. Subsequently, the availability of genomes has grown dramatically. Moreover, some genomes are of low quality with sequencing/assembly errors, making accurate re-alignment of all genomes nearly impossible on a daily basis. A more efficient, yet accurate approach was clearly required to pursue all subsequent bioinformatics analyses of this crucial data. Results hCoV-19 genomes are highly conserved, with very few indels and no recombination. This makes the profile HMM approach particularly well suited to align new genomes, add them to an existing alignment and filter problematic ones. Using a core of ∼2,500 high quality genomes, we estimated a profile using HMMER, and implemented this profile in COVID-Align, a user-friendly interface to be used online or as standalone via Docker. The alignment of 1,000 genomes requires ∼50 min on our cluster. Moreover, COVID-Align provides summary statistics, which can be used to determine the sequencing quality and evolutionary novelty of input genomes (e.g. number of new mutations and indels). Availability https://covalign.pasteur.cloud, hub.docker.com/r/evolbioinfo/covid-align Supplementary information Supplementary information is available at Bioinformatics online.

Deep-learning method for data association in particle tracking

Bioinformatics · 10.1093/bioinformatics/btaa597 · 2020 · Vol 36 (19) · pp. 4935-4941 · Cited By ~ 1

Author(s): Yao Yao, Ihor Smal, Ilya Grigoriev, Anna Akhmanova, Erik Meijering

Keyword(s): Deep Learning, Particle Tracking, Short Term Memory, Data Association, Time Lapse, Supplementary Information, Great Promise, Learning Method, Biological Studies, Comprehensive Evaluations

Abstract Motivation Biological studies of dynamic processes in living cells often require accurate particle tracking as a first step toward quantitative analysis. Although many particle tracking methods have been developed for this purpose, they are typically based on prior assumptions about the particle dynamics, and/or they involve careful tuning of various algorithm parameters by the user for each application. This may make existing methods difficult to apply by non-expert users and to a broader range of tracking problems. Recent advances in deep-learning techniques hold great promise in eliminating these disadvantages, as they can learn how to optimally track particles from example data. Results Here, we present a deep-learning-based method for the data association stage of particle tracking. The proposed method uses convolutional neural networks and long short-term memory networks to extract relevant dynamics features and predict the motion of a particle and the cost of linking detected particles from one time point to the next. Comprehensive evaluations on datasets from the particle tracking challenge demonstrate the competitiveness of the proposed deep-learning method compared to the state of the art. Additional tests on real time-lapse fluorescence microscopy images of various types of intracellular particles show the method performs comparably with human experts. Availability and implementation The software code implementing the proposed method as well as a description of how to obtain the test data used in the presented experiments will be available for non-commercial purposes from https://github.com/yoyohoho0221/pt_linking. Supplementary information Supplementary data are available at Bioinformatics online.
