Phandango: an interactive viewer for bacterial population genomics

Mapping Intimacies ◽

10.1101/119545 ◽

2017 ◽

Cited By ~ 13

Author(s):

James Hadfield ◽

Nicholas J. Croucher ◽

Richard J Goater ◽

Khalil Abudahab ◽

David M Aanensen ◽

...

Keyword(s):

Web Application ◽

Large Scale ◽

Bacterial Population ◽

Population Genomics ◽

Genomic Analysis ◽

Evolutionary Analysis ◽

Base Pairs ◽

Web Browser ◽

Link Type ◽

Scale Population

Abstract

Summary: Fully exploiting the wealth of data in current bacterial population genomics datasets requires synthesising and integrating different types of analysis across millions of base pairs in hundreds or thousands of isolates. Current approaches often use static representations of phylogenetic, epidemiological, statistical and evolutionary analysis results that are difficult to relate to one another. Phandango is an interactive application running in a web browser allowing fast exploration of large-scale population genomics datasets combining the output from multiple genomic analysis methods in an intuitive and interactive manner.

Availability: Phandango is a web application freely available for use at https://jameshadfield.github.io/phandango and includes a diverse collection of datasets as examples. Source code together with a detailed wiki page is available on GitHub at https://github.com/jameshadfield/[email protected], [email protected]

PinAPL-Py: A comprehensive web-application for the analysis of CRISPR/Cas9 screens

10.1101/147462 ◽

2017 ◽

Author(s):

Philipp N. Spahn ◽

Tyler Bath ◽

Ryan J. Weiss ◽

Jihoon Kim ◽

Jeffrey D. Esko ◽

...

Keyword(s):

Web Application ◽

Large Scale ◽

Sequencing Data ◽

Bioinformatic Tools ◽

Link Type ◽

Screening Experiments ◽

Independent Analysis ◽

Wide Range ◽

Set Up ◽

Sequence Quality

Abstract

Background: Large-scale genetic screens using CRISPR/Cas9 technology have emerged as a major tool for functional genomics. With its increased popularity, experimental biologists frequently acquire large sequencing datasets for which they often do not have an easy analysis option. While a few bioinformatic tools have been developed for this purpose, their utility is still hindered either due to limited functionality or the requirement of bioinformatic expertise.

Results: To make sequencing data analysis of CRISPR/Cas9 screens more accessible to a wide range of scientists, we developed a Platform-independent Analysis of Pooled Screens using Python (PinAPL-Py), which is operated as an intuitive web-service. PinAPL-Py implements state-of-the-art tools and statistical models, assembled in a comprehensive workflow covering sequence quality control, automated sgRNA sequence extraction, alignment, sgRNA enrichment/depletion analysis and gene ranking. The workflow is set up to use a variety of popular sgRNA libraries as well as custom libraries that can be easily uploaded. Various analysis options are offered, suitable to analyze a large variety of CRISPR/Cas9 screening experiments. Analysis output includes ranked lists of sgRNAs and genes, and publication-ready plots.

Conclusions: PinAPL-Py helps to advance genome-wide screening efforts by combining comprehensive functionality with user-friendly implementation. PinAPL-Py is freely accessible at http://pinapl-py.ucsd.edu with instructions, documentation and test datasets. The source code is available at https://github.com/LewisLabUCSD/PinAPL-Py

GENVISAGE: Rapid Identification of Discriminative and Explainable Feature Pairs for Genomic Analysis

10.1101/2020.02.05.935411 ◽

2020 ◽

Author(s):

Silu Huang ◽

Charles Blatti ◽

Saurabh Sinha ◽

Aditya Parameswaran

Keyword(s):

Large Scale ◽

Genomic Analysis ◽

Rapid Identification ◽

Biological Data ◽

Data Sets ◽

Chemotherapy Drugs ◽

Link Type ◽

Discriminative Feature ◽

Exploration Tool ◽

High Dimensional Datasets

Abstract

Motivation: A common but critical task in genomic data analysis is finding features that separate and thereby help explain differences between two classes of biological objects, e.g., genes that explain the differences between healthy and diseased patients. As lower-cost, high-throughput experimental methods greatly increase the number of samples that are assayed as objects for analysis, computational methods are needed to quickly provide insights into high-dimensional datasets with tens of thousands of objects and features.

Results: We develop an interactive exploration tool called Genvisage that rapidly discovers the most discriminative feature pairs that best separate two classes in a dataset, and displays the corresponding visualizations. Since quickly finding top feature pairs is computationally challenging, especially when the numbers of objects and features are large, we propose a suite of optimizations to make Genvisage more responsive and demonstrate that our optimizations lead to a 400X speedup over competitive baselines for multiple biological data sets. With this speedup, Genvisage enables the exploration of more large-scale datasets and alternate hypotheses in an interactive and interpretable fashion. We apply Genvisage to uncover pairs of genes whose transcriptomic responses significantly discriminate treatments of several chemotherapy drugs.

Availability: Free webserver at http://genvisage.knoweng.org:443/ with source code at https://github.com/KnowEnG/Genvisage
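Illustrative note: the task described in this abstract, ranking feature pairs by how well they separate two classes, can be sketched as a brute-force baseline. The sketch below is not Genvisage's algorithm or its separability metric, which the abstract does not specify. The scoring rule (nearest-centroid accuracy in the two-feature plane), the function names `pair_score` and `top_feature_pairs`, and the toy data are all assumptions made for illustration.

```python
from itertools import combinations


def centroid(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def pair_score(rows, labels, i, j):
    """Fraction of objects classified correctly by a nearest-centroid
    rule using only features i and j (assumed scoring rule)."""
    pos = [(r[i], r[j]) for r, y in zip(rows, labels) if y == 1]
    neg = [(r[i], r[j]) for r, y in zip(rows, labels) if y == 0]
    cp, cn = centroid(pos), centroid(neg)
    correct = 0
    for r, y in zip(rows, labels):
        dp = (r[i] - cp[0]) ** 2 + (r[j] - cp[1]) ** 2
        dn = (r[i] - cn[0]) ** 2 + (r[j] - cn[1]) ** 2
        predicted = 1 if dp < dn else 0  # ties go to class 0
        correct += predicted == y
    return correct / len(rows)


def top_feature_pairs(rows, labels, k=3):
    """Score every feature pair exhaustively and return the k best
    as (score, (i, j)) tuples, best first."""
    n_features = len(rows[0])
    scored = [
        (pair_score(rows, labels, i, j), (i, j))
        for i, j in combinations(range(n_features), 2)
    ]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return scored[:k]


# Toy data (hypothetical): features 0 and 1 separate the classes only
# jointly; feature 2 is uninformative.
rows = [
    [2, 1, 5], [3, 2, 1], [4, 3, 4], [1, 0, 2],  # class 1
    [1, 2, 5], [2, 3, 1], [3, 4, 4], [0, 1, 2],  # class 0
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

print(top_feature_pairs(rows, labels, k=3))
```

The exhaustive loop costs time quadratic in the number of features, which is the kind of cost the abstract says its optimizations are meant to reduce.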

Genome-wide identification of genes regulating DNA methylation using genetic anchors for causal inference

10.1101/823807 ◽

2019 ◽

Cited By ~ 1

Author(s):

Paul J. Hop ◽

René Luijk ◽

Lucia Daxinger ◽

Maarten van Iterson ◽

Koen F. Dekkers ◽

...

Keyword(s):

Dna Methylation ◽

Large Scale ◽

Population Genomics ◽

Epigenetic Modification ◽

Binding Activity ◽

Dna Binding Activity ◽

Genome Wide ◽

Scale Population ◽

Genes Encoding ◽

Methylation Patterns

Summary: DNA methylation is a key epigenetic modification in human development and disease, yet there is limited understanding of its highly coordinated regulation. Here, we identified 818 genes that influence DNA methylation patterns in blood using large-scale population genomics data. By employing genetic instruments as causal anchors, we identified directed associations between gene expression and distant DNA methylation levels, whilst ensuring specificity of the associations by correcting for linkage disequilibrium and pleiotropy among neighboring genes. We found that DNA methylation patterns are commonly shaped by transcription factors that consistently increase or decrease DNA methylation levels. However, we also observed genes encoding proteins without DNA binding activity with widespread effects on DNA methylation (e.g. NFKBIE, CDCA7(L) and NLRC5) and we suggest plausible mechanisms underlying these findings. Many of the reported genes were unknown to influence DNA methylation, resulting in a comprehensive resource providing insights in the principles underlying epigenetic regulation.

EpiMetal: an open-source graphical web browser tool for easy statistical analyses in epidemiology and metabolomics

International Journal of Epidemiology ◽

10.1093/ije/dyz244 ◽

2020 ◽

Vol 49 (4) ◽

pp. 1075-1081

Author(s):

Jussi Ekholm ◽

Pauli Ohukainen ◽

Antti J Kangas ◽

Johannes Kettunen ◽

Qin Wang ◽

...

Keyword(s):

Web Application ◽

Large Scale ◽

Source Code ◽

Statistical Analyses ◽

Data Driven ◽

Graphical Interface ◽

Self Organizing Maps ◽

Web Browser ◽

Pilot Data ◽

Molecular Measures

Abstract

Motivation: An intuitive graphical interface that allows statistical analyses and visualizations of extensive data without any knowledge of dedicated statistical software or programming.

Implementation: EpiMetal is a single-page web application written in JavaScript, to be used via a modern desktop web browser.

General features: Standard epidemiological analyses and self-organizing maps for data-driven metabolic profiling are included. Multiple extensive datasets with an arbitrary number of continuous and category variables can be integrated with the software. Any snapshot of the analyses can be saved and shared with others via a www-link. We demonstrate the usage of EpiMetal using pilot data with over 500 quantitative molecular measures for each sample as well as in two large-scale epidemiological cohorts (N > 10 000).

Availability: The software usage exemplar and the pilot data are open access online at [http://EpiMetal.computationalmedicine.fi]. MIT licensed source code is available at the Github repository at [https://github.com/amergin/epimetal].

Return of Results: Towards a Lexicon?

The Journal of Law Medicine & Ethics ◽

10.1111/j.1748-720x.2011.00624.x ◽

2011 ◽

Vol 39 (4) ◽

pp. 577-582 ◽

Cited By ~ 19

Author(s):

Bartha Maria Knoppers ◽

Amy Dam

Keyword(s):

Clinical Research ◽

Large Scale ◽

Population Genomics ◽

Return Of Results ◽

The Public ◽

International Policies ◽

Gene Environment ◽

Scale Population ◽

National Norms ◽

Potential Use

The last few years have witnessed the growth of large-scale, population genomics biobanks, which serve as longitudinal, gene-environment databases for future yet unspecified research. An international consortium, the Public Population Project in Genomics (P3G), builds harmonization tools for such biobanks and has catalogued numerous studies — at least 139 with over 10,000 banked participants and 34 with over 100,000. As their potential use for translational, clinical research draws near, it is opportune to clarify the duties of such biobanks to communicate results to participants. To identify the potential obligations, some demystification of the terminology surrounding the return of results as found in international and national norms on biobanking generally is essential. On the whole, our proposed lexicon is based on a study of norms as found in national and international policies but excludes debates found in the literature.

Genome-wide identification of directed gene networks using large-scale population genomics data

Nature Communications ◽

10.1038/s41467-018-05452-6 ◽

2018 ◽

Vol 9 (1) ◽

Cited By ~ 9

Author(s):

René Luijk ◽

Koen F. Dekkers ◽

Maarten van Iterson ◽

Wibowo Arindrarto ◽

...

Keyword(s):

Gene Networks ◽

Large Scale ◽

Population Genomics ◽

Genome Wide ◽

Scale Population

Pavian: Interactive analysis of metagenomics data for microbiomics and pathogen identification

10.1101/084715 ◽

2016 ◽

Cited By ~ 25

Author(s):

Florian P. Breitwieser ◽

Steven L. Salzberg

Keyword(s):

Web Application ◽

Disease Diagnosis ◽

Supplementary Information ◽

Special Focus ◽

Web Browser ◽

R Language ◽

Interactive Analysis ◽

Link Type ◽

Metagenomics Data ◽

Flow Diagrams

Abstract

Summary: Pavian is a web application for exploring metagenomics classification results, with a special focus on infectious disease diagnosis. Pinpointing pathogens in metagenomics classification results is often complicated by host and laboratory contaminants as well as many non-pathogenic microbiota. With Pavian, researchers can analyze, display and transform results from the Kraken and Centrifuge classifiers using interactive tables, heatmaps and flow diagrams. Pavian also provides an alignment viewer for validation of matches to a particular genome.

Availability and implementation: Pavian is implemented in the R language and based on the Shiny framework. It can be hosted on Windows, Mac OS X and Linux systems, and used with any contemporary web browser. It is freely available under a GPL-3 license from http://github.com/fbreitwieser/pavian. Furthermore a Docker image is provided at https://hub.docker.com/r/florianbw/[email protected]

Supplementary information: Supplementary data is available at Bioinformatics online.

PANINI: Pangenome Neighbor Identification for Bacterial Populations

10.1101/174409 ◽

2017 ◽

Cited By ~ 2

Author(s):

Khalil Abudahab ◽

Joaquín M. Prada ◽

Zhirong Yang ◽

Stephen D. Bentley ◽

Nicholas J. Croucher ◽

...

Keyword(s):

Machine Learning ◽

Core Genome ◽

Population Genomics ◽

Genomic Analysis ◽

Diverse Populations ◽

Data Set ◽

Bacterial Populations ◽

Accessory Genome ◽

The Core ◽

Link Type

Abstract: The standard workhorse for genomic analysis of the evolution of bacterial populations is phylogenetic modelling of mutations in the core genome. However, in the current era of population genomics, a notable amount of information about evolutionary and transmission processes in diverse populations can be lost unless the accessory genome is also taken into consideration. Here we introduce PANINI, a computationally scalable method for identifying the neighbours for each isolate in a data set using unsupervised machine learning with stochastic neighbour embedding. PANINI is browser-based and integrates with the Microreact platform for rapid online visualisation and exploration of both core and accessory genome evolutionary signals together with relevant epidemiological, geographic, temporal and other metadata. Several case studies with single- and multi-clone pneumococcal populations are presented to demonstrate ability to identify biologically important signals from gene content data. PANINI is available at http://panini.wgsa.net/ and code at http://gitlab.com/cgps/panini

The fascinating and secret wild life of the budding yeast S. cerevisiae

eLife ◽

10.7554/elife.05835 ◽

2015 ◽

Vol 4 ◽

Cited By ~ 75

Author(s):

Gianni Liti

Keyword(s):

Scientific Community ◽

Large Scale ◽

Laboratory Experiments ◽

Population Genomics ◽

Budding Yeast ◽

Yeast Saccharomyces Cerevisiae ◽

Field Surveys ◽

The World ◽

Scale Population ◽

Wild Life

The budding yeast Saccharomyces cerevisiae has been used in laboratory experiments for over a century and has been instrumental in understanding virtually every aspect of molecular biology and genetics. However, it wasn't until a decade ago that the scientific community started to realise how little was known about this yeast's ecology and natural history, and how this information was vitally important for interpreting its biology. Recent large-scale population genomics studies coupled with intensive field surveys have revealed a previously unappreciated wild lifestyle of S. cerevisiae outside the restrictions of human environments and laboratories. The recent discovery that Chinese isolates harbour almost twice as much genetic variation as isolates from the rest of the world combined suggests that Asia is the likely origin of the modern budding yeast.

Genome-wide identification of directed gene networks using large-scale population genomics data

10.1101/221879 ◽

2017 ◽

Cited By ~ 2

Author(s):

René Luijk ◽

Koen F. Dekkers ◽

Maarten van Iterson ◽

Wibowo Arindrarto ◽

Annique Claringbould ◽

...

Keyword(s):

Gene Networks ◽

Large Scale ◽

Target Genes ◽

Population Genomics ◽

Regulatory Gene ◽

Sequencing Data ◽

Scale Population ◽

Transcriptional Changes ◽

New Gene ◽

Novel Target

Abstract: Identification of causal drivers behind regulatory gene networks is crucial in understanding gene function. We developed a method for the large-scale inference of gene-gene interactions in observational population genomics data that are both directed (using local genetic instruments as causal anchors, akin to Mendelian Randomization) and specific (by controlling for linkage disequilibrium and pleiotropy). The analysis of genotype and whole-blood RNA-sequencing data from 3,072 individuals identified 49 genes as drivers of downstream transcriptional changes (P < 7 × 10^−10), among which transcription factors were overrepresented (P = 3.3 × 10^−7). Our analysis suggests new gene functions and targets including for SENP7 (zinc-finger genes involved in retroviral repression) and BCL2A1 (novel target genes possibly involved in auditory dysfunction). Our work highlights the utility of population genomics data in deriving directed gene expression networks. A resource of trans-effects for all 6,600 genes with a genetic instrument can be explored individually using a web-based browser.
