CNEr: a toolkit for exploring extreme noncoding conservation

Mapping Intimacies ◽

10.1101/575704 ◽

2019 ◽

Author(s):

Ge Tan ◽

Dimitris Polychronopoulos ◽

Boris Lenhard

Keyword(s):

Large Scale ◽

Fruit Fly ◽

Tsetse Fly ◽

Chromatin Conformation ◽

Developmentally Regulated ◽

Link Type ◽

Topologically Associating Domains ◽

Conserved Noncoding Elements ◽

Chromatin Biology ◽

Genome Comparisons

Abstract: Conserved Noncoding Elements (CNEs) are elements exhibiting extreme noncoding conservation in Metazoan genomes. They cluster around developmental genes and act as long-range enhancers, yet nothing that we know about their function explains the observed conservation levels. Clusters of CNEs coincide with topologically associating domains (TADs), indicating ancient origins and stability of TAD locations. This has suggested further hypotheses about the still elusive origin of CNEs, and has provided a comparative genomics-based method of estimating the position of TADs around developmentally regulated genes in genomes where chromatin conformation capture data is missing. To enable researchers in gene regulation and chromatin biology to start deciphering this phenomenon, we developed CNEr, an R/Bioconductor toolkit for large-scale identification of CNEs and for studying their genomic properties. We apply CNEr to two novel genome comparisons (fruit fly vs tsetse fly, and two sea urchin genomes) and report novel insights gained from their analysis. We also show how to reveal interesting characteristics of CNEs by coupling CNEr with existing Bioconductor packages. CNEr is available at Bioconductor (https://bioconductor.org/packages/CNEr/) and maintained at GitHub (https://github.com/ge11232002/CNEr).

Proteomic profiling dataset of chemical perturbations in multiple biological backgrounds

Scientific Data ◽

10.1038/s41597-021-01008-4 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Deborah O. Dele-Oni ◽

Karen E. Christianson ◽

Shawn B. Egri ◽

Alvaro Sebastian Vaca Jacome ◽

Katherine C. DeRuff ◽

...

Keyword(s):

Large Scale ◽

Cell Model ◽

Cellular Responses ◽

Proteomic Profiling ◽

Reduced Representation ◽

Link Type ◽

Original Dataset ◽

Quality Control Metrics ◽

Biological Insight ◽

Chromatin Profiling

Abstract: While gene expression profiling has traditionally been the method of choice for large-scale perturbational profiling studies, proteomics has emerged as an effective tool in this context for directly monitoring cellular responses to perturbations. We previously reported a pilot library containing 3400 profiles of multiple perturbations across diverse cellular backgrounds in the reduced-representation phosphoproteome (P100) and chromatin space (Global Chromatin Profiling, GCP). Here, we expand our original dataset to include profiles from a new set of cardiotoxic compounds and from astrocytes, an additional neural cell model, totaling 5300 proteomic signatures. We describe filtering criteria and quality control metrics used to assess and validate the technical quality and reproducibility of our data. To demonstrate the power of the library, we present two case studies where data is queried using the concept of “connectivity” to obtain biological insight. All data presented in this study have been deposited to the ProteomeXchange Consortium with identifiers PXD017458 (P100) and PXD017459 (GCP) and can be queried at https://clue.io/proteomics.

Pseudomonas bijieensis sp. nov., isolated from cornfield soil

INTERNATIONAL JOURNAL OF SYSTEMATIC AND EVOLUTIONARY MICROBIOLOGY ◽

10.1099/ijsem.0.004676 ◽

2019 ◽

Vol 71 (3) ◽

Cited By ~ 4

Author(s):

Jingling Liang ◽

Sai Wang ◽

Ayizekeranmu Yiming ◽

Luoyi Fu ◽

Iftikhar Ahmad ◽

...

Keyword(s):

16S rRNA ◽

Type Species ◽

Novel Species ◽

Guizhou Province ◽

rRNA Gene ◽

Gene Sequences ◽

Content Type ◽

Link Type ◽

PR China ◽

Genome Comparisons

Strain L22-9T, a Gram-stain-negative and rod-shaped bacterium, motile by one polar flagellum, was isolated from cornfield soil in Bijie, Guizhou Province, PR China. Based on 16S rRNA gene sequences, it was identified as a Pseudomonas species. Multilocus sequence analysis of concatenated 16S rRNA, gyrB, rpoB and rpoD gene sequences showed that strain L22-9T formed a clearly separated branch, located in a cluster together with Pseudomonas brassicacearum LMG 21623T, Pseudomonas kilonensis DSM 13647T and Pseudomonas thivervalensis DSM 13194T. Whole-genome comparisons based on average nucleotide identity (ANI) and digital DNA–DNA hybridization (dDDH) confirmed that strain L22-9T should be classified as a novel species. It was most closely related to P. kilonensis DSM 13647T with ANI and dDDH values of 91.87 and 46.3 %, respectively. Phenotypic features that can distinguish strain L22-9T from P. kilonensis DSM 13647T are the assimilation ability of N-acetyl-d-glucosamine, poor activity of arginine dihydrolase and failure to ferment ribose and d-fucose. The predominant cellular fatty acids of strain L22-9T are C16 : 0, summed feature 3 (C16 : 1 ω6c and/or C16 : 1 ω7c) and summed feature 8 (C18 : 1 ω7c and/or C18 : 1 ω6c). The respiratory quinones consist of Q-9 and Q-8. The polar lipids are diphosphatidylglycerol, phosphatidylethanolamine, two unidentified phosphoglycolipids, two unidentified aminophospholipids and an unidentified glycolipid. Based on the evidence, we conclude that strain L22-9T represents a novel species, for which the name Pseudomonas bijieensis sp. nov. is proposed. The type strain is L22-9T (=CGMCC 1.18528T=LMG 31948T), with a DNA G+C content of 60.85 mol%.

SkewIT: The Skew Index Test for large-scale GC Skew analysis of bacterial genomes

PLoS Computational Biology ◽

10.1371/journal.pcbi.1008439 ◽

2020 ◽

Vol 16 (12) ◽

pp. e1008439

Author(s):

Jennifer Lu ◽

Steven L. Salzberg

Keyword(s):

Large Scale ◽

Analysis Tool ◽

Index Test ◽

Bacterial Genomes ◽

Phylogenetic Groups ◽

Bacterial Phyla ◽

Link Type ◽

GC Skew ◽

A Genome ◽

Web App

GC skew is a phenomenon observed in many bacterial genomes, wherein the two replication strands of the same chromosome contain different proportions of guanine and cytosine nucleotides. Here we demonstrate that this phenomenon, which was first discovered in the mid-1990s, can be used today as an analysis tool for the 15,000+ complete bacterial genomes in NCBI’s RefSeq library. In order to analyze all 15,000+ genomes, we introduce a new method, SkewIT (Skew Index Test), that calculates a single metric representing the degree of GC skew for a genome. Using this metric, we demonstrate how GC skew patterns are conserved within certain bacterial phyla, e.g. Firmicutes, but show different patterns in other phylogenetic groups such as Actinobacteria. We also discovered that outlier values of SkewIT highlight potential bacterial mis-assemblies. Using our newly defined metric, we identify multiple mis-assembled chromosomal sequences in previously published complete bacterial genomes. We provide a SkewIT web app (https://jenniferlu717.shinyapps.io/SkewIT/) that calculates SkewI for any user-provided bacterial sequence. The web app also provides an interactive interface for the data generated in this paper, allowing users to further investigate the SkewI values and thresholds of the RefSeq-97 complete bacterial genomes. Individual scripts for analysis of bacterial genomes are provided in the following repository: https://github.com/jenniferlu717/SkewIT.
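
For readers unfamiliar with the quantity itself, the following is a minimal sketch of the conventional per-window GC skew, (G - C) / (G + C). It is not the SkewI metric defined by SkewIT, whose definition is not given in the abstract; the function names, the window size, and the toy sequence are all assumptions made for illustration.

```python
from typing import List


def gc_skew(seq: str) -> float:
    """Conventional GC skew, (G - C) / (G + C); 0.0 when the window has no G or C."""
    seq = seq.upper()
    g = seq.count("G")
    c = seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def windowed_gc_skew(seq: str, window: int) -> List[float]:
    """GC skew over consecutive, non-overlapping windows (a trailing partial window is dropped)."""
    return [
        gc_skew(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, window)
    ]


if __name__ == "__main__":
    # Hypothetical toy sequence: a G-rich half followed by a C-rich half,
    # mimicking the sign change in skew between the two replication strands.
    toy = "GGGCA" * 4 + "CCCGA" * 4
    print(windowed_gc_skew(toy, window=5))
```

On a real chromosome the window would typically span thousands of bases, and the positions where the sign of the skew flips are commonly read as candidate replication origin and terminus.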

PaperBLAST: Text-mining papers for information about homologs

10.1101/133041 ◽

2017 ◽

Author(s):

Morgan N. Price ◽

Adam P. Arkin

Keyword(s):

Text Mining ◽

Genome Sequencing ◽

Full Text ◽

Large Scale ◽

Scientific Literature ◽

Protein Sequences ◽

Protein Coding ◽

Link Protein ◽

Protein Coding Genes ◽

Link Type

Abstract: Large-scale genome sequencing has identified millions of protein-coding genes whose function is unknown. Many of these proteins are similar to characterized proteins from other organisms, but much of this information is missing from annotation databases and is hidden in the scientific literature. To make this information accessible, PaperBLAST uses EuropePMC to search the full text of scientific articles for references to genes. PaperBLAST also takes advantage of curated resources that link protein sequences to scientific articles (Swiss-Prot, GeneRIF, and EcoCyc). PaperBLAST’s database includes over 700,000 scientific articles that mention over 400,000 different proteins. Given a protein of interest, PaperBLAST quickly finds similar proteins that are discussed in the literature and presents snippets of text from relevant articles or from the curators. PaperBLAST is available at http://papers.genomics.lbl.gov/.

CrosstalkNet: mining large-scale bipartite co-expression networks to characterize epi-stroma crosstalk

10.1101/102848 ◽

2017 ◽

Author(s):

Venkata Manem ◽

George Adam ◽

Tina Gruosso ◽

Mathieu Gigoux ◽

Nicholas Bertos ◽

...

Keyword(s):

Stromal Cells ◽

Large Scale ◽

Network Visualization ◽

Visualization Tool ◽

Biological Processes ◽

Web Based ◽

Link Type ◽

A Cell ◽

Large Scale Networks ◽

User Friendly

Abstract: Background: Over the last several years, we have witnessed the metamorphosis of network biology from being a mere representation of molecular interactions to models enabling inference of complex biological processes. Networks provide promising tools to elucidate intercellular interactions that contribute to the functioning of key biological pathways in a cell. However, the exploration of these large-scale networks remains a challenge due to their high dimensionality. Results: CrosstalkNet is a user-friendly, web-based network visualization tool to retrieve and mine interactions in large-scale bipartite co-expression networks. In this study, we discuss the use of gene co-expression networks to explore the rewiring of interactions between tumor epithelial and stromal cells. We show how CrosstalkNet can be used to efficiently visualize, mine, and interpret large co-expression networks representing the crosstalk occurring between the tumor and its microenvironment. Conclusion: CrosstalkNet serves as a tool to assist biologists and clinicians in exploring complex, large interaction graphs to obtain insights into the biological processes that govern the tumor epithelial-stromal crosstalk. A comprehensive tutorial, along with case studies, is provided with the application. Availability: The web-based application is available at the following location: http://epistroma.pmgenomics.ca/app/. The code is open-source and freely available from http://github.com/bhklab/EpiStroma-webapp. Contact: [email protected]

XlinkCyNET: a Cytoscape application for visualization of protein interaction networks based on cross-linking mass-spectrometry identifications

10.1101/2020.12.20.423654 ◽

2020 ◽

Author(s):

Diogo Borges Lima ◽

Ying Zhu ◽

Fan Liu

Keyword(s):

Mass Spectrometry ◽

Protein Interaction ◽

Large Scale ◽

Interaction Network ◽

Protein Interaction Networks ◽

Interaction Networks ◽

Cross Linking ◽

Protein Interaction Data ◽

Interaction Data ◽

Link Type

Abstract: Software tools that allow visualization and analysis of protein interaction networks are essential for studies in systems biology. One of the most popular network visualization tools in biology is Cytoscape, which offers a large selection of plugins for interpretation of protein interaction data. Chemical cross-linking coupled to mass spectrometry (XL-MS) is an increasingly important source for such interaction data, but there are currently no Cytoscape tools to analyze XL-MS results. Given the suitability of the Cytoscape platform, and to expand its toolbox, here we introduce XlinkCyNET, an open-source Cytoscape Java plugin for exploring large-scale XL-MS-based protein interaction networks. XlinkCyNET offers rapid and easy visualization of intra- and intermolecular cross-links and the locations of protein domains in a rectangular bar style, allowing subdomain-level interrogation of the interaction network. XlinkCyNET is freely available from the Cytoscape app store: http://apps.cytoscape.org/apps/xlinkcynet and at https://www.theliulab.com/software/xlinkcynet.

ComplexBrowser: a tool for identification and quantification of protein complexes in large scale proteomics datasets

10.1101/573774 ◽

2019 ◽

Author(s):

Wojciech Michalak ◽

Vasileios Tsiamis ◽

Veit Schwämmle ◽

Adelina Rogowska-Wrzesińska

Keyword(s):

T Cell Activation ◽

Protein Complex ◽

Quantitative Proteomics ◽

Large Scale ◽

Protein Complexes ◽

Quantitative Measure ◽

Exploratory Analysis ◽

List Type ◽

Link Type ◽

Complex Components

Abstract: We have developed ComplexBrowser, an open source, online platform for supervised analysis of quantitative proteomics data that focuses on protein complexes. The software uses information from the CORUM and Complex Portal databases to identify protein complex components. Based on the expression changes of individual complex subunits across the proteomics experiment, it calculates the Complex Fold Change (CFC) factor, which characterises the overall protein complex expression trend and the level of subunit co-regulation. Thus up- and down-regulated complexes can be identified. It provides interactive visualisation of protein complex composition and expression for exploratory analysis. It also incorporates a quality control step that includes normalisation and statistical analysis based on the Limma test. ComplexBrowser performance was tested on two previously published proteomics studies identifying changes in protein expression in human adenocarcinoma tissue and during activation of mouse T-cells. The analysis revealed 1519 and 332 protein complexes, of which 233 and 41 were found co-ordinately regulated in the respective studies. The adopted approach provided evidence for a shift to glucose-based metabolism and high proliferation in adenocarcinoma tissues and identification of chromatin remodelling complexes involved in mouse T-cell activation. The results correlate with the original interpretation of the experiments and also provide novel biological details about the protein complexes affected. ComplexBrowser is, to our knowledge, the first tool to automate quantitative protein complex analysis for high-throughput studies, providing insights into protein complex regulation within minutes of analysis. A fully functional demo version of ComplexBrowser v1.0 is available online via http://computproteomics.bmb.sdu.dk/Apps/ComplexBrowser/. The source code can be downloaded from: https://bitbucket.org/michalakw/complexbrowser.

Highlights: Automated analysis of protein complexes in proteomics experiments. Quantitative measure of the coordinated changes in protein complex components. Interactive visualisations for exploratory analysis of proteomics results.

In brief: ComplexBrowser is capable of identifying protein complexes in datasets obtained from large scale quantitative proteomics experiments. It provides, in the form of the CFC factor, a quantitative measure of the coordinated changes in complex components. This facilitates assessing the overall trends in the processes governed by the identified protein complexes, providing a new and complementary way of interpreting proteomics experiments.

Genetic effect estimates in case-control studies when a continuous variable is omitted from the model

10.1101/756015 ◽

2019 ◽

Author(s):

Ying Sheng ◽

Chiung-Yu Huang ◽

Siarhei Lobach ◽

Lydia Zablotska ◽

Iryna Lobach ◽

...

Keyword(s):

Alzheimer's Disease ◽

Large Scale ◽

False Positive Rate ◽

Continuous Variable ◽

Genetic Effects ◽

Data Availability ◽

Conditional Density ◽

Link Type ◽

Genome Wide

Abstract: Large-scale genome-wide analysis scans provide massive volumes of genetic variants on large numbers of cases and controls that can be used to estimate the genetic effects. Yet, the sets of non-genetic variables available in publicly available databases are often brief. It is known that omitting a continuous variable from a logistic regression model can result in biased estimates of odds ratios (OR) (e.g., Gail et al (1984), Neuhaus et al (1993), Hauck et al (1991), Zeger et al (1988)). We are interested in assessing what information is needed to recover the bias in the OR estimate of genotype due to omitting a continuous variable in settings where the actual values of the omitted variable are not available. We derive two estimating procedures that can recover the degree of bias based on a conditional density of the omitted variable or on knowledge of the distribution of the omitted variable. Importantly, our derivations show that omitting a continuous variable can result in either under- or over-estimation of the genetic effects. We performed extensive simulation studies to examine bias, variability, false positive rate, and power in the model that omits a continuous variable. We show the application to two genome-wide studies of Alzheimer’s disease. Data Availability Statement: The data that support the findings of this study are openly available in the Database of Genotypes and Phenotypes at [https://www.ncbi.nlm.nih.gov/projects/gap/cgibin/study.cgi?study_id=phs000372.v1.p1], reference number [phs000372.v1.p1] and at the Alzheimer’s Disease Neuroimaging Initiative http://adni.loni.usc.edu/.
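
The omitted-variable effect on the odds ratio can be illustrated with a small simulation. The sketch below is not the authors' estimating procedure: it assumes a binary genotype with frequency 0.3, an independent standard-normal omitted variable, cohort-style sampling rather than case-control sampling, and hypothetical effect sizes. It covers only this independent case, in which the crude OR is pulled toward 1; the abstract reports that bias can go in either direction in general.

```python
import math
import random


def simulate_marginal_or(beta_g: float, beta_x: float,
                         n: int = 200_000, seed: int = 0) -> float:
    """Simulate logit P(Y=1) = beta_g*G + beta_x*X and return the crude
    genotype odds ratio from the 2x2 table that ignores X."""
    rng = random.Random(seed)
    table = [[0, 0], [0, 0]]  # counts indexed as table[genotype][outcome]
    for _ in range(n):
        g = 1 if rng.random() < 0.3 else 0   # assumed genotype frequency
        x = rng.gauss(0.0, 1.0)              # omitted continuous variable
        p = 1.0 / (1.0 + math.exp(-(beta_g * g + beta_x * x)))
        y = 1 if rng.random() < p else 0
        table[g][y] += 1
    return (table[1][1] * table[0][0]) / (table[1][0] * table[0][1])


if __name__ == "__main__":
    true_or = 2.0  # conditional OR for genotype, given X
    print("X has no effect:", simulate_marginal_or(math.log(true_or), 0.0))
    print("X omitted:      ", simulate_marginal_or(math.log(true_or), 2.0))
```

When `beta_x` is zero the crude estimate should sit close to the conditional OR; with a strong omitted effect it falls between 1 and the conditional OR even though X is independent of genotype.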

REVA as a Well-curated Database for Human Expression-modulating Variants

10.1101/2021.02.24.432622 ◽

2021 ◽

Author(s):

Yu Wang ◽

Fang-Yuan Shi ◽

Yu Liang ◽

Ge Gao

Keyword(s):

Large Scale ◽

Regulatory Mechanism ◽

State Of The Art ◽

Scale Analysis ◽

Computational Tools ◽

Functional Annotations ◽

Link Type ◽

Large Scale Analysis ◽

Multiple State ◽

Limited Sensitivity

Abstract: More than 80% of disease- and trait-associated human variants are noncoding. By systematically screening multiple large-scale studies, we compiled REVA, a manually curated database for over 11.8 million experimentally tested noncoding variants with expression-modulating potential. We provided 2424 functional annotations that could be used to pinpoint plausible regulatory mechanisms of these variants. We further benchmarked multiple state-of-the-art computational tools and found their limited sensitivity remains a serious challenge for effective large-scale analysis. REVA provides high-quality experimentally tested expression-modulating variants with extensive functional annotations, which will be useful for users in the noncoding variants community. REVA is available at http://reva.gao-lab.org.

OnTAD: hierarchical domain structure reveals the divergence of activity among TADs and boundaries

Genome Biology ◽

10.1186/s13059-019-1893-y ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 7

Author(s):

Lin An ◽

Tao Yang ◽

Jiahao Yang ◽

Johannes Nuebler ◽

Guanjue Xiang ◽

...

Keyword(s):

Gene Expression ◽

Gene Regulation ◽

Domain Structure ◽

High Frequency ◽

Spatial Organization ◽

Structural Units ◽

Regulatory Interactions ◽

Link Type ◽

Topologically Associating Domains

Abstract: The spatial organization of chromatin in the nucleus has been implicated in regulating gene expression. Maps of high-frequency interactions between different segments of chromatin have revealed topologically associating domains (TADs), within which most of the regulatory interactions are thought to occur. TADs are not homogeneous structural units but appear to be organized into a hierarchy. We present OnTAD, an optimized nested TAD caller from Hi-C data, to identify hierarchical TADs. OnTAD reveals new biological insights into the role of different TAD levels, boundary usage in gene regulation, the loop extrusion model, and compartmental domains. OnTAD is available at https://github.com/anlin00007/OnTAD.
