multiSLIDE: a web server for exploring connected elements of biological pathways in multi-omics data

Mapping Intimacies ◽

10.1101/812271 ◽

2019 ◽

Author(s):

Soumita Ghosh ◽

Abhik Datta ◽

Hyungwon Choi

Keyword(s):

Keyword Search ◽

Data Sets ◽

Omics Data ◽

Web Based ◽

Molecular Features ◽

External Data ◽

Cluster Data ◽

Wide Range ◽

Time Basis ◽

Gene Ontologies

Abstract: Emerging multi-omics experiments pose new challenges for exploration of quantitative data sets. We present multiSLIDE, a web-based interactive tool for simultaneous heatmap visualization of interconnected molecular features in multi-omics data sets. multiSLIDE operates by keyword search for visualizing biologically connected molecular features, such as genes in pathways and Gene Ontologies, offering convenient functionalities to rearrange, filter, and cluster data sets in a web browser in real time. Various built-in querying mechanisms make it adaptable to diverse omics types, and visualizations are fully customizable. We demonstrate the versatility of the tool through three example studies, each of which showcases its applicability to a wide range of multi-omics data sets, its ability to visualize the links between molecules at different granularities of measurement units, and its interface for incorporating inter-molecular relationships from external data sources into the visualization. Online and standalone versions of multiSLIDE are available at https://github.com/soumitag/multiSLIDE.


multiSLIDE is a web server for exploring connected elements of biological pathways in multi-omics data

Nature Communications ◽

10.1038/s41467-021-22650-x ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Soumita Ghosh ◽

Abhik Datta ◽

Hyungwon Choi

Keyword(s):

Keyword Search ◽

Data Sets ◽

Omics Data ◽

Web Browser ◽

Web Based ◽

Molecular Features ◽

Cluster Data ◽

Wide Range ◽

Or Genes ◽

Simultaneous Visualization

Abstract: Quantitative multi-omics data are difficult to interpret and visualize due to the large volume of data, complexity among data features, and heterogeneity of information represented by different omics platforms. Here, we present multiSLIDE, a web-based interactive tool for the simultaneous visualization of interconnected molecular features in heatmaps of multi-omics data sets. multiSLIDE visualizes biologically connected molecular features by keyword search of pathways or genes, offering convenient functionalities to query, rearrange, filter, and cluster data in a web browser in real time. Various querying mechanisms make it adaptable to diverse omics types, and visualizations are customizable. We demonstrate the versatility of multiSLIDE through three examples, showcasing its applicability to a wide range of multi-omics data sets by allowing users to visualize established links between molecules from different omics data, as well as to incorporate custom inter-molecular relationship information into the visualization. Online and stand-alone versions of multiSLIDE are available at https://github.com/soumitag/multiSLIDE.


OmixAnalyzer – A Web-Based System for Management and Analysis of High-Throughput Omics Data Sets

Lecture Notes in Computer Science - Data Integration in the Life Sciences ◽

10.1007/978-3-642-39437-9_4 ◽

2013 ◽

pp. 46-53 ◽

Cited By ~ 1

Author(s):

Thomas Stoltmann ◽

Karin Zimmermann ◽

André Koschmieder ◽

Ulf Leser

Keyword(s):

High Throughput ◽

Data Sets ◽

Omics Data ◽

Web Based ◽

Web Based System


Computationally scalable regression modeling for ultrahigh-dimensional omics data with ParProx

10.1101/2021.01.10.426142 ◽

2021 ◽

Author(s):

Seyoon Ko ◽

Ginny X. Li ◽

Hyungwon Choi ◽

Joong-Ho Won

Keyword(s):

Association Analysis ◽

Regression Models ◽

Model Fitting ◽

Data Sets ◽

Omics Data ◽

Lasso Regression ◽

Independent Variables ◽

Wide Range ◽

Sequencing Studies ◽

Genomic Regions

Abstract: Statistical analysis of ultrahigh-dimensional omics scale data has long depended on univariate hypothesis testing. With growing data features and samples, the obvious next step is to establish multivariable association analysis as a routine method for understanding genotype-phenotype associations. Here we present ParProx, a state-of-the-art implementation to optimize overlapping group lasso regression models for time-to-event and classification analysis, guided by biological priors through coordinated variable selection. ParProx not only enables model fitting for ultrahigh-dimensional data within the architecture for parallel or distributed computing, but also allows users to obtain interpretable regression models consistent with known biological relationships among the independent variables, a feature long neglected in statistical modeling of omics data. We demonstrate ParProx using three different omics data sets of moderate to large numbers of variables, where we use genomic regions and pathways to arrive at sparse regression models comprised of biologically related independent variables. ParProx is naturally applicable to a wide range of studies using ultrahigh-dimensional omics data, ranging from genome-wide association analysis to single cell sequencing studies where multivariable modeling is computationally intractable.
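To make the group lasso idea in this abstract concrete, the sketch below shows the block soft-thresholding step that proximal-gradient solvers for group lasso typically apply to each group of coefficients. It is an illustration only and not ParProx code: the function name `group_soft_threshold` and its arguments are hypothetical, and it assumes non-overlapping groups. Overlapping groups are commonly handled by duplicating shared variables, but the abstract does not say how ParProx treats them.

```python
import numpy as np

def group_soft_threshold(beta, groups, threshold):
    """Shrink each group of coefficients toward zero as a block.

    beta      : 1-D array of regression coefficients
    groups    : list of index lists, assumed NOT to overlap (illustrative assumption)
    threshold : step size times penalty weight, a non-negative float
    """
    out = np.array(beta, dtype=float)
    for idx in groups:
        norm = np.linalg.norm(out[idx])
        # A group whose Euclidean norm is below the threshold is zeroed entirely;
        # otherwise every coefficient in it is scaled by the same factor.
        scale = max(0.0, 1.0 - threshold / norm) if norm > 0 else 0.0
        out[idx] = scale * out[idx]
    return out

shrunk = group_soft_threshold([3.0, 4.0, 0.1, 0.1], [[0, 1], [2, 3]], 1.0)
```

In this example the first group (norm 5) is scaled by 0.8, while the second group falls below the threshold and is removed as a whole. Dropping whole groups in this way is what yields sparse models built from related variables, such as genes in one pathway.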


A Combined Two-mRNA Signature Associated With PD-L1 and Tumor Mutational Burden for Prognosis of Lung Adenocarcinoma

Frontiers in Cell and Developmental Biology ◽

10.3389/fcell.2021.634697 ◽

2021 ◽

Vol 9 ◽

Author(s):

Congkuan Song ◽

Zhiquan Wu ◽

Qingwen Wang ◽

Yujin Wang ◽

Zixin Guo ◽

...

Keyword(s):

Lung Adenocarcinoma ◽

Data Sets ◽

Multiple Gene ◽

Molecular Features ◽

Mutational Burden ◽

Wide Range ◽

Tumor Mutational Burden ◽

Risk Patients ◽

New Perspective ◽

Complementary Value

Due to biological heterogeneity, lung adenocarcinoma (LUAD) patients with the same stage may exhibit variable responses to immunotherapy and a wide range of outcomes. A biomarker that can predict prognosis and response to immunotherapy in these patients is urgently needed. In this study, we identified two genes (ANLN and ARNTL2) from multiple gene expression data sets, and developed a two-mRNA-based signature that can effectively distinguish high- and low-risk patients and predict patients’ response to immunotherapy. Furthermore, taking full advantage of the complementary value of clinical and molecular features, we combined the immune prognostic signature with clinical features to construct and validate a nomogram that can predict the probability of high tumor mutational burden (>10 mutations per megabase). This may improve the estimation of immunotherapy response in LUAD patients, and provide a new perspective for clinical screening of immunotherapy beneficiaries.


Multi-omics subtyping pipeline for chronic obstructive pulmonary disease

PLoS ONE ◽

10.1371/journal.pone.0255337 ◽

2021 ◽

Vol 16 (8) ◽

pp. e0255337

Author(s):

Lucas A. Gillenwater ◽

Shahab Helmi ◽

Evan Stene ◽

Katherine A. Pratte ◽

Yonghua Zhuang ◽

...

Keyword(s):

Chronic Obstructive Pulmonary Disease ◽

Pulmonary Disease ◽

The United States ◽

Support Vector ◽

Chronic Obstructive ◽

Data Sets ◽

Omics Data ◽

Obstructive Pulmonary Disease ◽

Clinical Phenotypes ◽

Molecular Features

Chronic Obstructive Pulmonary Disease (COPD) is the third leading cause of mortality in the United States; however, COPD has heterogeneous clinical phenotypes. This is the first large-scale attempt to use transcriptomics, proteomics, and metabolomics (multi-omics) to determine whether there are molecularly defined clusters with distinct clinical phenotypes that may underlie the clinical heterogeneity. The study included 3,278 subjects from the COPDGene cohort with at least one of the following profiles: whole blood transcriptomes (2,650 subjects); plasma proteomes (1,013 subjects); and plasma metabolomes (1,136 subjects). 489 subjects had all three contemporaneous -omics profiles. Autoencoder embeddings were performed individually for each -omics dataset. Embeddings underwent subspace clustering using MineClus, either individually by -omics or combined, followed by recursive feature selection based on Support Vector Machines. Clusters were tested for associations with clinical variables. Optimal single -omics clustering typically resulted in two clusters. Although there was overlap for individual -omics cluster membership, each -omics cluster tended to be defined by unique molecular pathways. For example, prominent molecular features of the metabolome-based clustering included sphingomyelin, while key molecular features of the transcriptome-based clusters were related to immune and bacterial responses. We also found that when we integrated the -omics data at a later stage, we identified subtypes that varied based on age, severity of disease, in addition to diffusing capacity of the lungs for carbon monoxide, and percent on atrial fibrillation. In contrast, when we integrated the -omics data at an earlier stage by treating all data sets equally, there were no clinical differences between subtypes.
Similar to clinical clustering, which has revealed multiple heterogeneous clinical phenotypes, we show that transcriptomics, proteomics, and metabolomics tend to define clusters of COPD patients with different clinical characteristics. Thus, integrating these different -omics data sets affords additional insight into the molecular nature of COPD and its heterogeneity.


mixOmics: an R package for ‘omics feature selection and multiple data integration

10.1101/108597 ◽

2017 ◽

Cited By ~ 19

Author(s):

Florian Rohart ◽

Benoît Gautier ◽

Amrit Singh ◽

Kim-Anh Lê Cao

Keyword(s):

Data Integration ◽

Large Scale ◽

Relevant Information ◽

R Package ◽

Biological Data ◽

Molecular Signature ◽

Single Type ◽

Data Sets ◽

Omics Data ◽

Wide Range

Abstract: The advent of high throughput technologies has led to a wealth of publicly available ‘omics data coming from different sources, such as transcriptomics, proteomics, metabolomics. Combining such large-scale biological data sets can lead to the discovery of important biological insights, provided that relevant information can be extracted in a holistic manner. Current statistical approaches have been focusing on identifying small subsets of molecules (a ‘molecular signature’) to explain or predict biological conditions, but mainly for a single type of ‘omics. In addition, commonly used methods are univariate and consider each biological feature independently. We introduce mixOmics, an R package dedicated to the multivariate analysis of biological data sets with a specific focus on data exploration, dimension reduction and visualisation. By adopting a system biology approach, the toolkit provides a wide range of methods that statistically integrate several data sets at once to probe relationships between heterogeneous ‘omics data sets. Our recent methods extend Projection to Latent Structure (PLS) models for discriminant analysis, for data integration across multiple ‘omics data or across independent studies, and for the identification of molecular signatures. We illustrate our latest mixOmics integrative frameworks for the multivariate analyses of ‘omics data available from the package.


Time distributed data analysis by Cosinor.Online application

10.1101/805960 ◽

2019 ◽

Cited By ~ 3

Author(s):

Lubos Molcan

Keyword(s):

Programming Languages ◽

Data Sets ◽

Distributed Data ◽

Web Browsers ◽

Web Based ◽

Software Applications ◽

Physiological Processes ◽

Wide Range ◽

Distributed Data Analysis ◽

Excel File

Abstract: Physiological processes oscillate in time. Circadian oscillations, with a period of approximately 24 h, are very important and among the most studied. To evaluate the presence and significance of 24-h oscillations, physiological time distributed data (TDD) are often fitted to a cosinor model using a wide range of irregularly updated native apps. Users who are familiar with MATLAB, R or other programming languages can adjust the parameters of the cosinor model to their needs. Nowadays, many software applications are hosted on remote servers running 24/7. Server-based software applications enable quick analysis of big data sets and run on a wide range of terminal devices using standard web browsers. We created a simple web-based cosinor application, Cosinor.Online. The application code is written in PHP. TDD is handled using a MySQL database and can be copied directly from an Excel file to the webform. Analysis results contain information about setting the 24-h oscillation and a unique ID identifier. The identifier allows users to reopen data and results repeatedly over one month or remove their data from the MySQL database. Our web-based application can be used for a quick and simple inspection of 24-h oscillations of various biological and physiological TDD.
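As a rough illustration of what fitting a cosinor model involves, the sketch below fits a single-component model y ≈ M + A·cos(2πt/τ + φ) by ordinary least squares. It is not the Cosinor.Online implementation (which the abstract says is written in PHP). The function name `fit_cosinor`, the fixed 24-h period, and the synthetic data are all assumptions made for this example.

```python
import numpy as np

def fit_cosinor(t_hours, y, period=24.0):
    """Least-squares fit of y ~ M + A*cos(2*pi*t/period + phi).

    Returns (mesor M, amplitude A, acrophase phi in radians).
    """
    t = np.asarray(t_hours, dtype=float)
    y = np.asarray(y, dtype=float)
    omega = 2.0 * np.pi / period
    # Linearize: A*cos(wt + phi) = beta*cos(wt) + gamma*sin(wt),
    # with beta = A*cos(phi) and gamma = -A*sin(phi).
    X = np.column_stack([np.ones_like(t), np.cos(omega * t), np.sin(omega * t)])
    (mesor, beta, gamma), *_ = np.linalg.lstsq(X, y, rcond=None)
    amplitude = np.hypot(beta, gamma)
    acrophase = np.arctan2(-gamma, beta)
    return mesor, amplitude, acrophase

# Synthetic, noise-free example: samples every 2 h over 48 h.
t = np.arange(0.0, 48.0, 2.0)
y = 10.0 + 3.0 * np.cos(2.0 * np.pi * t / 24.0 - 1.0)
mesor, amplitude, acrophase = fit_cosinor(t, y)
```

Linearizing the cosine into separate cosine and sine terms turns the fit into a plain linear regression, so no iterative optimizer is needed when the period is fixed in advance.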


BioMiner: Paving the Way for Personalized Medicine

Cancer Informatics ◽

10.4137/cin.s20910 ◽

2015 ◽

Vol 14 ◽

pp. CIN.S20910 ◽

Cited By ~ 5

Author(s):

Chris Bauer ◽

Karol Stec ◽

Alexander Glintschert ◽

Kristina Gruden ◽

Christian Schichor ◽

...

Keyword(s):

Personalized Medicine ◽

High Throughput ◽

Human Biology ◽

Supplementary File ◽

Data Sets ◽

Omics Data ◽

The Novel ◽

Web Based ◽

High Throughput Data ◽

Interdisciplinary Project

Personalized medicine is promising a revolution for medicine and human biology in the 21st century. The scientific foundation for this revolution is accomplished by analyzing biological high-throughput data sets from genomics, transcriptomics, proteomics, and metabolomics. Currently, access to these data has been limited to either rather simple Web-based tools, which do not grant much insight, or analysis by trained specialists, without firsthand involvement of the physician. Here, we present the novel Web-based tool “BioMiner,” which was developed within the scope of an international and interdisciplinary project (SYSTHER) and gives access to a variety of high-throughput data sets. It provides the user with convenient tools to analyze complex cross-omics data sets and grants enhanced visualization abilities. BioMiner incorporates transcriptomic and cross-omics high-throughput data sets, with a focus on cancer. A public instance of BioMiner along with the database is available at http://systherDB.microdiscovery.de/, login and password: “systher”; a tutorial detailing the usage of BioMiner can be found in the Supplementary File.


Investigating the Efficacy and Cost-Effectiveness of a Web-based Self-help Program for People With Adjustment Problems After an Accident (SelFIT): Protocol for a Randomized Controlled Trial (Preprint)

10.2196/preprints.21200 ◽

2020 ◽

Author(s):

Julia Hegy ◽

Noemi Anja Brog ◽

Thomas Berger ◽

Hansjoerg Znoj

Keyword(s):

Health Care ◽

Psychological Distress ◽

Cost Effectiveness ◽

Well Being ◽

User Friendliness ◽

Adjustment Problems ◽

Health Care Consumption ◽

Web Based ◽

Self Help ◽

Wide Range

BACKGROUND: Accidents and the resulting injuries are one of the world’s biggest health care issues often causing long-term effects on psychological and physical health. With regard to psychological consequences, accidents can cause a wide range of burdens including adjustment problems. Although adjustment problems are among the most frequent mental health problems, there are few specific interventions available. The newly developed program SelFIT aims to remedy this situation by offering a low-threshold web-based self-help intervention for psychological distress after an accident. OBJECTIVE: The overall aim is to evaluate the efficacy and cost-effectiveness of the SelFIT program plus care as usual (CAU) compared to only care as usual. Furthermore, the program’s user friendliness, acceptance and adherence are assessed. We expect that the use of SelFIT is associated with a greater reduction in psychological distress, greater improvement in mental and physical well-being, and greater cost-effectiveness compared to CAU. METHODS: Adults (n=240) showing adjustment problems due to an accident they experienced between 2 weeks and 2 years before entering the study will be randomized. Participants in the intervention group receive direct access to SelFIT. The control group receives access to the program after 12 weeks. There are 6 measurement points for both groups (baseline as well as after 4, 8, 12, 24 and 36 weeks). The main outcome is a reduction in anxiety, depression and stress symptoms that indicate adjustment problems. Secondary outcomes include well-being, optimism, embitterment, self-esteem, self-efficacy, emotion regulation, pain, costs of health care consumption and productivity loss as well as the program’s adherence, acceptance and user-friendliness. RESULTS: Recruitment started in December 2019 and is ongoing.
CONCLUSIONS: To the best of our knowledge, this is the first study examining a web-based self-help program designed to treat adjustment problems resulting from an accident. If effective, the program could complement the still limited offer of secondary and tertiary psychological prevention after an accident. CLINICALTRIAL: ClinicalTrials.gov NCT03785912; https://clinicaltrials.gov/ct2/show/NCT03785912?cond=NCT03785912&draw=2&rank=1


mtDNAcombine: tools to combine sequences from multiple studies

BMC Bioinformatics ◽

10.1186/s12859-021-04048-0 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Eleanor F. Miller ◽

Andrea Manica

Keyword(s):

Sequence Data ◽

Data Extraction ◽

Bayesian Skyline Plot ◽

Model Organisms ◽

Data Sets ◽

Data Handling ◽

Online Database ◽

Genetic Studies ◽

Wide Range ◽

Existing Data

Abstract: Background: Today an unprecedented amount of genetic sequence data is stored in publicly available repositories. For decades now, mitochondrial DNA (mtDNA) has been the workhorse of genetic studies, and as a result, there is a large volume of mtDNA data available in these repositories for a wide range of species. Indeed, whilst whole genome sequencing is an exciting prospect for the future, for most non-model organisms, classical markers such as mtDNA remain widely used. By compiling existing data from multiple original studies, it is possible to build powerful new datasets capable of exploring many questions in ecology, evolution and conservation biology. One key question that these data can help inform is what happened in a species’ demographic past. However, compiling data in this manner is not trivial: there are many complexities associated with data extraction, data quality and data handling. Results: Here we present the mtDNAcombine package, a collection of tools developed to manage some of the major decisions associated with handling multi-study sequence data, with a particular focus on preparing sequence data for Bayesian skyline plot demographic reconstructions. Conclusions: There is now more genetic information available than ever before, and large meta-data sets offer great opportunities to explore new and exciting avenues of research. However, compiling multi-study datasets remains a technically challenging prospect. The mtDNAcombine package provides a pipeline to streamline the process of downloading, curating, and analysing sequence data, guiding the process of compiling data sets from the online database GenBank.
