Interactive visualization systems and data integration methods for supporting discovery in collections of scientific information

An Overview of Data Integration Methods for Regional Assessment

Environmental Monitoring and Assessment, Vol 94 (1-3), pp. 249-261, 2004

DOI: 10.1023/b:emas.0000016892.67527.4c

Cited By ~ 22

Author(s): Nicholas W. Locantore, Liem T. Tran, Robert V. O'Neill, Peter W. McKinnis, Elizabeth R. Smith, ...

Keyword(s): Data Integration, Regional Assessment, Integration Methods

Application of Multiblock Analysis on Small Metabolomic Multi-Tissue Dataset

2020

DOI: 10.20944/preprints202007.0227.v1

Author(s): Frida Torell, Tomas Skotare, Johan Trygg

Keyword(s): Data Integration, Biological Properties, Analysis Method, Single Model, Integration Methods, Using Data, The Stability

Data integration has been shown to provide valuable information. The information extracted through data integration in the form of multiblock analysis can pinpoint both common and unique trends across the different blocks. When working with small multiblock datasets, the number of applicable integration methods is drastically reduced. To investigate the application of multiblock analysis when only a few samples are available, we studied a small metabolomic multiblock dataset containing six blocks (i.e. tissue types), including only common metabolites. We used a single-model multiblock analysis method called Joint and unique multiblock analysis (JUMBA) and compared it to a commonly used method, concatenated PCA. These methods were used to detect trends in the dataset and identify underlying factors responsible for metabolic variation. Using JUMBA, we were able to interpret the extracted components and link them to relevant biological properties. JUMBA shows how the observations are related to one another, how stable these relationships are, and to what extent each block contributes to the components. These results indicate that multiblock methods can be useful even with a small number of samples.
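
The concatenated PCA baseline mentioned in this abstract can be sketched in a few lines. This is a minimal, dependency-free illustration, not the authors' code and not JUMBA. It assumes each block is a samples × variables matrix with the same samples in the same row order; it mean-centers every column, divides each block by its Frobenius norm so that no block dominates merely because it has more variables (a common but assumed weighting choice), joins the blocks side by side, and extracts the first principal component by power iteration. The helper names (`center_columns`, `block_scale`, `concatenate`, `concatenated_pca`) are hypothetical.

```python
import math
import random

Matrix = list[list[float]]


def center_columns(block: Matrix) -> Matrix:
    """Subtract each column's mean so every variable is mean-centered."""
    n = len(block)
    means = [sum(row[j] for row in block) / n for j in range(len(block[0]))]
    return [[x - m for x, m in zip(row, means)] for row in block]


def block_scale(block: Matrix) -> Matrix:
    """Divide a block by its Frobenius norm (assumed weighting: equal total variance per block)."""
    norm = math.sqrt(sum(x * x for row in block for x in row))
    return [[x / norm for x in row] for row in block] if norm > 0 else block


def concatenate(blocks: list[Matrix]) -> Matrix:
    """Place blocks side by side: rows are the shared samples, columns are all variables."""
    return [[x for blk in blocks for x in blk[i]] for i in range(len(blocks[0]))]


def concatenated_pca(blocks: list[Matrix], iters: int = 500, seed: int = 0):
    """Return (loadings p, scores t) of the first principal component of the concatenated matrix."""
    x = concatenate([block_scale(center_columns(blk)) for blk in blocks])
    rng = random.Random(seed)
    p = [rng.random() + 0.1 for _ in range(len(x[0]))]  # arbitrary positive start vector
    for _ in range(iters):
        t = [sum(a * b for a, b in zip(row, p)) for row in x]                    # t = X p
        p = [sum(x[i][j] * t[i] for i in range(len(x))) for j in range(len(p))]  # p = X^T t
        norm = math.sqrt(sum(v * v for v in p))
        if norm == 0:
            break  # X is all zeros; there is no component to extract
        p = [v / norm for v in p]
    t = [sum(a * b for a, b in zip(row, p)) for row in x]
    return p, t
```

Each entry of `t` is one sample's score on the first concatenated component, and `p` shows how strongly each variable, and therefore each block, loads on it. Because a single loading vector spans all blocks, this baseline cannot by itself separate joint from block-specific variation, which is the distinction the abstract attributes to JUMBA.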

Benchmarking atlas-level data integration in single-cell genomics

2020

DOI: 10.1101/2020.05.22.111161

Cited By ~ 15

Author(s): MD Luecken, M Büttner, K Chaichoompu, A Danese, M Interlandi, ...

Keyword(s): Data Integration, Method Development, Gene Selection, Chromatin Accessibility, Biological Variation, Joint Analysis, Batch Effects, Integration Methods, Level Data, Complex Integration

Cell atlases often include samples that span locations, labs, and conditions, leading to complex, nested batch effects in data. Thus, joint analysis of atlas datasets requires reliable data integration. Choosing a data integration method is a challenge due to the difficulty of defining integration success. Here, we benchmark 38 method and preprocessing combinations on 77 batches of gene expression, chromatin accessibility, and simulation data from 23 publications, altogether representing >1.2 million cells distributed in nine atlas-level integration tasks. Our integration tasks span several common sources of variation such as individuals, species, and experimental labs. We evaluate methods according to scalability, usability, and their ability to remove batch effects while retaining biological variation. Using 14 evaluation metrics, we find that highly variable gene selection improves the performance of data integration methods, whereas scaling pushes methods to prioritize batch removal over conservation of biological variation. Overall, BBKNN, Scanorama, and scVI perform well, particularly on complex integration tasks; Seurat v3 performs well on simpler tasks with distinct biological signals; and methods that prioritize batch removal perform best for ATAC-seq data integration. Our freely available, reproducible Python module can be used to identify optimal data integration methods for new data, benchmark new methods, and improve method development.
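
The two preprocessing choices this benchmark singles out, highly variable gene (HVG) selection and scaling, can be illustrated with a small sketch. It assumes a cells × genes count matrix, normalizes each cell to a fixed total followed by `log1p` (a common convention, assumed here), ranks genes by plain variance across cells, and z-scores the selected genes. Published pipelines typically use more elaborate, mean-aware dispersion criteria, so this simplified ranking and the helper names (`log_normalize`, `highly_variable_genes`, `scale_genes`) are illustrative assumptions, not the benchmark's code.

```python
import math
import statistics


def log_normalize(counts: list[list[float]], target_sum: float = 1e4) -> list[list[float]]:
    """Rescale each cell's counts to target_sum, then apply log1p (assumed convention)."""
    out = []
    for cell in counts:
        total = sum(cell)
        out.append([math.log1p(c * target_sum / total) if total > 0 else 0.0 for c in cell])
    return out


def highly_variable_genes(expr: list[list[float]], n_top: int) -> list[int]:
    """Indices of the n_top genes with the largest variance across cells (simplified ranking)."""
    n_genes = len(expr[0])
    variances = [statistics.pvariance([cell[g] for cell in expr]) for g in range(n_genes)]
    ranked = sorted(range(n_genes), key=lambda g: variances[g], reverse=True)
    return sorted(ranked[:n_top])


def scale_genes(expr: list[list[float]], genes: list[int]) -> list[list[float]]:
    """Z-score each selected gene across cells; returns a cells x selected-genes matrix."""
    columns = []
    for g in genes:
        values = [cell[g] for cell in expr]
        mu = statistics.fmean(values)
        sd = statistics.pstdev(values)
        columns.append([(v - mu) / sd if sd > 0 else 0.0 for v in values])
    return [list(row) for row in zip(*columns)]
```

Scaling gives every selected gene unit variance, so weakly expressed genes count as much as strongly expressed ones in any downstream distance computation; the abstract reports how this step shifts the balance between batch removal and conservation of biological variation.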

Data Integration Methods for Phenotype Harmonization in Multi-Cohort Genome-Wide Association Studies With Behavioral Outcomes

Frontiers in Genetics, Vol 10, 2019

DOI: 10.3389/fgene.2019.01227

Cited By ~ 2

Author(s): Justin M. Luningham, Daniel B. McArtor, Anne M. Hendriks, Catharina E. M. van Beijsterveldt, Paul Lichtenstein, ...

Keyword(s): Data Integration, Association Studies, Behavioral Outcomes, Genome Wide Association, Genome Wide Association Studies, Integration Methods, Genome Wide

Practical Evaluation of Different Omics Data Integration Methods

Precision Health and Medicine - Studies in Computational Intelligence, pp. 193-197, 2019

DOI: 10.1007/978-3-030-24409-5_20

Cited By ~ 1

Author(s): Wenjia Feng, Zekun Yu, Mingon Kang, Haijun Gong, Tae-Hyuk Ahn

Keyword(s): Data Integration, Omics Data, Integration Methods, Practical Evaluation, Omics Data Integration

Semantic data integration methods for chemical domain

2018 13th IEEE Conference on Industrial Electronics and Applications (ICIEA), 2018

DOI: 10.1109/iciea.2018.8398184

Author(s): Chen Yang, Fangfang Xu

Keyword(s): Data Integration, Integration Methods, Semantic Data, Semantic Data Integration

A comparison of data integration methods for single-cell RNA sequencing of cancer samples

2021

DOI: 10.1101/2021.08.04.453579

Author(s): Laura M. Richards, Mazdak Riverin, Suluxan Mohanraj, Shamini Ayyadhury, Danielle C. Croucher, ...

Keyword(s): Data Integration, Single Cell, RNA Sequencing, Cell Types, Malignant Cell, Integration Methods, Combining Data, Single Cell RNA Sequencing, Biological Heterogeneity, Multiple Samples

Tumours are routinely profiled with single-cell RNA sequencing (scRNA-seq) to characterize their diverse cellular ecosystems of malignant, immune, and stromal cell types. When combining data from multiple samples or studies, batch-specific technical variation can confound biological signals. However, scRNA-seq batch integration methods are often not designed for, or benchmarked on, datasets containing cancer cells. Here, we compare 5 data integration tools applied to 171,206 cells from 5 tumour scRNA-seq datasets. Based on our results, STACAS and fastMNN are the most suitable methods for integrating tumour datasets, demonstrating robust batch effect correction while preserving relevant biological variability in the malignant compartment. This comparison provides a framework for evaluating how well single-cell integration methods correct for technical variability while preserving the biological heterogeneity of malignant and non-malignant cell populations.
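
To make "batch-specific technical variation" concrete, the sketch below applies the simplest conceivable correction: subtracting each batch's per-gene mean. It is a naive baseline assumed for illustration only; it is not STACAS, fastMNN, or any of the compared tools, and the function name `center_per_batch` is hypothetical. Its weakness is instructive: if batches differ in cell-type composition (for example, in their share of malignant cells), the batch mean mixes technical and biological signal, so centering can erase real biology, which is why dedicated integration methods exist.

```python
import statistics
from collections import defaultdict


def center_per_batch(expr: list[list[float]], batches: list[str]) -> list[list[float]]:
    """Within each batch, subtract every gene's mean across that batch's cells (naive baseline)."""
    rows_by_batch: dict[str, list[int]] = defaultdict(list)
    for i, batch in enumerate(batches):
        rows_by_batch[batch].append(i)
    corrected = [list(row) for row in expr]
    for rows in rows_by_batch.values():
        for g in range(len(expr[0])):
            mu = statistics.fmean([expr[i][g] for i in rows])
            for i in rows:
                corrected[i][g] = expr[i][g] - mu
    return corrected


# Two batches whose cells differ only by a constant offset of 10 per gene.
expr = [[1.0, 2.0], [3.0, 4.0], [11.0, 12.0], [13.0, 14.0]]
print(center_per_batch(expr, ["a", "a", "b", "b"]))
# [[-1.0, -1.0], [1.0, 1.0], [-1.0, -1.0], [1.0, 1.0]]
```

Here the offset is purely technical, so centering aligns the batches; had batch "b" instead contained a different cell type, the same operation would remove that difference as well.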

Download Full-text