Interactive visualization systems and data integration methods for supporting discovery in collections of scientific information

2021 ◽  
Author(s):  
Donald Anthony Pellegrino
2004 ◽  
Vol 94 (1-3) ◽  
pp. 249-261 ◽  
Author(s):  
Nicholas W. Locantore ◽  
Liem T. Tran ◽  
Robert V. O'Neill ◽  
Peter W. McKinnis ◽  
Elizabeth R. Smith ◽  
...  

Author(s):  
Frida Torell ◽  
Tomas Skotare ◽  
Johan Trygg

Data integration has been shown to provide valuable information. Applied in the form of multiblock analysis, it can pinpoint both common and unique trends in the different blocks. When working with small multiblock datasets, the number of possible integration methods is drastically reduced. To investigate the application of multiblock analysis in cases where only a few samples are available, we studied a small metabolomic multiblock dataset containing six blocks (i.e. tissue types), including only common metabolites. We used a single-model multiblock analysis method called Joint and unique multiblock analysis (JUMBA) and compared it to a commonly used method, concatenated PCA. These methods were used to detect trends in the dataset and identify underlying factors responsible for metabolic variations. Using JUMBA, we were able to interpret the extracted components and link them to relevant biological properties. JUMBA shows how the observations are related to one another, how stable these relationships are, and to what extent each block contributes to the components. These results indicate that multiblock methods can be useful even with a small number of samples.
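
As a rough illustration of the baseline named in this abstract, concatenated PCA can be sketched as a single PCA on blocks joined side by side. This is a minimal sketch, not the authors' implementation: the function name `concatenated_pca`, the block-scaling choice (centering each variable and dividing each block by its Frobenius norm), and the per-block contribution measure are assumptions made for illustration. JUMBA itself is not reproduced here.

```python
import numpy as np

def concatenated_pca(blocks, n_components=2):
    """PCA on column-wise concatenated blocks that share the same samples (rows).

    Hypothetical helper for illustration; the scaling scheme is an assumption.
    """
    scaled, widths = [], []
    for X in blocks:
        X = np.asarray(X, dtype=float)
        Xc = X - X.mean(axis=0)                 # center each variable
        norm = np.linalg.norm(Xc)               # Frobenius norm of the block
        scaled.append(Xc / norm if norm > 0 else Xc)  # equal total weight per block
        widths.append(X.shape[1])

    Z = np.hstack(scaled)                       # samples x (all variables)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T              # unit-norm columns

    # Share of each component's squared loadings that falls in each block.
    edges = np.cumsum([0] + widths)
    contrib = np.array([(loadings[a:b] ** 2).sum(axis=0)
                        for a, b in zip(edges[:-1], edges[1:])])
    return scores, loadings, contrib
```

With six blocks of eight samples each, `scores` has one row per sample, and each column of `contrib` sums to 1, giving a simple view of how much each block drives a component.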


Author(s):  
MD Luecken ◽  
M Büttner ◽  
K Chaichoompu ◽  
A Danese ◽  
M Interlandi ◽  
...  

Cell atlases often include samples that span locations, labs, and conditions, leading to complex, nested batch effects in data. Thus, joint analysis of atlas datasets requires reliable data integration. Choosing a data integration method is a challenge due to the difficulty of defining integration success. Here, we benchmark 38 method and preprocessing combinations on 77 batches of gene expression, chromatin accessibility, and simulation data from 23 publications, altogether representing >1.2 million cells distributed in nine atlas-level integration tasks. Our integration tasks span several common sources of variation such as individuals, species, and experimental labs. We evaluate methods according to scalability, usability, and their ability to remove batch effects while retaining biological variation. Using 14 evaluation metrics, we find that highly variable gene selection improves the performance of data integration methods, whereas scaling pushes methods to prioritize batch removal over conservation of biological variation. Overall, BBKNN, Scanorama, and scVI perform well, particularly on complex integration tasks; Seurat v3 performs well on simpler tasks with distinct biological signals; and methods that prioritize batch removal perform best for ATAC-seq data integration. Our freely available reproducible Python module can be used to identify optimal data integration methods for new data, benchmark new methods, and improve method development.
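
The two preprocessing choices mentioned in this abstract, highly variable gene selection and scaling, can be sketched in simplified form as follows. This is only an illustrative approximation and not the benchmark's actual pipeline: the function names, the library-size normalization to 10,000 counts, and the plain variance ranking are all assumptions; published tools typically use more refined dispersion-based selection.

```python
import numpy as np

def select_highly_variable(counts, n_top=2000):
    """Return sorted column indices of the n_top most variable genes.

    Hypothetical simplification: ranks genes by the variance of
    log1p-transformed, library-size-normalized counts.
    """
    counts = np.asarray(counts, dtype=float)
    lib = counts.sum(axis=1, keepdims=True)     # total counts per cell
    lib[lib == 0] = 1.0                         # avoid division by zero
    norm = np.log1p(counts / lib * 1e4)
    var = norm.var(axis=0)
    n_top = min(n_top, counts.shape[1])
    return np.sort(np.argsort(var)[::-1][:n_top])

def scale_genes(X):
    """Scale each gene (column) to zero mean and unit variance."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0                           # leave constant genes at zero
    return (X - mu) / sd
```

In this sketch, rows are cells and columns are genes; selection would be applied before an integration method, and scaling is kept as a separate, optional step so that its effect can be compared.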


2019 ◽  
Vol 10 ◽  
Author(s):  
Justin M. Luningham ◽  
Daniel B. McArtor ◽  
Anne M. Hendriks ◽  
Catharina E. M. van Beijsterveldt ◽  
Paul Lichtenstein ◽  
...  

2021 ◽  
Author(s):  
Laura M. Richards ◽  
Mazdak Riverin ◽  
Suluxan Mohanraj ◽  
Shamini Ayyadhury ◽  
Danielle C. Croucher ◽  
...  

Tumours are routinely profiled with single-cell RNA sequencing (scRNA-seq) to characterize their diverse cellular ecosystems of malignant, immune, and stromal cell types. When combining data from multiple samples or studies, batch-specific technical variation can confound biological signals. However, scRNA-seq batch integration methods are often not designed for, or benchmarked on, datasets containing cancer cells. Here, we compare 5 data integration tools applied to 171,206 cells from 5 tumour scRNA-seq datasets. Based on our results, STACAS and fastMNN are the most suitable methods for integrating tumour datasets, demonstrating robust batch effect correction while preserving relevant biological variability in the malignant compartment. This comparison provides a framework for evaluating how well single-cell integration methods correct for technical variability while preserving the biological heterogeneity of malignant and non-malignant cell populations.

