Data integration methods to account for spatial niche truncation effects in regional projections of species distribution

2021 ◽  
Author(s):  
Mathieu Chevalier ◽  
Olivier Broennimann ◽  
Josselin Cornuault ◽  
Antoine Guisan
2004 ◽  
Vol 94 (1-3) ◽  
pp. 249-261 ◽  
Author(s):  
Nicholas W. Locantore ◽  
Liem T. Tran ◽  
Robert V. O'Neill ◽  
Peter W. McKinnis ◽  
Elizabeth R. Smith ◽  
...  

2019 ◽  
Vol 29 (3) ◽  
pp. 590-602 ◽  
Author(s):  
Youri Martin ◽  
Hans Van Dyck ◽  
Pierre Legendre ◽  
Josef Settele ◽  
Oliver Schweiger ◽  
...  

Author(s):  
Frida Torell ◽  
Tomas Skotare ◽  
Johan Trygg

Data integration has been proven to provide valuable information. The information extracted using data integration in the form of multiblock analysis can pinpoint both common and unique trends in the different blocks. When working with small multiblock datasets, the number of possible integration methods is drastically reduced. To investigate the application of multiblock analysis in cases where only a few samples are available, we studied a small metabolomic multiblock dataset containing six blocks (i.e. tissue types), including only the metabolites common to all blocks. We used a single-model multiblock analysis method called Joint and unique multiblock analysis (JUMBA) and compared it to a commonly used method, concatenated PCA. These methods were used to detect trends in the dataset and identify underlying factors responsible for metabolic variations. Using JUMBA, we were able to interpret the extracted components and link them to relevant biological properties. JUMBA shows how the observations are related to one another, how stable these relationships are, and to what extent each block contributes to the components. These results indicate that multiblock methods can be useful even with a small number of samples.
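
As a rough illustration of the baseline named in the abstract above, concatenated PCA can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes the blocks share the same samples as rows, and the function name `concatenated_pca`, the per-block Frobenius scaling, and the random toy data are all illustrative assumptions. JUMBA itself is not reproduced here.

```python
import numpy as np

def concatenated_pca(blocks, n_components=2):
    """PCA on blocks joined column-wise (hypothetical helper, rows = shared samples)."""
    scaled = []
    for X in blocks:
        X = np.asarray(X, dtype=float)
        Xc = X - X.mean(axis=0)                 # center every variable
        norm = np.linalg.norm(Xc)               # Frobenius norm of the block
        # Assumption: give each block the same total weight before joining
        scaled.append(Xc / norm if norm > 0 else Xc)

    Z = np.hstack(scaled)                       # samples x (all variables)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)

    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    explained = s[:n_components] ** 2 / np.sum(s ** 2)

    # Share of each component's squared loadings that falls in each block
    edges = np.cumsum([0] + [b.shape[1] for b in scaled])
    block_contrib = np.array([
        np.sum(loadings[edges[i]:edges[i + 1]] ** 2, axis=0)
        for i in range(len(scaled))
    ])
    return scores, loadings, explained, block_contrib

# Toy data only: six blocks, eight shared samples, five variables per block
rng = np.random.default_rng(0)
blocks = [rng.normal(size=(8, 5)) for _ in range(6)]
scores, loadings, explained, block_contrib = concatenated_pca(blocks)
```

Because each loading vector has unit length, every column of `block_contrib` sums to one, which gives a simple view of how much each block drives a component. Unlike a joint-and-unique decomposition, this baseline does not separate variation shared across blocks from variation specific to one block.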


Author(s):  
MD Luecken ◽  
M Büttner ◽  
K Chaichoompu ◽  
A Danese ◽  
M Interlandi ◽  
...  

Cell atlases often include samples that span locations, labs, and conditions, leading to complex, nested batch effects in data. Thus, joint analysis of atlas datasets requires reliable data integration. Choosing a data integration method is a challenge due to the difficulty of defining integration success. Here, we benchmark 38 method and preprocessing combinations on 77 batches of gene expression, chromatin accessibility, and simulation data from 23 publications, altogether representing >1.2 million cells distributed in nine atlas-level integration tasks. Our integration tasks span several common sources of variation such as individuals, species, and experimental labs. We evaluate methods according to scalability, usability, and their ability to remove batch effects while retaining biological variation. Using 14 evaluation metrics, we find that highly variable gene selection improves the performance of data integration methods, whereas scaling pushes methods to prioritize batch removal over conservation of biological variation. Overall, BBKNN, Scanorama, and scVI perform well, particularly on complex integration tasks; Seurat v3 performs well on simpler tasks with distinct biological signals; and methods that prioritize batch removal perform best for ATAC-seq data integration. Our freely available reproducible Python module can be used to identify optimal data integration methods for new data, benchmark new methods, and improve method development.


2019 ◽  
Vol 10 ◽  
Author(s):  
Justin M. Luningham ◽  
Daniel B. McArtor ◽  
Anne M. Hendriks ◽  
Catharina E. M. van Beijsterveldt ◽  
Paul Lichtenstein ◽  
...  
