Introduction to this special section: Cross-disciplinary applications of geophysics

Kyle Spikes; Yongyi Li

doi:10.1190/tle37090654.1

Introduction to this special section: Cross-disciplinary applications of geophysics

The Leading Edge ◽

10.1190/tle37090654.1 ◽

2018 ◽

Vol 37 (9) ◽

pp. 654-654

Author(s):

Kyle Spikes ◽

Yongyi Li

Keyword(s):

Data Integration ◽

Special Section ◽

Unconventional Reservoirs ◽

Data Sets ◽

Data Types ◽

Multiple Data

As the volume of seismic and other data types continues to increase, the use of such data sets has extended to different approaches and techniques of data integration and interpretation. The intent of this special section on cross-disciplinary applications of geophysics is to highlight such uses of multiple data types. Although not limited to any type or location of a given reservoir, the two articles in this section primarily focus on onshore unconventional reservoirs. Nonetheless, the techniques and approaches will also be of interest to readers and practitioners who deal with conventional reservoirs, both onshore and offshore.

Importance of Big Data

Handbook of Research on Trends and Future Directions in Big Data and Web Intelligence - Advances in Data Mining and Database Management ◽

10.4018/978-1-4666-8505-5.ch001 ◽

2015 ◽

pp. 1-19 ◽

Cited By ~ 1

Author(s):

Seema Ansari ◽

Radha Mohanlal ◽

Javier Poncela ◽

Adeel Ansari ◽

Komal Mohanlal

Keyword(s):

Big Data ◽

Heterogeneous Data ◽

Data Sets ◽

Data Types ◽

It Industry ◽

Management Tools ◽

Location Data ◽

Processing Power ◽

Multiple Data ◽

Big Data Applications

Combining vast amounts of heterogeneous data and increasing the processing power of existing database management tools is an emerging need of the IT industry for the coming years. The complexity and size of the data sets that need to be acquired, analyzed, stored, sorted, or transferred have spiked in recent years. Because the volume of multiple data types is increasing tremendously, creating Big Data applications that can extract the valuable trends and relationships required for further processing or for deriving useful results is quite a challenging task. Companies, corporate organizations, and government agencies all need to analyze and execute Big Data implementations to pave new paths of productivity and innovation. This chapter discusses Big Data, an emerging technology of the modern era, with a detailed description of the three V's (Variety, Velocity, and Volume). Further chapters will help readers understand the concepts of data mining and Big Data analysis and the potential of Big Data in five domains: healthcare, the public sector, retail, manufacturing, and personal location data.

Data Integration for Immunology

Annual Review of Biomedical Data Science ◽

10.1146/annurev-biodatasci-012420-122454 ◽

2020 ◽

Vol 3 (1) ◽

pp. 113-136

Author(s):

Silvia Pineda ◽

Daniel G. Bunis ◽

Idit Kosti ◽

Marina Sirota

Keyword(s):

Immune System ◽

Next Generation Sequencing ◽

Data Integration ◽

Single Cell ◽

Systems Approach ◽

Data Types ◽

Systems Approaches ◽

Multiple Data ◽

Diverse Data ◽

Generation Sequencing

Over the last several years, next-generation sequencing and its recent push toward single-cell resolution have transformed the landscape of immunology research by revealing novel complexities about all components of the immune system. With the vast amounts of diverse data currently being generated, and with the methods of analyzing and combining diverse data improving as well, integrative systems approaches are becoming more powerful. Previous integrative approaches have combined multiple data types and revealed ways that the immune system, both as a whole and as individual parts, is affected by genetics, the microbiome, and other factors. In this review, we explore the data types that are available for studying immunology with an integrative systems approach, as well as the current strategies and challenges for conducting such analyses.

Data Integration on Multiple Data Sets

2008 IEEE International Conference on Bioinformatics and Biomedicine ◽

10.1109/bibm.2008.48 ◽

2008 ◽

Cited By ~ 5

Author(s):

Tian Mi ◽

Robert Aseltine ◽

Sanguthevar Rajasekaran

Keyword(s):

Data Integration ◽

Data Sets ◽

Multiple Data ◽

Multiple Data Sets

Data integration for conservation: Leveraging multiple data types to advance ecological assessments and habitat modeling for marine megavertebrates using OBIS–SEAMAP

Ecological Informatics ◽

10.1016/j.ecoinf.2014.01.003 ◽

2014 ◽

Vol 20 ◽

pp. 13-26 ◽

Cited By ~ 10

Author(s):

Ei Fujioka ◽

Connie Y. Kot ◽

Bryan P. Wallace ◽

Benjamin D. Best ◽

Jerry Moxley ◽

...

Keyword(s):

Data Integration ◽

Habitat Modeling ◽

Data Types ◽

Multiple Data

Coupled Co-clustering-based Unsupervised Transfer Learning for the Integrative Analysis of Single-Cell Genomic Data

10.1101/2020.03.28.013938 ◽

2020 ◽

Author(s):

Pengcheng Zeng ◽

Jiaxuan WangWu ◽

Zhixiang Lin

Keyword(s):

Single Cell ◽

Transfer Learning ◽

Learning Algorithm ◽

Genomic Data ◽

Integrative Analysis ◽

Data Sets ◽

Clustering Methods ◽

Data Types ◽

Multiple Data ◽

Multiple Data Sets

Unsupervised methods, such as clustering methods, are essential to the analysis of single-cell genomic data. Most current clustering methods are designed for one data type only, such as scRNA-seq, scATAC-seq or sc-methylation data alone, and a few are developed for the integrative analysis of multiple data types. Integrative analysis of multimodal single-cell genomic data sets leverages the power in multiple data sets and can deepen the biological insight. We propose a coupled co-clustering-based unsupervised transfer learning algorithm (coupleCoC) for the integrative analysis of multimodal single-cell data. Our proposed coupleCoC builds upon the information theoretic co-clustering framework. We applied coupleCoC for the integrative analysis of scATAC-seq and scRNA-seq data, sc-methylation and scRNA-seq data, and scRNA-seq data from mouse and human. We demonstrate that coupleCoC improves the overall clustering performance and matches the cell subpopulations across multimodal single-cell genomic data sets. The software and data sets are available at https://github.com/cuhklinlab/coupleCoC.
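The information theoretic co-clustering framework that the abstract builds on can be illustrated with a minimal sketch. This is not the coupleCoC software and it does not couple several data sets; it only shows, for a single hypothetical cell-by-feature table, the quantity that information theoretic co-clustering typically tries to keep small: the mutual information lost when rows and columns are merged into clusters. The function names, the toy table, and the cluster labels are all assumptions made for illustration.

```python
import math

def mutual_information(joint):
    """Mutual information (in nats) of a joint probability table given as a list of rows."""
    row = [sum(r) for r in joint]
    col = [sum(c) for c in zip(*joint)]
    return sum(
        p * math.log(p / (row[i] * col[j]))
        for i, r in enumerate(joint)
        for j, p in enumerate(r)
        if p > 0
    )

def collapse(joint, row_labels, col_labels, n_row_clusters, n_col_clusters):
    """Sum the joint probabilities that fall into each (row cluster, column cluster) pair."""
    out = [[0.0] * n_col_clusters for _ in range(n_row_clusters)]
    for i, r in enumerate(joint):
        for j, p in enumerate(r):
            out[row_labels[i]][col_labels[j]] += p
    return out

def information_loss(joint, row_labels, col_labels, n_row_clusters, n_col_clusters):
    """Mutual information lost by replacing rows and columns with their clusters."""
    clustered = collapse(joint, row_labels, col_labels, n_row_clusters, n_col_clusters)
    return mutual_information(joint) - mutual_information(clustered)

# Hypothetical normalized table: 4 cells (rows) by 4 features (columns), two clean blocks.
joint = [
    [0.125, 0.125, 0.0, 0.0],
    [0.125, 0.125, 0.0, 0.0],
    [0.0, 0.0, 0.125, 0.125],
    [0.0, 0.0, 0.125, 0.125],
]

# A co-clustering that follows the blocks, and one that mixes them.
good_loss = information_loss(joint, [0, 0, 1, 1], [0, 0, 1, 1], 2, 2)
bad_loss = information_loss(joint, [0, 1, 0, 1], [0, 0, 1, 1], 2, 2)
```

In this toy case the block-aligned labels lose no mutual information, while the mixed row labels lose all of it, which is why a search over labels would prefer the first assignment.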

Variational autoencoders for cancer data integration: design principles and computational practice

10.1101/719542 ◽

2019 ◽

Cited By ~ 2

Author(s):

Nikola Simidjievski ◽

Cristian Bodnar ◽

Ifrah Tariq ◽

Paul Scherer ◽

Helena Andres-Terre ◽

...

Keyword(s):

Breast Cancer ◽

Clinical Data ◽

Molecular Taxonomy ◽

Patient Data ◽

Data Sets ◽

Data Types ◽

Cancer Data ◽

Learning Framework ◽

Multiple Data ◽

Multiple Data Sets

International initiatives such as the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) are collecting multiple data sets at different genome-scales with the aim to identify novel cancer bio-markers and predict patient survival. To analyse such data, several machine learning, bioinformatics and statistical methods have been applied, among them neural networks such as autoencoders. Although these models provide a good statistical learning framework to analyse multi-omic and/or clinical data, there is a distinct lack of work on how to integrate diverse patient data and identify the optimal design best suited to the available data. In this paper, we investigate several autoencoder architectures that integrate a variety of cancer patient data types (e.g., multi-omics and clinical data). We perform extensive analyses of these approaches and provide a clear methodological and computational framework for designing systems that enable clinicians to investigate cancer traits and translate the results into clinical applications. We demonstrate how these networks can be designed, built and, in particular, applied to tasks of integrative analyses of heterogeneous breast cancer data. The results show that these approaches yield relevant data representations that, in turn, lead to accurate and stable diagnosis.

MultiDataSet: an R package for encapsulating multiple data sets with application to omic data integration

BMC Bioinformatics ◽

10.1186/s12859-016-1455-1 ◽

2017 ◽

Vol 18 (1) ◽

Cited By ~ 13

Author(s):

Carles Hernandez-Ferrer ◽

Carlos Ruiz-Arenas ◽

Alba Beltran-Gomila ◽

Juan R. González

Keyword(s):

Data Integration ◽

R Package ◽

Data Sets ◽

Multiple Data ◽

Multiple Data Sets ◽

Omic Data Integration ◽

Omic Data

Semi-Supervised Learning Using Hierarchical Mixture Models: Gene Essentiality Case Study

Mathematical and Computational Applications ◽

10.3390/mca26020040 ◽

2021 ◽

Vol 26 (2) ◽

pp. 40

Author(s):

Michael W. Daniels ◽

Daniel Dvorkin ◽

Rani K. Powers ◽

Katerina Kechris

Keyword(s):

Supervised Learning ◽

Mixture Model ◽

Essential Genes ◽

Training Data ◽

Data Sets ◽

Data Types ◽

Gene Essentiality ◽

Multiple Data ◽

Level Data ◽

Accuracy Of Prediction

Integrating gene-level data is useful for predicting the role of genes in biological processes. This problem has typically focused on supervised classification, which requires large training sets of positive and negative examples. However, training data sets that are too small for supervised approaches can still provide valuable information. We describe a hierarchical mixture model that uses limited positively labeled gene training data for semi-supervised learning. We focus on the problem of predicting essential genes, where a gene is required for the survival of an organism under particular conditions. We applied cross-validation and found that the inclusion of positively labeled samples in a semi-supervised learning framework with the hierarchical mixture model improves the detection of essential genes compared to unsupervised, supervised, and other semi-supervised approaches. There was also improved prediction performance when genes are incorrectly assumed to be non-essential. Our comparisons indicate that the incorporation of even small amounts of existing knowledge improves the accuracy of prediction and decreases variability in predictions. Although we focused on gene essentiality, the hierarchical mixture model and semi-supervised framework is standard for problems focused on prediction of genes or other features, with multiple data types characterizing the feature, and a small set of positive labels.
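The idea of using a small set of positive labels inside a mixture model can be illustrated with a much simpler sketch than the hierarchical model the abstract describes. The code below is an assumption-laden stand-in: a two-component, one-dimensional Gaussian mixture fitted by EM, in which the items labeled positive have their membership pinned to the "positive" component while all other memberships are estimated. The function names, the variance floor, and the toy scores are hypothetical and are not taken from the paper.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a 1D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def semi_supervised_em(scores, positive, n_iter=50, var_floor=1e-2):
    """Return the posterior probability that each score belongs to component 1.

    `positive` is a set of indices known to belong to component 1; their
    responsibility is fixed at 1.0 in every E-step.
    """
    means = [min(scores), max(scores)]  # component 0 = background, 1 = positive class
    variances = [1.0, 1.0]
    weight = 0.5                        # mixing weight of component 1
    resp = [0.5] * len(scores)
    for _ in range(n_iter):
        # E-step: update responsibilities, keeping labeled positives pinned.
        for i, x in enumerate(scores):
            if i in positive:
                resp[i] = 1.0
            else:
                p1 = weight * normal_pdf(x, means[1], variances[1])
                p0 = (1 - weight) * normal_pdf(x, means[0], variances[0])
                resp[i] = p1 / (p0 + p1)
        # M-step: re-estimate weight, means, and variances from the responsibilities.
        n1 = sum(resp)
        n0 = len(scores) - n1
        means[1] = sum(r * x for r, x in zip(resp, scores)) / n1
        means[0] = sum((1 - r) * x for r, x in zip(resp, scores)) / n0
        variances[1] = max(
            sum(r * (x - means[1]) ** 2 for r, x in zip(resp, scores)) / n1, var_floor
        )
        variances[0] = max(
            sum((1 - r) * (x - means[0]) ** 2 for r, x in zip(resp, scores)) / n0, var_floor
        )
        weight = n1 / len(scores)
    return resp

# Hypothetical gene-level scores; only index 4 carries a positive label.
scores = [0.1, 0.3, -0.2, 0.0, 2.9, 3.1, 3.3, 3.0]
posterior = semi_supervised_em(scores, positive={4})
```

The single pinned label anchors which component means "positive", so the unlabeled scores near it receive high posterior probability without any negative examples being supplied.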

Heterogeneous Effects of the De Jure and De Facto Business Environment: Findings from Multiple Data Sets on the Business Environment

10.1596/1813-9450-9115 ◽

2020 ◽

Author(s):

Christine Zhenwei Qiang ◽

He Wang ◽

L. Colin Xu

Keyword(s):

Business Environment ◽

Data Sets ◽

Multiple Data ◽

Heterogeneous Effects ◽

Multiple Data Sets

Multiple data parameter identification for nonlinear conceptual models

Water Science & Technology ◽

10.2166/wst.1997.0165 ◽

1997 ◽

Vol 36 (5) ◽

pp. 61-68 ◽

Cited By ~ 1

Author(s):

Hermann Eberl ◽

Amar Khelil ◽

Peter Wilderer

Keyword(s):

Optimization Problem ◽

Conceptual Models ◽

Reference Data ◽

Transport Model ◽

Calibration Method ◽

Data Sets ◽

Multiple Data ◽

Marquardt Algorithm ◽

Multicriteria Optimization Problem ◽

Higher Order Differential Equations

A numerical method for the identification of parameters of nonlinear higher-order differential equations is presented, which is based on the Levenberg-Marquardt algorithm. The estimation of the parameters can be performed by using several reference data sets simultaneously. This leads to a multicriteria optimization problem, which will be treated by using the Pareto optimality concept. In this paper, the emphasis is put on the presentation of the calibration method. As an example, the identification of the parameters of a nonlinear hydrological transport model for urban runoff is included, but the method can be applied to other problems as well.
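The Pareto optimality concept mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' calibration method and it contains no Levenberg-Marquardt step; it only shows how candidate parameter sets might be compared once each has one misfit value per reference data set. The function names and the misfit values are hypothetical.

```python
def dominates(a, b):
    """True if misfit vector a is no worse than b on every data set and better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    """Return the misfit vectors that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]

# Hypothetical misfits (data set 1, data set 2) for four candidate parameter sets.
misfits = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
front = pareto_front(misfits)
```

Here the candidate with misfits (3.0, 3.0) is dropped because (2.0, 2.0) fits both data sets better, while the remaining three represent different trade-offs between the two data sets.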
