Sparse functional data analysis accounts for missing information in single-cell epigenomics

2018 ◽  
Author(s):  
Pedro Madrigal ◽  
Xiongtao Dai ◽  
Pantelis Z. Hadjipantelis

Single-cell epigenome assays produce sparsely sampled data, leading to coverage pooling across cells to increase resolution. Imputation of missing data using deep learning is available but requires intensive computation, and it has been applied only to DNA methylation obtained by single-cell bisulfite sequencing. Here, sparsity in chromatin accessibility obtained by scNMT-seq is addressed using functional data analysis to fit sparsely sampled GpC coverage profiles of individual cells, taking into account all the cells of the same cell type or condition. For that, sparse functional principal component analysis (S-FPCA) is applied, and the principal components are used to estimate chromatin accessibility coverage in individual cells. This methodology can potentially be used with other single-cell assays with missing data such as scBS-seq, scNOME-seq, or scATAC-seq. The R package fdapace is available on CRAN, and R code used in this manuscript can be found at: http://github.com/pmb59/sparseSingleCell.

2018 ◽  
Vol 8 (10) ◽  
pp. 1766 ◽  
Author(s):  
Arthur Leroy ◽  
Andy MARC ◽  
Olivier DUPAS ◽  
Jean Lionel REY ◽  
Servane Gey

Many data collected in sport science come from time-dependent phenomena. This article focuses on Functional Data Analysis (FDA), which studies longitudinal data by modelling them as continuous functions. After a brief review of several FDA methods, some useful practical tools such as Functional Principal Component Analysis (FPCA) and functional clustering algorithms are presented and compared on simulated data. Finally, the problem of detecting promising young swimmers is addressed through a curve clustering procedure on a real data set of performance progression curves. This study reveals that the fastest improvement of young swimmers generally appears before 16 years old. Moreover, several patterns of improvement are identified, and the functional clustering procedure provides a useful detection tool.


This handbook presents the state-of-the-art of the statistics dealing with functional data analysis. With contributions from international experts in the field, it discusses a wide range of the most important statistical topics (classification, inference, factor-based analysis, regression modeling, resampling methods, time series, random processes) while also taking into account practical, methodological, and theoretical aspects of the problems. The book is organised into three sections. Part I deals with regression modeling and covers various statistical methods for functional data such as linear/nonparametric functional regression, varying coefficient models, and linear/nonparametric functional processes (i.e. functional time series). Part II considers related benchmark methods/tools for functional data analysis, including curve registration methods for preprocessing functional data, functional principal component analysis, and resampling/bootstrap methods. Finally, Part III examines some of the fundamental mathematical aspects of the infinite-dimensional setting, with a focus on the stochastic background and operatorial statistics: vector-valued function integration, spectral and random measures linked to stationary processes, operator geometry, vector integration and stochastic integration in Banach spaces, and operatorial statistics linked to quantum statistics.


2021 ◽  
Vol 28 (3) ◽  
Author(s):  
Christian Capezza ◽  
Fabio Centofanti ◽  
Antonio Lepore ◽  
Biagio Palumbo

Sensing networks nowadays provide massive amounts of data that in many applications describe curves or surfaces varying over a continuum, usually time, and thus can be suitably modelled as functional data. Their proper modelling by means of functional data analysis approaches naturally addresses new challenges that also arise in statistical process monitoring (SPM). Motivated by an industrial application, the objective of the present paper is to provide the reader with a very transparent set of steps for the SPM of functional data in real-world case studies: i) identifying a finite-dimensional model for the functional data, based on functional principal component analysis; ii) estimating the unknown parameters; iii) designing control charts on the estimated parameters, in a nonparametric framework. The proposed SPM procedure is applied to a real case study from the maritime field: monitoring CO2 emissions from real navigation data of a roll-on/roll-off passenger cruise ship, i.e., a ship designed to carry both passengers and wheeled vehicles that are driven on and off the ship on their own wheels. We show different scenarios highlighting clear and interpretable indications that can be extracted from the data set and support the detection of anomalous voyages.
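The three steps listed in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: ordinary PCA on curves discretised on a common grid stands in for FPCA, the monitoring statistic is a Hotelling-type T^2 on the component scores, and the control limit is an empirical quantile of the in-control statistics. The function names `fit_monitor` and `t2_statistic`, the simulated curves, and the choice `alpha=0.05` are all assumptions made for illustration.

```python
import numpy as np

def fit_monitor(train, n_components=2, alpha=0.05):
    """Phase I: fit a PCA model on in-control curves and set a T^2 limit."""
    mean = train.mean(axis=0)
    centred = train - mean
    # Rows of vt are the principal directions of the discretised curves.
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    basis = vt[:n_components]
    # Sample variance of the scores along each retained direction.
    var = s[:n_components] ** 2 / (train.shape[0] - 1)
    t2 = np.sum((centred @ basis.T) ** 2 / var, axis=1)
    # Nonparametric limit: empirical (1 - alpha) quantile of in-control T^2.
    limit = np.quantile(t2, 1 - alpha)
    return mean, basis, var, limit

def t2_statistic(curves, mean, basis, var):
    """Phase II: T^2 statistic of new curves under the fitted model."""
    scores = (curves - mean) @ basis.T
    return np.sum(scores ** 2 / var, axis=1)

# Simulated example (hypothetical data, not the ship data set).
rng = np.random.default_rng(1)
grid = np.linspace(0.0, 1.0, 50)
basis_fns = np.vstack([np.sin(2 * np.pi * grid), np.cos(2 * np.pi * grid)])

def simulate(n, shift=0.0):
    coef = rng.normal(size=(n, 2))
    coef[:, 0] += shift  # a shift in the first coefficient mimics an anomaly
    return coef @ basis_fns + 0.05 * rng.normal(size=(n, grid.size))

train = simulate(200)
mean, basis, var, limit = fit_monitor(train)
in_control = t2_statistic(train, mean, basis, var)
shifted = t2_statistic(simulate(5, shift=8.0), mean, basis, var)
flags = shifted > limit  # True marks a curve signalled as out of control
```

Using an empirical quantile avoids assuming a distribution for the scores, which is one simple way to read "nonparametric" here; the paper may construct its limits differently.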


2013 ◽  
Vol 10 (04) ◽  
pp. 1350033 ◽  
Author(s):  
JACOPO ALEOTTI ◽  
STEFANO CASELLI

This paper investigates the use of functional principal component analysis (FPCA) for automatic recognition of dynamic human arm gestures and robot imitation. FPCA is a statistical technique of functional data analysis that generalizes standard multivariate principal component analysis. In functional data analysis, signals (e.g., gestures) are functions that are considered as observations of a random variable on a functional space. In particular, FPCA reduces the dimensionality of the input data by projecting them onto a finite-dimensional space spanned by a few prominent eigenfunctions. The main contribution of this work is the proposal of a novel technique for unsupervised clustering of training data and dynamic gesture recognition based on FPCA. FPCA has not been considered in previous studies on humanoid learning. The proposed approach has been evaluated in two experimental settings for motion capture. In the first setup, single-arm gestures are recognized from inertial sensors attached to the arm of the user. In the second setup, the method is extended to two-arm gestures acquired from a range sensor. Recognized gestures are reproduced by a small humanoid robot. The FPCA method has also been compared to a high-performance algorithm for gesture classification based on dynamic time warping (DTW). The FPCA algorithm achieves comparable results in both recognition rate and robustness to missing data, while it outperforms DTW in terms of efficiency in execution time.
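The projection onto a few eigenfunctions mentioned in this abstract can be sketched as follows. This is a generic discretised FPCA for curves observed on a common, evenly spaced grid, not the method of the paper; the function name `fpca` and the simulated sine/cosine curves are assumptions for illustration only.

```python
import numpy as np

def fpca(curves, grid, n_components=2):
    """Discretised FPCA for curves sampled on a common, evenly spaced grid.

    curves: (n_curves, n_points) array; grid: (n_points,) array.
    Returns the mean curve, eigenfunctions (n_components, n_points),
    eigenvalues, and scores (n_curves, n_components).
    """
    curves = np.asarray(curves, dtype=float)
    dt = grid[1] - grid[0]  # assumes an even grid
    mean = curves.mean(axis=0)
    centred = curves - mean
    # Sample covariance surface evaluated on the grid.
    cov = centred.T @ centred / (curves.shape[0] - 1)
    # Eigen-decomposition of the discretised covariance operator.
    evals, evecs = np.linalg.eigh(cov * dt)
    order = np.argsort(evals)[::-1][:n_components]
    evals = evals[order]
    # Rescale so each eigenfunction has unit L2 norm: sum(phi^2) * dt = 1.
    phis = evecs[:, order].T / np.sqrt(dt)
    # Scores are L2 inner products of centred curves with eigenfunctions.
    scores = centred @ phis.T * dt
    return mean, phis, evals, scores

# Simulated curves spanned by two smooth components (hypothetical data).
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 101)
coef = rng.normal(size=(30, 2)) * np.array([2.0, 0.5])
curves = (coef[:, :1] * np.sin(2 * np.pi * grid)
          + coef[:, 1:] * np.cos(2 * np.pi * grid))

mean, phis, evals, scores = fpca(curves, grid, n_components=2)
# Each curve is summarised by two scores; the reconstruction maps them back.
recon = mean + scores @ phis
```

Each curve is reduced to a short score vector, which is the kind of low-dimensional representation that clustering or classification can then operate on.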


Author(s):  
Pedro M. Esperança ◽  
Dari F. Da ◽  
Ben Lambert ◽  
Roch K. Dabiré ◽  
Thomas S. Churcher

Near infrared spectroscopy is increasingly being used as an economical method to monitor mosquito vector populations in support of disease control. Despite this rise in popularity, strong geographical variation in spectra has proven an issue for generalising predictions from one location to another. Here, we use a functional data analysis approach, which models spectra as smooth curves rather than as a discrete set of points, to develop a method that is robust to geographic heterogeneity. Specifically, we use a penalised generalised linear modelling framework which includes efficient functional representation of spectra, spectral smoothing and regularisation. To ensure better generalisation of model predictions from one training set to another, we use cross-validation procedures favouring smoother representation of spectra. To illustrate the performance of our approach, we collected spectra for field-caught specimens of Anopheles gambiae complex mosquitoes (the most epidemiologically important vector species on the planet) in two sites in Burkina Faso. Using these spectra, we show how models trained on data from one site can successfully classify morphologically identical sibling species in another site, over 250 km away. Whilst we apply our framework to species prediction, our unified statistical framework can, alternatively, handle regression analysis (for example, to determine mosquito age) and other types of multinomial classification (for example, to determine infection status). To make our methods readily available for field entomologists, we have created an open-source R package mlevcm. All data used are also publicly available.


2016 ◽  
Vol 49 (2) ◽  
pp. 594-605 ◽  
Author(s):  
Thejas Gopal Krishne Urs ◽  
Karthik Bharath ◽  
Sangappa Yallappa ◽  
Somashekar Rudrappa

This article presents a novel method, based on functional data analysis, to analyse measurements of structural parameters of polymers and polymer composites. The method is demonstrated using newly developed biodegradable conducting polymer composites prepared via a solution casting technique. The measurements of the macro- and microstructural parameters that are used in the characterization of these films are obtained using X-ray diffraction, an impedance analyser and a UV–vis spectrometer. A functional representation of the measured values of the parameters at different dopant concentrations is adopted by viewing them as realizations of a continuous-time stochastic process observed with measurement error. This allows one to estimate the mean functional relationship between a parameter and the dopant concentration. A functional version of principal component analysis is performed, by which the major modes of variation are discovered and the correlations of parameter values at different concentrations are estimated. This provides insight into local and global features of the relationship between these parameters. Some comments are made on how the parameters vary as a function of dopant concentration.


2019 ◽  
Author(s):  
Kyungmin Ahn ◽  
Hironobu Fujiwara

Background: In single-cell RNA-sequencing (scRNA-seq) data analysis, a number of statistical tools in multivariate data analysis (MDA) have been developed to help analyze the gene expression data. This MDA approach typically focuses on examining genes as discrete genomic units and ignores the dependency between the data components. In this paper, we propose a functional data analysis (FDA) approach on scRNA-seq data whereby we consider each cell as a single function. To avoid a large number of dropouts (zero or near-zero values) and reduce the high dimensionality of the data, we first perform a principal component analysis (PCA) and assign PCs to be the amplitude of the function. Then we use the index of PCs directly from PCA for the phase components. This approach allows us to apply FDA clustering methods to scRNA-seq data analysis. Results: To demonstrate the robustness of our method, we apply several existing FDA clustering algorithms to the gene expression data to improve the accuracy of the classification of the cell types against the conventional clustering methods in MDA. As a result, the FDA clustering algorithms achieve superior accuracy on simulated data as well as real data such as human and mouse scRNA-seq data. Conclusions: This new statistical technique enhances the classification performance and ultimately improves the understanding of stochastic biological processes. This new framework provides an essentially different scRNA-seq data analytical approach, which can complement conventional MDA methods. It can be truly effective when current MDA methods cannot detect or uncover the hidden functional nature of the gene expression dynamics.

