Testing the mean matrix in high-dimensional transposable data

Biometrics ◽  
2015 ◽  
Vol 71 (1) ◽  
pp. 157-166 ◽  
Author(s):  
Anestis Touloumis ◽  
Simon Tavaré ◽  
John C. Marioni

2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Hanji He ◽  
Guangming Deng

We extend mean empirical likelihood inference for the response mean when data are missing at random. Empirical likelihood ratio confidence regions perform poorly when the response is missing at random, especially when the covariate is high-dimensional and the sample size is small. We therefore develop three bias-corrected mean empirical likelihood approaches to obtain efficient inference for the response mean. From the three bias-corrected estimating equations, we derive a new set of equations by constructing a pairwise-mean dataset; this enlarges the sample available for estimation and mitigates the curse of dimensionality. Consistency and asymptotic normality of the maximum mean empirical likelihood estimators are established. The finite-sample performance of the proposed estimators is assessed through simulation, and an application to the Boston Housing dataset is presented.
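The pairwise-mean construction can be sketched as follows. This is a minimal illustration, not the authors' full method (the paper's estimating equations involve bias corrections beyond this step); it assumes the pairwise means are simple averages (y_i + y_j)/2 over the observed responses:

```python
import numpy as np

def pairwise_mean_dataset(y, observed):
    """Form the pairwise-mean dataset (y_i + y_j) / 2 over all pairs
    i <= j of observed responses.  For n observed values this yields
    n * (n + 1) / 2 points, enlarging the sample used for estimation."""
    y_obs = np.asarray(y)[np.asarray(observed)]
    n = len(y_obs)
    return np.array([(y_obs[i] + y_obs[j]) / 2.0
                     for i in range(n) for j in range(i, n)])
```

Each pairwise mean has the same expectation as an individual response, so the mean parameter is preserved while the effective sample size grows quadratically in the number of observed cases.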


2013 ◽  
Vol 404 ◽  
pp. 548-554
Author(s):  
Yu Sang ◽  
Hong Wen Song ◽  
Jun Zhao

Discretization is a necessary pre-processing step for mining tasks and a source of performance improvement for many machine learning algorithms. Existing techniques mainly focus on one-dimensional discretization in low-dimensional data spaces. In this paper, we present a discretization technique based on intrinsic dimensional correlation in high-dimensional data spaces. The approach estimates the intrinsic dimensionality (ID) of the data by maximum likelihood estimation (MLE). We then project the data onto the eigenspace of the estimated (lower) ID using principal component analysis (PCA), which uncovers the underlying correlation structure of the multivariate data. All dimensions of the data are thus transformed into a new, independent eigenspace of dimension ID, and each dimension can be discretized separately in that eigenspace under the Bayes discretization model using the MODL discretization method. We design a heuristic framework to search for better discretization schemes. Experiments demonstrate a significant improvement in the mean learning accuracy of classifiers over traditional discretization methods.
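The pipeline described above, MLE intrinsic-dimension estimation, PCA projection, then per-dimension discretization, can be sketched as below. This is a hedged illustration: the ID estimator follows the Levina-Bickel MLE form, and equal-frequency binning stands in for the MODL method, which is considerably more involved:

```python
import numpy as np

def mle_intrinsic_dim(X, k=10):
    """Levina-Bickel-style MLE of intrinsic dimensionality from
    the k nearest-neighbour distances of each point."""
    dists = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    dists.sort(axis=1)              # column 0 is the zero self-distance
    knn = dists[:, 1:k + 1]         # T_1 .. T_k for every point
    # m_hat(x) = [ (1/(k-1)) * sum_{j<k} log(T_k / T_j) ]^{-1}
    logs = np.log(knn[:, -1:] / knn[:, :-1])
    return float(((k - 1) / logs.sum(axis=1)).mean())

def pca_project(X, d):
    """Project centred data onto its top-d principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T

def equal_freq_discretize(col, n_bins=4):
    """Equal-frequency binning of one projected dimension
    (a simple stand-in for the MODL discretization step)."""
    cuts = np.quantile(col, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.searchsorted(cuts, col)
```

A full pipeline would estimate `d = round(mle_intrinsic_dim(X))`, project with `pca_project(X, d)`, and then discretize each resulting column independently.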


Biometrika ◽  
2021 ◽  
Author(s):  
Yuqian Zhang ◽  
Jelena Bradic

Abstract A fundamental challenge in semi-supervised learning lies in the disproportion between the size of the observed data and the size of the data collected with missing outcomes. An implicit understanding is that the dataset with missing outcomes, being significantly larger, ought to improve estimation and inference. However, it is unclear to what extent this is correct. We illustrate one clear benefit: root-n inference for the outcome's mean is possible while requiring only consistent estimation of the outcome, possibly at a rate slower than root-n. This is achieved by a novel k-fold cross-fitted, doubly robust estimator. We discuss both linear and nonlinear outcomes. The estimator is particularly suited to models that do not naturally admit root-n consistency, such as high-dimensional, nonparametric, or semiparametric models. We apply our methods to estimating heterogeneous treatment effects.
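A k-fold cross-fitted, doubly robust (AIPW-type) estimate of the outcome mean with outcomes missing at random can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the authors' estimator: the outcome regression is ordinary least squares and the labelling propensity is a constant (missing-completely-at-random) plug-in, both fit on out-of-fold data:

```python
import numpy as np

def crossfit_dr_mean(X, y, labeled, K=5, seed=0):
    """K-fold cross-fitted, doubly robust (AIPW-type) estimate of E[Y]
    when y is observed only where `labeled` is True.  Nuisances are fit
    on the out-of-fold data: an OLS outcome regression and a constant
    labelling-propensity estimate (an MCAR simplification)."""
    n = X.shape[0]
    folds = np.random.default_rng(seed).permutation(n) % K
    psi = np.zeros(n)
    for k in range(K):
        test, train = folds == k, folds != k
        tl = train & labeled
        # outcome regression mu(x), fit on labelled out-of-fold cases
        Xt = np.column_stack([np.ones(tl.sum()), X[tl]])
        beta, *_ = np.linalg.lstsq(Xt, y[tl], rcond=None)
        mu_hat = np.column_stack([np.ones(test.sum()), X[test]]) @ beta
        pi_hat = labeled[train].mean()   # P(labelled), fit out-of-fold
        r = labeled[test].astype(float)
        y_fill = np.where(labeled[test], y[test], 0.0)
        # AIPW score: mu(x) + r / pi * (y - mu(x))
        psi[test] = mu_hat + r / pi_hat * (y_fill - mu_hat)
    return float(psi.mean())
```

The cross-fitting ensures each point's score uses nuisances estimated without that point, which is what lets the nuisance estimators converge more slowly than root-n without spoiling root-n inference for the mean.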

