Testing the mean matrix in high-dimensional transposable data

Biometrics ◽  
2015 ◽  
Vol 71 (1) ◽  
pp. 157-166 ◽  
Author(s):  
Anestis Touloumis ◽  
Simon Tavaré ◽  
John C. Marioni

2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Hanji He ◽  
Guangming Deng

We extend mean empirical likelihood inference for the response mean when data are missing at random. Empirical likelihood ratio confidence regions perform poorly when the response is missing at random, especially when the covariate is high-dimensional and the sample size is small. We therefore develop three bias-corrected mean empirical likelihood approaches to obtain efficient inference for the response mean. From the three bias-corrected estimating equations, we derive a new set of equations by constructing a pairwise-mean dataset; this enlarges the sample available for estimation and mitigates the curse of dimensionality. Consistency and asymptotic normality of the maximum mean empirical likelihood estimators are established. The finite-sample performance of the proposed estimators is assessed through simulation, and an application to the Boston Housing dataset is presented.
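The pairwise-mean construction can be sketched as follows. This is a minimal illustration, not the authors' full method (the paper's estimating equations involve bias corrections beyond this step); it assumes the pairwise means are simple averages (y_i + y_j)/2 over the observed responses:

```python
import numpy as np

def pairwise_mean_dataset(y, observed):
    """Form the pairwise-mean dataset (y_i + y_j) / 2 over all pairs
    i <= j of observed responses.  For n observed values this yields
    n * (n + 1) / 2 points, enlarging the sample used for estimation."""
    y_obs = np.asarray(y)[np.asarray(observed)]
    n = len(y_obs)
    return np.array([(y_obs[i] + y_obs[j]) / 2.0
                     for i in range(n) for j in range(i, n)])
```

Each pairwise mean has the same expectation as an individual response, so the mean parameter is preserved while the effective sample size grows quadratically in the number of observed cases.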


2013 ◽  
Vol 404 ◽  
pp. 548-554
Author(s):  
Yu Sang ◽  
Hong Wen Song ◽  
Jun Zhao

Discretization is a necessary pre-processing step for mining tasks and a source of performance improvement for many machine learning algorithms. Existing techniques mainly focus on one-dimensional discretization in low-dimensional data spaces. In this paper, we present a discretization technique based on intrinsic dimensional correlation in high-dimensional data spaces. The approach estimates the intrinsic dimensionality (ID) of the data by maximum likelihood estimation (MLE). We then project the data onto the eigenspace of the estimated (lower) ID using principal component analysis (PCA), which uncovers the underlying correlation structure of the multivariate data. All dimensions of the data are thus transformed into a new, independent eigenspace of dimension ID, and each dimension can be discretized separately in that eigenspace under the Bayes discretization model using the MODL discretization method. We design a heuristic framework to search for better discretization schemes. Experiments demonstrate a significant improvement in the mean learning accuracy of classifiers over traditional discretization methods.
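The pipeline described above, MLE intrinsic-dimension estimation, PCA projection, then per-dimension discretization, can be sketched as below. This is a hedged illustration: the ID estimator follows the Levina-Bickel MLE form, and equal-frequency binning stands in for the MODL method, which is considerably more involved:

```python
import numpy as np

def mle_intrinsic_dim(X, k=10):
    """Levina-Bickel-style MLE of intrinsic dimensionality from
    the k nearest-neighbour distances of each point."""
    dists = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    dists.sort(axis=1)              # column 0 is the zero self-distance
    knn = dists[:, 1:k + 1]         # T_1 .. T_k for every point
    # m_hat(x) = [ (1/(k-1)) * sum_{j<k} log(T_k / T_j) ]^{-1}
    logs = np.log(knn[:, -1:] / knn[:, :-1])
    return float(((k - 1) / logs.sum(axis=1)).mean())

def pca_project(X, d):
    """Project centred data onto its top-d principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T

def equal_freq_discretize(col, n_bins=4):
    """Equal-frequency binning of one projected dimension
    (a simple stand-in for the MODL discretization step)."""
    cuts = np.quantile(col, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.searchsorted(cuts, col)
```

A full pipeline would estimate `d = round(mle_intrinsic_dim(X))`, project with `pca_project(X, d)`, and then discretize each resulting column independently.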


Biometrika ◽  
2021 ◽  
Author(s):  
Yuqian Zhang ◽  
Jelena Bradic

Abstract A fundamental challenge in semi-supervised learning lies in the disproportion between the size of the observed data and the size of the data collected with missing outcomes. An implicit understanding is that the dataset with missing outcomes, being significantly larger, ought to improve estimation and inference. However, it is unclear to what extent this is correct. We illustrate one clear benefit: root-n inference for the outcome's mean is possible while requiring only consistent estimation of the outcome, possibly at a rate slower than root-n. This is achieved by a novel k-fold cross-fitted, doubly robust estimator. We discuss both linear and nonlinear outcomes. The estimator is particularly suited to models that do not naturally admit root-n consistency, such as high-dimensional, nonparametric, or semiparametric models. We apply our methods to estimating heterogeneous treatment effects.
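A k-fold cross-fitted, doubly robust (AIPW-type) estimate of the outcome mean with outcomes missing at random can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the authors' estimator: the outcome regression is ordinary least squares and the labelling propensity is a constant (missing-completely-at-random) plug-in, both fit on out-of-fold data:

```python
import numpy as np

def crossfit_dr_mean(X, y, labeled, K=5, seed=0):
    """K-fold cross-fitted, doubly robust (AIPW-type) estimate of E[Y]
    when y is observed only where `labeled` is True.  Nuisances are fit
    on the out-of-fold data: an OLS outcome regression and a constant
    labelling-propensity estimate (an MCAR simplification)."""
    n = X.shape[0]
    folds = np.random.default_rng(seed).permutation(n) % K
    psi = np.zeros(n)
    for k in range(K):
        test, train = folds == k, folds != k
        tl = train & labeled
        # outcome regression mu(x), fit on labelled out-of-fold cases
        Xt = np.column_stack([np.ones(tl.sum()), X[tl]])
        beta, *_ = np.linalg.lstsq(Xt, y[tl], rcond=None)
        mu_hat = np.column_stack([np.ones(test.sum()), X[test]]) @ beta
        pi_hat = labeled[train].mean()   # P(labelled), fit out-of-fold
        r = labeled[test].astype(float)
        y_fill = np.where(labeled[test], y[test], 0.0)
        # AIPW score: mu(x) + r / pi * (y - mu(x))
        psi[test] = mu_hat + r / pi_hat * (y_fill - mu_hat)
    return float(psi.mean())
```

The cross-fitting ensures each point's score uses nuisances estimated without that point, which is what lets the nuisance estimators converge more slowly than root-n without spoiling root-n inference for the mean.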

