-Plot for Testing Spherical Symmetry for High-Dimensional Data with a Small Sample Size

Journal of Probability and Statistics ◽

10.1155/2012/728565 ◽

2012 ◽

Vol 2012 ◽

pp. 1-18

Author(s):

Jiajuan Liang

Keyword(s):

Sample Size ◽

Spherical Symmetry ◽

Graphical Method ◽

Small Sample Size ◽

High Dimensional Data ◽

Monte Carlo Study ◽

Real Data ◽

Small Sample ◽

High Dimensional ◽

Data Set

High-dimensional data with a small sample size, such as microarray data and image data, are commonly encountered in practical problems where many variables have to be measured but it is too costly or time-consuming to repeat the measurements many times. Analysis of this kind of data poses a great challenge for statisticians. In this paper, we develop a new graphical method for testing spherical symmetry that is especially suitable for high-dimensional data with a small sample size. The new graphical method, together with its local acceptance regions, provides a quick visual check of the assumption of spherical symmetry. The performance of the new graphical method is demonstrated by a Monte Carlo study and illustrated with a real data set.

Risk of Selection of Irrelevant Features from High-Dimensional Data with Small Sample Size

Springer Proceedings in Mathematics & Statistics - Stochastic Models, Statistics and Their Applications ◽

10.1007/978-3-319-13881-7_44 ◽

2015 ◽

pp. 399-405

Author(s):

Henryk Maciejewski

Keyword(s):

Sample Size ◽

Small Sample Size ◽

High Dimensional Data ◽

Small Sample ◽

High Dimensional ◽

Selection Of

An Efficient Dimensionality Reduction Approach for Small-sample Size and High-dimensional Data Modeling

Journal of Computers ◽

10.4304/jcp.9.3.576-580 ◽

2014 ◽

Vol 9 (3) ◽

Cited By ~ 5

Author(s):

Xintao Qiu ◽

Dongmei Fu ◽

Zhenduo Fu

Keyword(s):

Dimensionality Reduction ◽

Sample Size ◽

Small Sample Size ◽

High Dimensional Data ◽

Data Modeling ◽

Small Sample ◽

High Dimensional ◽

Reduction Approach

An Ensemble Classification Method for High-Dimensional Data Using Neighborhood Rough Set

Complexity ◽

10.1155/2021/8358921 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Jing Zhang ◽

Guang Lu ◽

Jiaquan Li ◽

Chuanwen Li

Keyword(s):

Feature Selection ◽

Rough Set ◽

Small Sample Size ◽

High Dimensional Data ◽

Classification Performance ◽

Small Sample ◽

Ensemble Classification ◽

High Dimensional ◽

Sample Classification ◽

Neighborhood Rough Set

Mining useful knowledge from high-dimensional data is a hot research topic. Efficient and effective sample classification and feature selection are challenging tasks because of the high dimensionality and small sample size of microarray data. Feature selection is necessary when constructing the model in order to reduce time and space consumption. Therefore, a feature selection model based on prior knowledge and rough sets is proposed. Pathway knowledge is used to select feature subsets, and a rough set based on intersection neighborhood is then used to select the important features in each subset, since it can select features without redundancy and deal with numerical features directly. To improve the diversity among base classifiers and the efficiency of classification, only a subset of the base classifiers is selected. Classifiers are grouped into several clusters by k-means clustering using the proposed combination distance of Kappa-based diversity and accuracy. The base classifier with the best classification performance in each cluster is selected to generate the final ensemble model. Experimental results on three Arabidopsis thaliana stress response datasets showed that the proposed method achieved better classification performance than existing ensemble models.
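The Kappa-based diversity mentioned in this abstract is commonly computed as Cohen's kappa between the predictions of two classifiers. The following is a minimal sketch of that statistic only; the function name `cohen_kappa` is hypothetical, and the paper's actual combination distance of diversity and accuracy is not specified in the abstract, so it is not reproduced here.

```python
import numpy as np

def cohen_kappa(pred_a, pred_b):
    """Cohen's kappa between two label vectors (illustrative helper, not the paper's code).

    Values near 1 mean the two classifiers agree almost always (low diversity);
    values near 0 mean they agree no more often than chance.
    """
    a = np.asarray(pred_a)
    b = np.asarray(pred_b)
    labels = np.union1d(a, b)
    # Observed agreement: fraction of samples given the same label.
    p_o = float(np.mean(a == b))
    # Chance agreement: product of the marginal label frequencies, summed over labels.
    p_e = float(sum(np.mean(a == k) * np.mean(b == k) for k in labels))
    if p_e == 1.0:
        # Both classifiers always output the same single label.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)

identical = cohen_kappa([0, 0, 1, 1], [0, 0, 1, 1])
chance_level = cohen_kappa([0, 0, 1, 1], [0, 1, 0, 1])
```

A pairwise matrix of such values is one plausible input to a clustering step, but how it is combined with accuracy here is an assumption left open.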

Filtering high-dimensional methylation marks with extremely small sample size: an application to gastric cancer data

10.21203/rs.3.rs-284773/v1 ◽

2021 ◽

Author(s):

Xin Chen ◽

Qingrun Zhang ◽

Thierry Chekouo

Keyword(s):

Gastric Cancer ◽

DNA Methylation ◽

Sample Size ◽

Small Sample Size ◽

Small Sample ◽

Differential Methylation ◽

High Dimensional ◽

Cancer Data ◽

Cancer Pathogenesis ◽

A Genome

Background: DNA methylations in critical regions are highly involved in cancer pathogenesis and drug response. However, identifying causal methylations among a large number of potential polymorphic DNA methylation sites is challenging. Such high-dimensional data bring two obstacles: first, many established statistical models do not scale to so many features; second, multiple testing and overfitting become serious problems. A method to quickly filter candidate sites and narrow down targets for downstream analyses is therefore urgently needed. Methods: BACkPAy is a pre-screening Bayesian approach to detect biologically meaningful clusters of potential differential methylation levels with small sample size. BACkPAy prioritizes potentially important biomarkers by the Bayesian false discovery rate (FDR) approach. It filters out non-informative (i.e., non-differential) sites with flat methylation pattern levels across experimental conditions. In this work, we applied BACkPAy to a genome-wide methylation dataset with 3 tissue types, each containing 3 gastric cancer samples. We also applied LIMMA (Linear Models for Microarray and RNA-Seq Data) to compare its results with those achieved by BACkPAy. Then, Cox proportional hazards regression models were used to visualize prognostically significant markers with The Cancer Genome Atlas (TCGA) data for survival analysis. Results: Using BACkPAy, we identified 8 biologically meaningful clusters/groups of differential probes from the DNA methylation dataset. Using TCGA data, we also identified five prognostic genes (i.e., predictive of the progression of gastric cancer) that contain some differential methylation probes, whereas no significant results were identified using the Benjamini-Hochberg FDR in LIMMA. Conclusions: We showed the importance of using BACkPAy for the analysis of DNA methylation data with extremely small sample size in gastric cancer. We revealed that RDH13, CLDN11, TMTC1, UCHL1 and FOXP2 can serve as predictive biomarkers for gastric cancer treatment, and that the promoter methylation level of these five genes in serum could have prognostic and diagnostic functions in gastric cancer patients.
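For readers unfamiliar with the frequentist comparison baseline named in this abstract, the following is a minimal sketch of the standard Benjamini-Hochberg step-up procedure. It is not BACkPAy's Bayesian FDR and not LIMMA's implementation; the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
import numpy as np

def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a boolean mask of hypotheses rejected at FDR level alpha.

    Illustrative helper: sort the p-values, find the largest rank k with
    p_(k) <= alpha * k / m, and reject the k smallest p-values.
    """
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest sorted index passing its threshold
        reject[order[:k + 1]] = True
    return reject

# Made-up p-values, purely for illustration.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
rejected = benjamini_hochberg(example_p, alpha=0.05)
```

With very few samples per group, p-values rarely fall below these thresholds once many sites are tested, which is consistent with the abstract's report of no significant results under this correction.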

A Multi-Linear Statistical Method for Discriminant Analysis of 2D Frontal Face Images

Cross-Disciplinary Applications of Artificial Intelligence and Pattern Recognition - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-61350-429-1.ch002 ◽

2012 ◽

pp. 18-33 ◽

Cited By ~ 1

Author(s):

Carlos Eduardo Thomaz ◽

Vagner do Amaral ◽

Gilson Antonio Giraldi ◽

Edson Caoru Kitani ◽

João Ricardo Sato ◽

...

Keyword(s):

Small Sample Size ◽

Face Image ◽

Small Sample ◽

High Dimensional ◽

Data Set ◽

Linear Discriminant ◽

Face Images ◽

Linear Framework ◽

Facial Changes ◽

2D Data

This chapter describes a multi-linear discriminant method for constructing and quantifying statistically significant changes in human identity photographs. The approach is based on a general multivariate two-stage linear framework that addresses the small sample size problem in high-dimensional spaces. Starting with a 2D data set of frontal face images, the authors determine the most characteristic direction of change by organizing the data according to the patterns of interest. Experiments on publicly available face image sets show that the multi-linear approach produces visually plausible results for gender, facial expression and aging-related facial changes in a simple and efficient way. The authors believe that such an approach could be widely applied for modeling and reconstruction in face recognition, and possibly in identifying subjects after a lapse of time.

Classifier for Chinese traditional medicine with high-dimensional and small sample-size data

Proceedings of the 4th World Congress on Intelligent Control and Automation (Cat. No.02EX527) ◽

10.1109/wcica.2002.1022123 ◽

2003 ◽

Author(s):

Zhang Lixin ◽

Zhao Yannan ◽

Yang Zehong ◽

Wang Jiaxin ◽

Cai Shaoqing ◽

...

Keyword(s):

Sample Size ◽

Traditional Medicine ◽

Small Sample Size ◽

Small Sample ◽

High Dimensional ◽

Chinese Traditional Medicine ◽

Size Data

Small Sample Size in High Dimensional Space - Minimum Distance Based Classification

Artificial Intelligence and Soft Computing - Lecture Notes in Computer Science ◽

10.1007/978-3-319-07173-2_52 ◽

2014 ◽

pp. 610-621 ◽

Cited By ~ 1

Author(s):

Ewa Skubalska-Rafajłowicz

Keyword(s):

Sample Size ◽

Minimum Distance ◽

Dimensional Space ◽

Small Sample Size ◽

Small Sample ◽

High Dimensional ◽

High Dimensional Space

Autoregressive Prediction with Rolling Mechanism for Time Series Forecasting with Small Sample Size

Mathematical Problems in Engineering ◽

10.1155/2014/572173 ◽

2014 ◽

Vol 2014 ◽

pp. 1-9 ◽

Cited By ~ 1

Author(s):

Zhihua Wang ◽

Yongbo Zhang ◽

Huimin Fu

Keyword(s):

Time Series ◽

Sample Size ◽

Small Sample Size ◽

Computational Effort ◽

Small Sample ◽

Grey Theory ◽

Data Set ◽

Rolling Mechanism ◽

Short Term Forecasting ◽

Prediction Approach

Reasonable prediction is of significant practical value for the analysis of stochastic and unstable time series with small or limited sample size. Motivated by the rolling idea in grey theory and the practical relevance of very short-term forecasting, or 1-step-ahead prediction, a novel autoregressive (AR) prediction approach with a rolling mechanism is proposed. In the modeling procedure, a newly developed AR equation, which can be used to model nonstationary time series, is constructed in each prediction step. Meanwhile, the data window for the next step-ahead forecast rolls on by adding the most recently derived prediction result while deleting the first value of the previously used sample data set. This rolling mechanism is an efficient technique, with the advantages of improved forecasting accuracy, applicability to limited and unstable data, and little computational effort. The general performance, the influence of sample size, the nonlinear dynamic mechanism, and the significance of the observed trends, as well as the innovation variance, are illustrated and verified with Monte Carlo simulations. The proposed methodology is then applied to several practical data sets, including multiple building settlement sequences and two economic series.
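The rolling mechanism described in this abstract can be sketched as follows. This is an illustration under stated assumptions: an ordinary least-squares AR(p) model with intercept stands in for the paper's newly developed AR equation, which the abstract does not specify, and the names `fit_ar` and `rolling_ar_forecast` are hypothetical.

```python
import numpy as np

def fit_ar(window, p):
    """Least-squares fit of x[t] = c + a1*x[t-1] + ... + ap*x[t-p] (stand-in model)."""
    x = np.asarray(window, dtype=float)
    # Each design row holds the intercept term followed by the p most recent lags.
    rows = [np.concatenate(([1.0], x[t - p:t][::-1])) for t in range(p, len(x))]
    coef, *_ = np.linalg.lstsq(np.array(rows), x[p:], rcond=None)
    return coef

def rolling_ar_forecast(series, p, steps):
    """Refit on a fixed-length window at every step, then roll the window forward."""
    window = list(series)
    preds = []
    for _ in range(steps):
        coef = fit_ar(window, p)
        lags = np.array(window[-p:][::-1])
        nxt = float(coef[0] + coef[1:] @ lags)
        preds.append(nxt)
        # Rolling step: append the newest prediction and drop the oldest value.
        window = window[1:] + [nxt]
    return preds

# Toy series with a linear trend, used only to exercise the sketch.
preds = rolling_ar_forecast([1.0, 3.0, 5.0, 7.0, 9.0, 11.0], p=1, steps=3)
```

Because the window length stays fixed, each refit costs the same small amount of computation regardless of how far ahead the forecast runs.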

Non-equidistant GM(1,1) model based on GCHM_WBO and its application in corrosion rate prediction

Grey Systems Theory and Application ◽

10.1108/gs-09-2015-0061 ◽

2016 ◽

Vol 6 (3) ◽

pp. 365-374 ◽

Cited By ~ 2

Author(s):

Yuanjie Zhi ◽

Dongmei Fu ◽

Hanling Wang

Keyword(s):

Corrosion Rate ◽

Sample Size ◽

Prediction Accuracy ◽

Small Sample Size ◽

Real Data ◽

Small Sample ◽

Remaining Useful Life ◽

Content Type ◽

Corrosion Data

Purpose: The purpose of this paper is to present a new model that combines the non-equidistant GM(1,1) model with GCHM_WBO (generalized contra-harmonic mean (GCHM); weakening buffer operator (WBO)). The authors use the model to address the difficulty of establishing a reasonable prediction model, and of improving prediction accuracy, for large numbers of non-equidistant corrosion rate data. Design/methodology/approach: This research consists of three parts: the non-equidistant GM(1,1) model, the GCHM_WBO operator, and the optimization of the morphing parameter (contained in the GCHM, it controls the intensity of the weakening operator). The methodology is as follows. First, the authors built a non-equidistant GM(1,1) model with GCHM_WBO weakened data, for which the morphing parameter was randomly selected. Next, the authors calculated the error between the model's predictions and the real data, and adjusted the morphing parameter according to the error and the property of the GCHM. Then, the authors generated a new non-equidistant GM(1,1) model based on the new morphing parameter, and repeated the previous step until the termination condition was satisfied. Finally, the model with the appropriate morphing parameter was used to predict new data. Findings: This paper finds a property of the GCHM, namely that it is a monotonically increasing function of the morphing parameter under some specific conditions. Based on this property and the fixed point axiom of the WBO, an algorithm was designed to search for an appropriate morphing parameter, which was then used to improve the accuracy of the model. The model was applied to predict the corrosion rate of six steels at the Guangzhou experimental station. The results showed that the proposed method predicts more accurately than models built on the original data and on AWBO weakened data. The method is applicable to long-term forecasts when data are scarce. Practical implications: Corrosion causes huge economic losses to a country; therefore, it is important to judge the remaining useful life of a material or piece of equipment, and the foundation for that judgement is the prediction of the material's corrosion rate. However, predicting the corrosion rate is very difficult because of the features of corrosion data, such as small sample size and non-equidistant sampling. The proposed method can be used to implement long-term forecasts of corrosion data with only one sample and non-equidistant samples. Originality/value: This paper presents a model that combines the non-equidistant GM(1,1) model with GCHM_WBO to handle the problem of long-term forecasting of corrosion data. In the modelling process, the morphing parameter found by the search algorithm can improve the prediction accuracy of the model. Therefore, the model can provide effective and reliable results when data are of a small sample size and non-equidistant.
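The monotonicity property reported under the findings can be illustrated numerically. The sketch below assumes the GCHM takes the usual Lehmer-mean form, sum(x^p) / sum(x^(p-1)), with p playing the role of the morphing parameter; the abstract does not give the paper's exact definition, so both this form and the name `lehmer_mean` are assumptions.

```python
import numpy as np

def lehmer_mean(x, p):
    """Lehmer (contra-harmonic type) mean of positive data for parameter p.

    Assumed form, not taken from the paper: sum(x**p) / sum(x**(p - 1)).
    p = 1 gives the arithmetic mean and p = 2 the classical contra-harmonic mean.
    """
    x = np.asarray(x, dtype=float)
    return float(np.sum(x ** p) / np.sum(x ** (p - 1)))

# Made-up positive data, evaluated over a grid of parameter values.
data = [1.0, 2.0, 4.0]
grid = np.linspace(-3.0, 3.0, 13)
values = [lehmer_mean(data, p) for p in grid]
```

For positive data the values rise with p and stay between the smallest and largest observation, which is the kind of behaviour a one-dimensional search over the morphing parameter can exploit.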

Filtering High-Dimensional Methylation Marks With Extremely Small Sample Size: An Application to Gastric Cancer Data

Frontiers in Genetics ◽

10.3389/fgene.2021.705708 ◽

2021 ◽

Vol 12 ◽

Author(s):

Xin Chen ◽

Qingrun Zhang ◽

Thierry Chekouo

Keyword(s):

Gastric Cancer ◽

DNA Methylation ◽

Sample Size ◽

Small Sample Size ◽

Small Sample ◽

Differential Methylation ◽

High Dimensional ◽

Cancer Data ◽

Cancer Pathogenesis ◽

A Genome

DNA methylations in critical regions are highly involved in cancer pathogenesis and drug response. However, identifying causal methylations among a large number of potential polymorphic DNA methylation sites is challenging. Such high-dimensional data bring two obstacles: first, many established statistical models do not scale to so many features; second, multiple testing and overfitting become serious problems. A method to quickly filter candidate sites and narrow down targets for downstream analyses is therefore urgently needed. BACkPAy is a pre-screening Bayesian approach to detect biologically meaningful patterns of potential differential methylation levels with small sample size. BACkPAy prioritizes potentially important biomarkers by the Bayesian false discovery rate (FDR) approach. It filters out non-informative (i.e., non-differential) sites with flat methylation pattern levels across experimental conditions. In this work, we applied BACkPAy to a genome-wide methylation dataset with three tissue types, each containing three gastric cancer samples. We also applied LIMMA (Linear Models for Microarray and RNA-Seq Data) to compare its results with those achieved by BACkPAy. Then, Cox proportional hazards regression models were used to visualize prognostically significant markers with The Cancer Genome Atlas (TCGA) data for survival analysis. Using BACkPAy, we identified eight biologically meaningful patterns/groups of differential probes from the DNA methylation dataset. Using TCGA data, we also identified five prognostic genes (i.e., predictive of the progression of gastric cancer) that contain some differential methylation probes, whereas no significant results were identified using the Benjamini-Hochberg FDR in LIMMA. We showed the importance of using BACkPAy for the analysis of DNA methylation data with extremely small sample size in gastric cancer. We revealed that RDH13, CLDN11, TMTC1, UCHL1, and FOXP2 can serve as predictive biomarkers for gastric cancer treatment, and that the promoter methylation level of these five genes in serum could have prognostic and diagnostic functions in gastric cancer patients.
