Small Sample Issues for Microarray-Based Classification

Edward R. Dougherty

doi:10.1002/cfg.62


Comparative and Functional Genomics, 2001, Vol 2 (1), pp. 28-34

Cited By ~ 95


Keyword(s): Microarray Data, Small Sample, Training Data, Small Samples, Classification Rules, Classifier Design, Sample Classification, Sample Data, The Impact, Biological Differences

In order to study the molecular biological differences between normal and diseased tissues, it is desirable to perform classification among diseases and stages of disease using microarray-based gene-expression values. Owing to the limited number of microarrays typically used in these studies, serious issues arise with respect to the design, performance and analysis of classifiers based on microarray data. This paper reviews some fundamental issues facing small-sample classification: classification rules, constrained classifiers, error estimation and feature selection. It discusses both unconstrained and constrained classifier design from sample data, and the contributions to classifier error from constrained optimization and lack of optimality owing to design from sample data. The difficulty with estimating classifier error when confined to small samples is addressed, particularly estimating the error from training data. The impact of small samples on the ability to include more than a few variables as classifier features is explained.
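The optimistic bias of training-data error estimates on small samples can be illustrated with a short simulation. This is a minimal sketch under assumptions of our own, not the paper's experiments: one feature, two Gaussian classes N(0,1) and N(delta,1) with equal priors, and a nearest-mean classifier designed from 5 points per class. Because the class densities are known, the true error of each designed classifier can be computed exactly and compared with the resubstitution and leave-one-out estimates. All function names are hypothetical.

```python
import random
import statistics
from statistics import NormalDist

def nearest_mean_fit(xs, ys):
    """Class means of 1-D training points with labels 0 and 1."""
    m0 = statistics.fmean(x for x, y in zip(xs, ys) if y == 0)
    m1 = statistics.fmean(x for x, y in zip(xs, ys) if y == 1)
    return m0, m1

def predict(means, x):
    """Assign x to the class whose mean is closer (ties go to class 0)."""
    m0, m1 = means
    return 0 if abs(x - m0) <= abs(x - m1) else 1

def resubstitution_error(xs, ys):
    """Error counted on the same points used to design the classifier."""
    means = nearest_mean_fit(xs, ys)
    return sum(predict(means, x) != y for x, y in zip(xs, ys)) / len(xs)

def loo_error(xs, ys):
    """Leave-one-out: redesign without point i, then test on point i."""
    errors = 0
    for i in range(len(xs)):
        means = nearest_mean_fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errors += predict(means, xs[i]) != ys[i]
    return errors / len(xs)

def true_error(means, delta):
    """Exact error when class 0 ~ N(0,1), class 1 ~ N(delta,1), equal priors."""
    m0, m1 = means
    t = (m0 + m1) / 2                 # decision threshold midway between the means
    phi = NormalDist().cdf
    if m0 <= m1:                      # classifier predicts 1 when x > t
        return 0.5 * (1 - phi(t)) + 0.5 * phi(t - delta)
    return 0.5 * phi(t) + 0.5 * (1 - phi(t - delta))  # predicts 1 when x < t

def simulate(n_per_class=5, delta=1.0, trials=2000, seed=0):
    """Average resubstitution, leave-one-out and true error over many small samples."""
    rng = random.Random(seed)
    ys = [0] * n_per_class + [1] * n_per_class
    resub, loo, true = [], [], []
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n_per_class)]
        xs += [rng.gauss(delta, 1) for _ in range(n_per_class)]
        resub.append(resubstitution_error(xs, ys))
        loo.append(loo_error(xs, ys))
        true.append(true_error(nearest_mean_fit(xs, ys), delta))
    return statistics.fmean(resub), statistics.fmean(loo), statistics.fmean(true)

resub, loo, true = simulate()
print(f"resubstitution {resub:.3f}  leave-one-out {loo:.3f}  true {true:.3f}")
```

In this setting the resubstitution average falls below the average true error, while leave-one-out does not share that optimistic bias, although on samples this small it is known to be highly variable.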


Underwater Acoustic Target Recognition Based on Generative Adversarial Network Data Augmentation

INTER-NOISE and NOISE-CON Congress and Conference Proceedings, 2021, Vol 263 (2), pp. 4558-4564

doi:10.3397/in-2021-2737

Author(s): Minghong Zhang, Xinwei Luo

Keyword(s): Data Augmentation, Target Recognition, Training Data, Small Samples, Generative Adversarial Network, Data Set, Underwater Acoustic, Adversarial Network, Acoustic Target, The Impact

Underwater acoustic target recognition is an important aspect of underwater acoustic research. In recent years, machine learning has developed continuously and has been widely and effectively applied to underwater acoustic target recognition. Adequate data sets are essential for obtaining good recognition results and reducing overfitting. However, underwater acoustic samples are relatively rare, which limits recognition accuracy. In this paper, in addition to traditional audio data augmentation methods, a new data augmentation method based on a generative adversarial network is proposed: the generator and discriminator learn the characteristics of underwater acoustic samples so that reliable underwater acoustic signals can be generated to expand the training data set. The expanded data set is fed into a deep neural network, and transfer learning, in which part of the pre-trained parameters are kept fixed, is applied to further reduce the impact of small samples. The experimental results show that this method achieves better recognition results than general underwater acoustic recognition methods, verifying its effectiveness.
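The abstract does not say which traditional audio augmentation methods were used. As an assumed example only, two common ones, adding white noise at a chosen signal-to-noise ratio and circularly time-shifting the waveform, can be sketched in plain Python. The GAN-based generation that the paper actually proposes is not reproduced here, and the names and default parameters below are hypothetical.

```python
import math
import random

def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise so the result has (approximately) the requested SNR in dB."""
    power = sum(s * s for s in signal) / len(signal)
    noise_sd = math.sqrt(power / 10 ** (snr_db / 10))
    return [s + rng.gauss(0.0, noise_sd) for s in signal]

def time_shift(signal, shift):
    """Circularly shift the waveform right by `shift` samples (negative shifts go left)."""
    shift %= len(signal)
    return signal[-shift:] + signal[:-shift] if shift else list(signal)

def augment(signal, copies=4, snr_db=20.0, max_shift=100, seed=0):
    """Make `copies` randomly shifted, noisy variants of one recording."""
    rng = random.Random(seed)
    return [add_noise(time_shift(signal, rng.randint(-max_shift, max_shift)), snr_db, rng)
            for _ in range(copies)]

# Hypothetical 1-second, 440 Hz tone sampled at 8 kHz standing in for a recording.
tone = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
clips = augment(tone)
```

Each augmented clip keeps the original length, so it can be added to the training set alongside the real recordings.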


Characterization of the Effectiveness of Reporting Lists of Small Feature Sets Relative to the Accuracy of the Prior Biological Knowledge

Cancer Informatics, 2010, Vol 9, pp. CIN.S4020

doi:10.4137/cin.s4020

Cited By ~ 4

Author(s): Chen Zhao, Michael L. Bittner, Robert S. Chapkin, Edward R. Dougherty

Keyword(s): Small Sample, Training Data, Small Samples, Classification Error, Biological Knowledge, Feature Sets, Good Feature, Prior Biological Knowledge, Selection Algorithms, Power Curves

When confronted with a small sample, feature-selection algorithms often fail to find good feature sets, a problem exacerbated for high-dimensional data and large feature sets. The problem is compounded by the fact that, if one obtains a feature set with a low error estimate, the estimate is unreliable because training-data-based error estimators typically perform poorly on small samples, exhibiting optimistic bias or high variance. One way around the problem is to limit the number of features being considered, restrict feature sets to sizes such that all feature sets can be examined by exhaustive search, and report a list of the best-performing feature sets. If the list is short, then it greatly restricts the possible feature sets to be considered as candidates; however, one can expect the lowest error estimates obtained to be optimistically biased, so there may not be a close-to-optimal feature set on the list. This paper provides a power analysis of this methodology; in particular, it examines the kind of results one should expect to obtain relative to the length of the list and the number of discriminating features among those considered. Two measures are employed. The first is the probability that there is at least one feature set on the list whose true classification error is within some given tolerance of the best feature set; the second is the expected number of feature sets on the list whose true errors are within the given tolerance of the best feature set. These values are plotted as functions of the list length to generate power curves. The results show that, if the number of discriminating features is not too small (that is, if the prior biological knowledge is not too poor), then one should expect, with high probability, to find good feature sets. Availability: companion website at http://gsp.tamu.edu/Publications/supplementary/zhao09a/
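The two list measures can be made concrete with a Monte Carlo sketch. The noise model is an assumption for illustration and not the paper's simulation design: each candidate feature set has a known true error, its training-data error estimate is that error plus independent Gaussian noise, the sets are ranked by estimate, and the top `list_length` form the reported list. The hypothetical function `list_power` returns the estimated probability that the list holds at least one set within `tol` of the best true error, and the expected number of such sets.

```python
import random
import statistics

def list_power(true_errors, list_length, tol, est_sd, trials=1000, seed=0):
    """Monte Carlo estimate of the two list measures under a toy noise model.

    Each feature set's error estimate is its true error plus independent
    Gaussian noise with standard deviation est_sd (a stand-in for a
    high-variance small-sample error estimator).
    Returns (P(at least one good set on list), E[number of good sets on list]).
    """
    rng = random.Random(seed)
    best = min(true_errors)
    hits = 0
    counts = []
    for _ in range(trials):
        # Pair each noisy estimate with the true error it came from.
        scored = [(e + rng.gauss(0.0, est_sd), e) for e in true_errors]
        scored.sort(key=lambda pair: pair[0])      # rank by estimated error
        good = sum(1 for _, e in scored[:list_length] if e <= best + tol)
        hits += good > 0
        counts.append(good)
    return hits / trials, statistics.fmean(counts)

# Hypothetical true errors for 100 candidate feature sets.
true_errors = [0.10 + 0.002 * i for i in range(100)]
for length in (1, 5, 10, 20):
    p, expected = list_power(true_errors, length, tol=0.02, est_sd=0.05)
    print(length, round(p, 3), round(expected, 2))
```

Sweeping `list_length` and plotting both returned values gives power curves of the kind described above. Because calls with the same seed reuse the same noise draws, each shorter list is contained in each longer one, so the curves are non-decreasing in the list length.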


Abnormal Detection to Big Data Using Deep Neural Networks

Journal of Physics Conference Series, 2021, Vol 2068 (1), pp. 012025

doi:10.1088/1742-6596/2068/1/012025

Author(s): Jian Zheng, Zhaoni Li, Jiang Li, Hongling Liu

Keyword(s): Neural Network, Big Data, Risk Model, Small Sample, Small Samples, Small Data, Data Segmentation, Sample Data, Segmentation Algorithms, The Common

It is difficult to detect anomalies in big data using traditional methods because big data is massive and disordered. Common methods divide big data into several small samples and then analyze these samples separately. However, this approach increases the complexity of the segmentation algorithms, and the risk introduced by data segmentation is difficult to control. To address this, this paper proposes a neural network approach based on the Vapnik risk model. First, the sample data are randomly divided into small data blocks. Then, a neural network learns from these small data blocks. To reduce the risk in the data segmentation process, the Vapnik risk model is used to supervise the segmentation. Finally, the proposed method is verified on historical electricity price data from Mountain View, California. The results show that the method is effective.
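The abstract does not specify the form of the Vapnik risk model. Assuming it refers to the classical VC generalization bound (with probability at least 1 - eta, true risk <= empirical risk + sqrt((h(ln(2n/h) + 1) - ln(eta/4)) / n), where h is the VC dimension and n the number of samples), the sketch below randomly splits data into blocks and shows how the confidence term grows as blocks shrink, which is one way to quantify the segmentation risk mentioned above. The function names and the VC dimension used are hypothetical.

```python
import math
import random

def random_blocks(data, block_size, seed=0):
    """Shuffle a copy of the data and cut it into consecutive blocks."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    return [shuffled[i:i + block_size] for i in range(0, len(shuffled), block_size)]

def vc_confidence(n, h, eta=0.05):
    """VC confidence term sqrt((h*(ln(2n/h) + 1) - ln(eta/4)) / n); assumes n > h."""
    if n <= h:
        raise ValueError("bound is vacuous unless n > h")
    return math.sqrt((h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n)

def risk_bound(empirical_risk, n, h, eta=0.05):
    """Upper bound on true risk that holds with probability at least 1 - eta."""
    return empirical_risk + vc_confidence(n, h, eta)

data = list(range(10_000))
for size in (100, 1_000, 10_000):
    blocks = random_blocks(data, size)
    # Smaller blocks give a looser guarantee for a model trained on one block.
    print(size, len(blocks), round(vc_confidence(size, h=50), 3))
```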


Medically-managed Hospital in the Home: 7 year study of mortality and unplanned interruption

Australian Health Review, 2010, Vol 34 (3), pp. 269

doi:10.1071/ah09771

Cited By ~ 24

Author(s): Michael Montalto, Benjamin Lui, Ann Mullins, Katherine Woodmason

Keyword(s): Hospital Care, Nursing Home Residents, Low Frequency, Small Sample, Acute Hospital, Operating Conditions, Small Samples, Hospital Services, Acute Hospital Care, The Impact

Background. Hospital in the Home (HIH) research is characterised by small samples in new programs. We sought to examine a large number of consecutive HIH admissions over many years in an established, medically-managed HIH service to determine: (1) whether HIH is a safe and effective method of delivering acute hospital care under usual operating conditions in an established unit; and (2) which patient, condition and treatment variables contribute to a greater risk of failure. Method. A survey of all patients admitted to a medically-managed HIH unit from 2000–2007. Results. A total of 3423 admissions to HIH were examined. Of these, 2207 (64.5%) were admitted directly into HIH from the Emergency Department or rooms, with the remainder admitted from hospital wards. A total of 26 653 HIH bed days were delivered, with a mean of 9.3 nursing visits and 4.1 medical visits per admission. A total of 143 patients (4.2%) required an interruption via an unplanned return to hospital; 106 (3.1%) did not subsequently return to HIH. The commonest reasons for unplanned returns to hospital were: no clinical improvement; cardiac conditions; fever; breathlessness and pain. Patients over the age of 50, and those receiving intravenous antibiotic therapy, were more likely to require a return to hospital. Two patients died unexpectedly while in HIH, and a further three patients died unexpectedly after their unplanned return to hospital, giving a total unexpected mortality rate of 0.15%. Conclusion. This sample of HIH patients is five times the number of HIH patients ever enrolled in randomised trials in this area. Further, outcomes were achieved in ‘ordinary’ working conditions over a long time period. Care was completed without interruption (return to hospital) in 95.8% of all episodes. Interruption was associated with patients referred from inpatient wards, older patients, and patients who were treated with intravenous antibiotics.
Patients referred from Emergency Departments experienced fewer interruptions. Nursing home residents were no more likely to require an interruption to their HIH care. What is known about the topic? Hospital in the Home is the delivery of acute hospital services to patients at home. There is no consensus on the best model of HIH. Studies of HIH have small sample sizes, so support for HIH is often qualified. What does this paper add? This paper describes activity and outcomes for 3423 consecutive patients admitted into a medically-managed HIH over 7 years. This represents an extensive long-term survey of HIH patient care outcomes. What are the implications for practitioners? Medically-managed HIH is able to deliver acute hospital care with low rates of unexpected mortality and unplanned returns to hospital. Trials using low-frequency events such as mortality and delirium as outcomes will require very large samples, and such large trials are unlikely to occur. The impact of medically-managed HIH on access to acute hospital services for certain diagnostic groups could be significant and deserves further expansion. The concept of hospitalisation can be refined to include HIH.
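As an arithmetic check of the reported rates, and to illustrate why low-frequency outcomes call for large samples, the sketch below recomputes the percentages from the counts in the abstract and attaches a Wilson score interval to each. The choice of the Wilson interval, and the function name, are ours rather than the authors'.

```python
import math

def wilson_interval(events, n, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% when z = 1.96)."""
    p = events / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

ADMISSIONS = 3423
for label, events in [("unplanned return", 143),
                      ("did not return to HIH", 106),
                      ("unexpected death", 5)]:
    low, high = wilson_interval(events, ADMISSIONS)
    print(f"{label}: {100 * events / ADMISSIONS:.2f}% "
          f"(95% CI {100 * low:.2f}-{100 * high:.2f}%)")
```

With only five unexpected deaths, the upper limit of the interval is more than twice the point estimate even with 3423 admissions, which is why trials powered on such outcomes need very large samples.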


Identification of Navel Orange Diseases and Pests Based on the Fusion of DenseNet and Self-Attention Mechanism

Computational Intelligence and Neuroscience, 2021, Vol 2021, pp. 1-12

doi:10.1155/2021/5436729

Author(s): Yin’e Zhang, Yong Ping Liu

Keyword(s): Small Sample, Small Samples, Data Sets, Navel Orange, Pests And Diseases, Sample Data, Different Types, High Recognition Accuracy, Intelligent Recognition, And Control

The prevention and control of navel orange pests and diseases is an important measure for ensuring navel orange yield. Existing identification methods for navel orange pests and diseases are slow, highly subjective, demand considerable professional knowledge, and are costly. To address these problems, this paper proposes DCPSNET, an identification method for navel orange diseases and pests based on the fusion of DenseNet and the self-attention mechanism, which improves the traditional deep dense network DenseNet model to achieve accurate and efficient identification. Because data on navel orange pests and diseases are difficult to collect, this article uses image enhancement techniques to expand the data set. The experimental results show that, with small samples, the DCPSNET model identifies images of different types of navel orange diseases and pests more accurately than the traditional model, reaching an accuracy of 96.90% on the test set for six types of navel orange diseases and pests. The proposed method achieves high recognition accuracy, realizes intelligent recognition of navel orange diseases and pests, and also offers an approach to high-precision recognition on small-sample data sets.
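DCPSNET's exact architecture is not given in the abstract. As a generic, assumed illustration of the self-attention idea named in the title, the sketch below applies scaled dot-product self-attention to a feature map flattened to N positions with C channels: every position is re-expressed as a softmax-weighted mix of all positions. The projection matrices would be learned in a real network; here they are plain inputs, and all names are hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax of one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Multiply an (n x m) matrix by an (m x p) matrix, both lists of lists."""
    return [[sum(x * y for x, y in zip(r, col)) for col in zip(*b)] for r in a]

def self_attention(features, wq, wk, wv):
    """Scaled dot-product self-attention over N positions with C channels.

    features: N x C list of lists (e.g. a flattened H x W CNN feature map).
    wq, wk, wv: C x D projection matrices (learned in a real network).
    """
    q, k, v = matmul(features, wq), matmul(features, wk), matmul(features, wv)
    d = len(wq[0])
    scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]   # each row sums to 1
    return matmul(weights, v)

# Hypothetical 2 x 2 feature map with 3 channels, flattened to 4 positions.
feature_map = [[0.2, 0.1, 0.0], [0.9, 0.3, 0.4], [0.1, 0.8, 0.5], [0.6, 0.2, 0.7]]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
attended = self_attention(feature_map, identity, identity, identity)
```

In attention modules added to convolutional networks, the attended output is commonly added back to the input features as a residual connection, which requires the projection width D to match the channel count C.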


Novel MOA Fault Detection Technology Based on Small Sample Infrared Image

Electronics, 2021, Vol 10 (15), pp. 1748

doi:10.3390/electronics10151748

Author(s): Baoquan Wei, Yong Zuo, Yande Liu, Wei Luo, Kaiyun Wen, ...

Keyword(s): Fault Detection, Recognition Rate, Infrared Image, Detection Algorithm, Small Sample, Training Data, Small Samples, Generative Adversarial Networks, Extended Model, Detection Technology

This paper proposes a novel metal oxide arrester (MOA) fault detection technology based on small-sample infrared images. The research addresses both the detection process and data enhancement. A lightweight MOA identification and location algorithm is deployed at the edge, which not only reduces the amount of data uploaded but also narrows the search space of the cloud algorithm. To improve the accuracy and generalization ability of the defect detection model with small samples, a multi-model fusion detection algorithm is proposed: different image features are extracted by multiple convolutional neural networks, multiple classifiers are trained on them, and a weighted voting strategy is finally used for fault diagnosis. In addition, an extended model of fault samples is constructed using transfer learning and deep convolutional generative adversarial networks (DCGAN) to address the problem of unbalanced training data sets. The experimental results show that the proposed method can accurately locate arresters with small samples, and that after data expansion the recognition rate of arrester anomalies improves from 83% to 85%, showing high effectiveness and reliability.
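The final weighted voting step can be sketched as follows. Using each classifier's validation accuracy as its weight is an assumption for illustration; the abstract does not say how the weights were chosen, and the labels and weights in the example are hypothetical.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Combine class labels from several classifiers by weighted voting.

    predictions: one predicted label per classifier.
    weights: one non-negative weight per classifier (e.g. validation accuracy).
    Ties go to the label that appears first in `predictions`.
    """
    if len(predictions) != len(weights):
        raise ValueError("need one weight per classifier")
    totals = defaultdict(float)          # insertion-ordered, so ties keep first label
    for label, w in zip(predictions, weights):
        totals[label] += w
    return max(totals, key=totals.get)

print(weighted_vote(["fault", "normal", "normal"], [0.92, 0.41, 0.38]))  # -> fault
```

A plain majority vote would return "normal" here; weighting lets one strong classifier outvote two weak ones.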


Dangers of the Defaults: A Tutorial on the Impact of Default Priors When Using Bayesian SEM With Small Samples

Frontiers in Psychology, 2020, Vol 11

doi:10.3389/fpsyg.2020.611963

Author(s): Sanne C. Smid, Sonja D. Winter

Keyword(s): Sample Size, Structural Equation, Structural Equation Models, Small Sample, Small Samples, Informative Priors, Prior Distributions, Shiny App, Technical Discussion, The Impact

When Bayesian estimation is used to analyze Structural Equation Models (SEMs), prior distributions need to be specified for all parameters in the model. Many popular software programs offer default prior distributions, which is helpful for novice users and makes Bayesian SEM accessible to a broad audience. However, when the sample size is small, those prior distributions are not always suitable and can lead to untrustworthy results. In this tutorial, we provide a non-technical discussion of the risks associated with the use of default priors in small-sample contexts. We discuss how default priors can unintentionally behave as highly informative priors when samples are small. We also demonstrate an online educational Shiny app in which users can explore the impact of varying prior distributions and sample sizes on model results. We discuss how the Shiny app can be used in teaching, provide a reading list with literature on how to specify suitable prior distributions, and discuss guidelines on how to recognize (mis)behaving priors. It is our hope that this tutorial helps to spread awareness of the importance of specifying suitable priors when Bayesian SEM is used with small samples.
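How a default prior can act as an informative one when samples are small can be seen in the simplest conjugate case. This is a one-parameter stand-in, not an SEM: a normal mean with known standard deviation sigma and a normal prior. The prior N(0, 1) below is a hypothetical "default-looking" choice applied to data whose scale (sigma = 10) makes it quite informative.

```python
def posterior_normal_mean(xbar, n, sigma, prior_mean, prior_sd):
    """Posterior mean and sd of a normal mean (known sigma) under a normal prior."""
    prior_prec = 1.0 / prior_sd ** 2         # precision contributed by the prior
    data_prec = n / sigma ** 2               # precision contributed by n observations
    post_prec = prior_prec + data_prec
    mean = (prior_prec * prior_mean + data_prec * xbar) / post_prec
    return mean, post_prec ** -0.5

for n in (10, 1000):
    mean, sd = posterior_normal_mean(xbar=5.0, n=n, sigma=10.0, prior_mean=0.0, prior_sd=1.0)
    print(n, round(mean, 3), round(sd, 3))
```

With n = 10 the posterior mean is pulled almost all the way to the prior mean (about 0.45 against an observed mean of 5), while with n = 1000 it is about 4.55, close to the data.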


Small Sample Underwater Target Recognition Based on Mobilenet_ YOLOV4 Algorithm

CONVERTER, 2021, pp. 359-372

doi:10.17762/converter.135

Author(s): Jun Zhang, Xiaohong Peng, Zixiang Liang, Rongfa Chen, ZhaoLi

Keyword(s): Target Recognition, Image Data, Original Data, Histogram Equalization, Small Sample, Small Samples, Backbone Networks, Sample Data, Underwater Target, The Cost

Objectives: When seabed image data for underwater target recognition are acquired by a simulation robot or manually, sampling is costly, the sample data obtained are limited and of poor image quality, and little data is available for training. Methods: To address this problem, this paper improves the YOLOv4 algorithm by modifying its feature extraction backbone network, and proposes three YOLOv4 variants based on different MobileNet backbone networks to test underwater target recognition with small samples. Real seabed images are used as the original training data, and data different from the training set are used for prediction. Result: Compared with the original YOLOv4 algorithm under the same conditions, the MobilenetV1_YOLOV4 algorithm achieves the best mAP (86.04%) and FPS (52). In addition, histogram equalization is used to enhance the images, which provides a supplementary means of recognizing missed targets and reduces the miss rate. Conclusions: The algorithm balances lightweight design and accuracy, and provides support for underwater target recognition in marine operations and aquaculture.
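Histogram equalization, used above to enhance seabed images, remaps grey levels through the cumulative histogram so that the output levels spread over the full range. Below is a minimal sketch for 8-bit greyscale pixels stored as a flat list, using the common (cdf - cdf_min) / (N - cdf_min) mapping; the authors' exact preprocessing may differ, and the function name is hypothetical.

```python
def equalize_histogram(pixels, levels=256):
    """Histogram-equalize a flat list of integer grey levels in [0, levels)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:                      # cumulative histogram
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:                    # single grey level: nothing to spread
        return list(pixels)
    lut = [max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1))) for c in cdf]
    return [lut[p] for p in pixels]

print(equalize_histogram([50, 50, 51, 52]))  # -> [0, 0, 128, 255]
```

A narrow band of dark levels (50 to 52) is stretched across the whole 0 to 255 range, which is what makes low-contrast underwater targets easier to see.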


An Automatic Evaluation Method for Parkinson's Dyskinesia Using Finger Tapping Video for Small Samples

2022

doi:10.21203/rs.3.rs-1207003/v1

Author(s): Zhu Li, lu kang, Miao Cai, Xiaoli Liu, Yanwen Wang, ...

Keyword(s): Evaluation Method, Assessment Method, Small Sample, Finger Tapping, Small Samples, Automatic Evaluation, Automatic Assessment, Estimation Model, Sample Classification, Optimal Accuracy

Purpose: The assessment of dyskinesia in Parkinson's disease (PD) based on artificial intelligence technology is a significant and challenging task. At present, doctors usually use the MDS-UPDRS scale to assess the severity of patients' symptoms. This method is time-consuming and laborious, and assessments differ subjectively between raters. Evaluation methods based on sensor equipment are also widely used, but they are expensive and need professional guidance, which makes them unsuitable for remote evaluation and patient self-examination. In addition, patient data are difficult to collect in medical research, so it is of great significance to find an objective, automatic assessment method for Parkinson's dyskinesia that works with small samples. Methods: In this study, we design an automatic evaluation method combining manual features and a convolutional neural network (CNN), which is suitable for small-sample classification. Based on finger tapping videos of Parkinson's patients, we use a pose estimation model to obtain skeleton information for the movement and calculate feature data. We then train the model with 5-fold cross-validation to achieve an optimal trade-off between bias and variance, and finally make multi-class predictions through a fully connected network (FCN). Results: Our proposed method achieves a current best accuracy of 79.7% for this task. Compared with the latest related methods, our method is superior in terms of accuracy, number of parameters and FLOPs. Conclusion: The method in this paper does not require patients to wear sensor devices and has clear advantages in remote clinical evaluation. At the same time, training the CNN model on motion feature data achieves the best accuracy, effectively addresses the difficulty of data acquisition in medicine, and provides a new idea for small-sample classification.
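The 5-fold cross-validation step can be sketched as an index generator: the data are shuffled once and cut into five nearly equal folds, and each fold serves once as the test set while the rest are used for training. This is a generic sketch with a hypothetical function name; whether the authors shuffled or stratified their folds is not stated in the abstract.

```python
import random

def k_fold_indices(n, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # The first n % k folds get one extra sample so every index is used once.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size

for fold, (train, test) in enumerate(k_fold_indices(23), start=1):
    print(fold, len(train), len(test))
```

A model is trained on each `train` split and scored on the matching `test` split, and the five scores are averaged.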


Structural, Functional, and Metabolic Brain Differences as a Function of Gender Identity or Sexual Orientation: A Systematic Review of the Human Neuroimaging Literature

2020

doi:10.20944/preprints202006.0330.v1

Author(s): Alberto Frigerio, Lucia Ballerini, Maria Del C. Valdés Hernández

Keyword(s): Systematic Review, Sexual Orientation, Gender Identity, Small Sample Size, Human Sexuality, Hormonal Treatment, Small Sample, Small Samples, Opposite Sex, The Impact

Human sexuality is a complex reality that includes gender identity and sexual orientation. A widespread approach to studying human sexuality is to compare groups with contrasting gender identities or sexual orientations, such as cisgender vs transgender and heterosexual vs homosexual individuals. Neuroimaging studies have found brain differences between these groups. Nevertheless, they have reported conflicting results, and limitations such as small sample sizes and the considerable overlap between groups make it difficult to draw accurate conclusions. This systematic review explored structural, functional and metabolic features of the ‘cisgender brain’ compared with the ‘transgender brain’ before hormonal treatment, and of the ‘heterosexual brain’ compared with the ‘homosexual brain’, based on the neuroimaging literature up to 2018. Our main aim is to help identify biological brain features that have been related to human sexuality, in order to contribute to the understanding of the biological elements involved in gender identity and sexual orientation. Our results suggest that the majority of neuroanatomical, neurophysiological, and neurometabolic features in transgender individuals resemble those of their natal sex rather than those of their experienced gender, and that in homosexual individuals these features resemble those of the same-sex heterosexual population rather than the opposite-sex heterosexual population. However, null findings from non-invasive neuroimaging are always difficult to interpret. Given the gross nature of these measures, there may still be differences too subtle to measure with available tools that nonetheless contribute to gender identity and sexual orientation. Moreover, conflicting results made it impossible to identify specific brain features that consistently differ between cis- and transgender groups or between hetero- and homosexual groups.
The small number of studies, the small sample size of each study, and the heterogeneity of the investigations made it impossible to meta-analyse all the extracted data. Further studies are necessary to increase the understanding of the neurological substrate of human sexuality.
