An evaluation of HapMap sample size and tagging SNP performance in large-scale empirical and simulated data sets

Eleftheria Zeggini; William Rayner; Andrew P Morris; Andrew T Hattersley; Mark Walker; Graham A Hitman; Panos Deloukas; Lon R Cardon; Mark I McCarthy

doi:10.1038/ng1670

An evaluation of the efficiency of plant protection products via nonlinear statistical methods – a simulation study

Biometrical Letters, 2014, Vol 51 (2), pp. 171-179

doi:10.2478/bile-2014-0012

Author(s): Ewa Skotarczak, Ewa Bakinowska, Kamila Tomaszyk

Keyword(s): Sample Size, Simulation Study, Statistical Approach, Plant Protection, Simulated Data, Plant Protection Products, Data Sets, Threshold Models, Probability Of Success, Simulated Data Sets

Abstract: A nonlinear statistical approach was used to evaluate the efficiency of plant protection products. The methodology presented can be implemented when the observations in an experiment are recorded as success or failure. This occurs, for example, when, following the application of a herbicide or pesticide, a single weed or insect is classified as alive (failure) or dead (success). A higher probability of success then means a higher efficiency of the tested product. Using simulated data sets, three methods based on the logit, probit and threshold models were compared, with special attention to the effect of sample size and number of replications on the accuracy of the estimated probabilities.
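
The effect of sample size on the accuracy of an estimated success probability can be illustrated with a small Monte Carlo sketch. This is not the authors' code or simulation design: the true probability (0.7), the sample sizes, the number of replications and all function names are assumptions chosen for illustration, and the threshold model is not covered. The logit and probit links appear only to show the two scales on which such a probability is commonly modelled.

```python
import math
import random
from statistics import NormalDist


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def probit(p):
    """Standard normal quantile of a probability p in (0, 1)."""
    return NormalDist().inv_cdf(p)


def rmse_of_estimate(p_true, n, replications, rng):
    """RMSE of the observed success fraction over repeated experiments of size n."""
    sq_err = 0.0
    for _ in range(replications):
        # Each individual is a success (dead) with probability p_true.
        successes = sum(rng.random() < p_true for _ in range(n))
        sq_err += (successes / n - p_true) ** 2
    return math.sqrt(sq_err / replications)


rng = random.Random(42)  # fixed seed so the run is repeatable
p_true = 0.7             # hypothetical efficiency of the tested product
rmse = {n: rmse_of_estimate(p_true, n, 2000, rng) for n in (10, 50, 200)}

for n, err in rmse.items():
    print(f"n={n:4d}  RMSE of estimated probability = {err:.4f}")
print(f"logit({p_true}) = {logit(p_true):.3f}, probit({p_true}) = {probit(p_true):.3f}")
```

Under these assumptions the error should shrink roughly as the inverse square root of the sample size, which is the binomial sampling behaviour any of the three models has to work with.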

The Three-Cornered Hat Method for Estimating Error Variances of Three or More Atmospheric Data Sets – Part I: Overview and Evaluation

Journal of Atmospheric and Oceanic Technology, 2020

doi:10.1175/jtech-d-19-0217.1

Author(s): Jeremiah P. Sjoberg, Richard A. Anthes, Therese Rieckh

Keyword(s): Sample Size, Historical Development, Simulated Data, Data Sets, Vertical Resolution, Random Errors, Atmospheric Data, Similarities And Differences, The Impact, Simulated Data Sets

Abstract: The three-cornered hat (3CH) method, which was originally developed to assess the random errors of atomic clocks, is a means for estimating the error variances of three different data sets. Here we give an overview of the historical development of the 3CH and select other methods for estimating error variances that use either two or three data sets. We discuss similarities and differences between these methods and the 3CH method. This study assesses the sensitivity of the 3CH method to the factors that limit its accuracy, including sample size, outliers, different magnitudes of errors between the data sets, biases, and unknown error correlations. Using simulated data sets for which the errors and their correlations among the data sets are known, this analysis shows the conditions under which the 3CH method provides the most and least accurate estimates. The effect of representativeness errors caused by differences in vertical resolution of data sets is investigated. These representativeness errors are generally small relative to the magnitude of the random errors in the data sets, and the impact of this source of errors can be reduced by appropriate filtering.
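
As a rough illustration of the idea, the sketch below applies the commonly quoted 3CH relation, in which the error variance of one data set is half of (the sum of the variances of its two pairwise differences minus the variance of the remaining pairwise difference), assuming mutually uncorrelated errors. It is not the authors' code: the simulated truth, the error standard deviations (1, 2, 3), the sample size and the function names are all assumptions made for this example.

```python
import random
import statistics


def three_cornered_hat(x, y, z):
    """Estimate the error variances of three collocated data sets.

    Assumes all three measure the same truth and that their errors
    are mutually uncorrelated.
    """
    def var_diff(a, b):
        # Variance of the pairwise difference; the shared truth cancels.
        return statistics.pvariance([ai - bi for ai, bi in zip(a, b)])

    vxy = var_diff(x, y)
    vxz = var_diff(x, z)
    vyz = var_diff(y, z)
    return (
        0.5 * (vxy + vxz - vyz),  # error variance of x
        0.5 * (vxy + vyz - vxz),  # error variance of y
        0.5 * (vxz + vyz - vxy),  # error variance of z
    )


def simulate(n, sigmas, seed=0):
    """Hypothetical test data: a common truth plus independent Gaussian errors."""
    rng = random.Random(seed)
    truth = [rng.gauss(0.0, 5.0) for _ in range(n)]
    return [[t + rng.gauss(0.0, s) for t in truth] for s in sigmas]


x, y, z = simulate(20000, (1.0, 2.0, 3.0))
est = three_cornered_hat(x, y, z)
print("estimated error variances:", [round(v, 2) for v in est])
print("true error variances:     ", [1.0, 4.0, 9.0])
```

Reducing `n` in `simulate` gives a quick feel for the sample-size sensitivity the abstract refers to; with few samples an estimate can even come out negative.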

Evaluating two methods of estimating error variances using simulated data sets with known errors

Atmospheric Measurement Techniques, 2018, Vol 11 (7), pp. 4309-4325

doi:10.5194/amt-11-4309-2018

Cited By ~ 2

Author(s): Therese Rieckh, Richard Anthes

Keyword(s): Simple Model, Sample Size, Partial Correlation, Simulated Data, Geophysical Data, Data Sets, Independent Data, Minimal Effect, Estimated Error, Simulated Data Sets

Abstract: In this paper we compare two different methods of estimating the error variances of two or more independent data sets. One method, called the “three-cornered hat” (3CH) method, requires three data sets. Another method, which we call the “two-cornered hat” (2CH) method, requires only two data sets. Both methods have been used in previous studies to estimate the error variances associated with a number of physical and geophysical data sets. A key assumption in both methods is that the errors of the data sets are not correlated, although some studies have considered the effect of the partial correlation of representativeness errors in two or more of the data sets. We compare the 3CH and 2CH methods using a simple model to simulate three and two data sets with various error correlations and biases. With this model, we know the exact error variances and covariances, which we use to assess the accuracy of the 3CH and 2CH estimates. We examine the sensitivity of the estimated error variances to the degree of error correlation between two of the data sets as well as the sample size. We find that the 3CH method is less sensitive to these factors than the 2CH method and hence is more accurate. We also find that biases in one of the data sets have a minimal effect on the 3CH method, but can produce large errors in the 2CH method.
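
Why the uncorrelated-error assumption matters for 3CH can be seen from a short derivation. The notation is an assumption for illustration and is not taken from the paper: X, Y and Z measure the same truth with error variances \sigma_X^2, \sigma_Y^2, \sigma_Z^2; the errors of X and Y have covariance c, and the errors of Z are uncorrelated with both.

```latex
\begin{align}
\operatorname{Var}(X-Y) &= \sigma_X^2 + \sigma_Y^2 - 2c, \\
\operatorname{Var}(X-Z) &= \sigma_X^2 + \sigma_Z^2, \\
\operatorname{Var}(Y-Z) &= \sigma_Y^2 + \sigma_Z^2, \\[4pt]
\hat{\sigma}_X^2 &= \tfrac{1}{2}\bigl[\operatorname{Var}(X-Y) + \operatorname{Var}(X-Z) - \operatorname{Var}(Y-Z)\bigr] = \sigma_X^2 - c, \\
\hat{\sigma}_Y^2 &= \tfrac{1}{2}\bigl[\operatorname{Var}(X-Y) + \operatorname{Var}(Y-Z) - \operatorname{Var}(X-Z)\bigr] = \sigma_Y^2 - c, \\
\hat{\sigma}_Z^2 &= \tfrac{1}{2}\bigl[\operatorname{Var}(X-Z) + \operatorname{Var}(Y-Z) - \operatorname{Var}(X-Y)\bigr] = \sigma_Z^2 + c.
\end{align}
```

With c = 0 the three expressions recover the true error variances. Under these assumptions, a positive covariance between the errors of X and Y lowers the estimates for X and Y by c and raises the estimate for Z by the same amount.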

Spectral Convolution Feature-Based SPD Matrix Representation for Signal Detection Using a Deep Neural Network

Entropy, 2020, Vol 22 (9), pp. 949

doi:10.3390/e22090949

Author(s): Jiangyi Wang, Min Liu, Xinwu Zeng, Xiaoqiang Hua

Keyword(s): Neural Network, Signal Detection, Convolutional Neural Network, Deep Neural Network, Detection Method, Learning Algorithm, Simulated Data, Data Sets, Feature Maps, Simulated Data Sets

Abstract: Convolutional neural networks perform strongly in many visual tasks because of their hierarchical structure and powerful feature extraction capabilities. The SPD (symmetric positive definite) matrix has attracted attention in visual classification because it can learn a proper statistical representation and distinguish samples carrying different information. In this paper, a deep neural network signal detection method based on spectral convolution features is proposed. In this method, local features extracted by a convolutional neural network are used to construct the SPD matrix, and a deep learning algorithm for the SPD matrix is used to detect target signals. Feature maps extracted by two kinds of convolutional neural network models are applied in this study. With this method, signal detection becomes a binary classification problem over the signals in the samples. To demonstrate the validity and superiority of this method, simulated and semi-physical simulated data sets are used. The results show that, under low SCR (signal-to-clutter ratio), this method obtains a gain of 0.5–2 dB over the spectral signal detection method based on the deep neural network, on both simulated and semi-physical simulated data sets.

Benchmarking Statistical Multiple Sequence Alignment

2018

doi:10.1101/304659

Cited By ~ 1

Author(s): Michael Nute, Ehsan Saleh, Tandy Warnow

Keyword(s): Sequence Alignment, Multiple Sequence Alignment, Structural Alignment, Estimation Method, Simulated Data, Protein Sequences, Data Sets, Sequence Alignments, Multiple Sequence, Simulated Data Sets

Abstract: The estimation of multiple sequence alignments of protein sequences is a basic step in many bioinformatics pipelines, including protein structure prediction, protein family identification, and phylogeny estimation. Statistical co-estimation of alignments and trees under stochastic models of sequence evolution has long been considered the most rigorous technique for estimating alignments and trees, but little is known about the accuracy of such methods on biological benchmarks. We report the results of an extensive study evaluating the most popular protein alignment methods as well as the statistical co-estimation method BAli-Phy on 1192 protein data sets from established benchmarks as well as on 120 simulated data sets. Our study (which used more than 230 CPU years for the BAli-Phy analyses alone) shows that BAli-Phy is dramatically more accurate than the other alignment methods on the simulated data sets, but is among the least accurate on the biological benchmarks. There are several potential causes for this discordance, including model misspecification, errors in the reference alignments, and conflicts between structural alignment and evolutionary alignments; future research is needed to understand the most likely explanation for our observations.

Keywords: multiple sequence alignment, BAli-Phy, protein sequences, structural alignment, homology

Bayesian Planet Searches for the 10 cm/s Radial Velocity Era

Proceedings of the International Astronomical Union, 2015, Vol 11 (A29A), pp. 205-207

doi:10.1017/s1743921316002817

Author(s): Philip C. Gregory

Keyword(s): Radial Velocity, State Of The Art, Simulated Data, Model Parameters, Data Sets, Stellar Activity, Bayesian Fusion, Multiple State, Simulated Data Sets, Apodization Function

Abstract: A new apodized Keplerian model is proposed for the analysis of precision radial velocity (RV) data to model both planetary and stellar activity (SA) induced RV signals. A symmetrical Gaussian apodization function with unknown width and center can distinguish planetary signals from SA signals on the basis of the width of the apodization function. The general model for m apodized Keplerian signals also includes a linear regression term between RV and the stellar activity diagnostic ln(R'hk), as well as an extra Gaussian noise term with unknown standard deviation. The model parameters are explored using a Bayesian fusion MCMC code. A differential version of the Generalized Lomb-Scargle periodogram provides an additional way of distinguishing SA signals and helps guide the choice of new periods. Sample results are reported for a recent international RV blind challenge which included multiple state-of-the-art simulated data sets supported by a variety of stellar activity diagnostics.

A comparison of procedures for classifying remotely-sensed data using simulated data sets incorporating autocorrelations between spectral responses

International Journal of Remote Sensing, 1992, Vol 13 (14), pp. 2701-2725

doi:10.1080/01431169208904073

Cited By ~ 3

Author(s): J. D. WILSON

Keyword(s): Simulated Data, Remotely Sensed, Data Sets, Remotely Sensed Data, Simulated Data Sets, Spectral Responses

Erratum to: The Use of Geographically Weighted Regression for Spatial Prediction: An Evaluation of Models Using Simulated Data Sets

Mathematical Geosciences, 2011, Vol 43 (3), p. 399

doi:10.1007/s11004-011-9323-z

Cited By ~ 1

Author(s): P. Harris, A. S. Fotheringham, R. Crespo, M. Charlton

Keyword(s): Geographically Weighted Regression, Simulated Data, Spatial Prediction, Weighted Regression, Data Sets, Simulated Data Sets

The Use of Geographically Weighted Regression for Spatial Prediction: An Evaluation of Models Using Simulated Data Sets

Mathematical Geosciences, 2010, Vol 42 (6), pp. 657-680

doi:10.1007/s11004-010-9284-7

Cited By ~ 89

Author(s): P. Harris, A. S. Fotheringham, R. Crespo, M. Charlton

Keyword(s): Geographically Weighted Regression, Simulated Data, Spatial Prediction, Weighted Regression, Data Sets, Simulated Data Sets

Comparison of single-nucleotide polymorphisms and microsatellite markers for linkage analysis in the COGA and simulated data sets for Genetic Analysis Workshop 14: Presentation Groups 1, 2, and 3

Genetic Epidemiology, 2005, Vol 29 (S1), pp. S7-S28

doi:10.1002/gepi.20106

Cited By ~ 22

Author(s): Marsha A. Wilcox, Elizabeth W. Pugh, Heping Zhang, Xiaoyun Zhong, Douglas F. Levinson, ...

Keyword(s): Single Nucleotide Polymorphisms, Linkage Analysis, Genetic Analysis, Microsatellite Markers, Genetic Analysis Workshop, Simulated Data, Data Sets, Nucleotide Polymorphisms, Single Nucleotide, Simulated Data Sets
