An evaluation of HapMap sample size and tagging SNP performance in large-scale empirical and simulated data sets

2005 ◽  
Vol 37 (12) ◽  
pp. 1320-1322 ◽  
Author(s):  
Eleftheria Zeggini ◽  
William Rayner ◽  
Andrew P Morris ◽  
Andrew T Hattersley ◽  
Mark Walker ◽  
...  
2014 ◽  
Vol 51 (2) ◽  
pp. 171-179
Author(s):  
Ewa Skotarczak ◽  
Ewa Bakinowska ◽  
Kamila Tomaszyk

Abstract A nonlinear statistical approach was used to evaluate the efficiency of plant protection products. The methodology presented can be implemented when the observations in an experiment are recorded as success or failure. This occurs, for example, when following the application of a herbicide or pesticide, a single weed or insect is classified as alive (failure) or dead (success). Then a higher probability of success means a higher efficiency of the tested product. Using simulated data sets, a comparison was made of three methods based on the logit, probit and threshold models, with special attention to the effect of sample size and number of replications on the accuracy of the estimation of probabilities.
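As an illustration of the setting described in this abstract, the following minimal sketch shows the inverse logit and inverse probit links and how the error of an estimated success probability depends on sample size. It is not the authors' procedure: the function names, the true probability and the sample sizes are all assumptions chosen for illustration, and the threshold model is not covered.

```python
import math
import random
from statistics import NormalDist


def inv_logit(eta):
    """Map a linear predictor to a probability with the logistic link."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_probit(eta):
    """Map a linear predictor to a probability with the standard normal CDF."""
    return NormalDist().cdf(eta)


def estimate_success_probability(p_true, n, rng):
    """Simulate n success/failure observations and return the observed proportion."""
    successes = sum(rng.random() < p_true for _ in range(n))
    return successes / n


rng = random.Random(1)      # fixed seed so the run is repeatable
p_true = inv_logit(1.0)     # hypothetical true efficiency of a product

# Larger samples should, on average, give estimates closer to p_true.
for n in (20, 200, 2000):
    p_hat = estimate_success_probability(p_true, n, rng)
    print(n, round(p_hat, 3), round(abs(p_hat - p_true), 3))
```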


Author(s):  
Jeremiah P. Sjoberg ◽  
Richard A. Anthes ◽  
Therese Rieckh

Abstract The three-cornered hat (3CH) method, which was originally developed to assess the random errors of atomic clocks, is a means for estimating the error variances of three different data sets. Here we give an overview of the historical development of the 3CH and select other methods for estimating error variances that use either two or three data sets. We discuss similarities and differences between these methods and the 3CH method. This study assesses the sensitivity of the 3CH method to the factors that limit its accuracy, including sample size, outliers, different magnitudes of errors between the data sets, biases, and unknown error correlations. Using simulated data sets for which the errors and their correlations among the data sets are known, this analysis shows the conditions under which the 3CH method provides the most and least accurate estimates. The effect of representativeness errors caused by differences in vertical resolution of data sets is investigated. These representativeness errors are generally small relative to the magnitude of the random errors in the data sets, and the impact of this source of errors can be reduced by appropriate filtering.
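The basic 3CH estimate can be sketched as follows, using the standard relation that, for mutually uncorrelated errors, the error variance of one data set equals half of (the sum of the variances of its differences with the other two, minus the variance of the difference between those two). This is a minimal sketch, not the authors' code: the simulated truth, the error standard deviations and the function names are assumptions made for illustration.

```python
import random
import statistics


def three_cornered_hat(x, y, z):
    """Estimate the error variances of x, y and z.

    Assumes the three error series are mutually uncorrelated.
    """
    def var_diff(a, b):
        # Variance of the pointwise difference between two data sets.
        return statistics.pvariance([ai - bi for ai, bi in zip(a, b)])

    vxy, vxz, vyz = var_diff(x, y), var_diff(x, z), var_diff(y, z)
    return (
        0.5 * (vxy + vxz - vyz),  # error variance of x
        0.5 * (vxy + vyz - vxz),  # error variance of y
        0.5 * (vxz + vyz - vxy),  # error variance of z
    )


def simulate(n, sigmas, seed=0):
    """Build data sets as a shared 'truth' plus independent Gaussian errors."""
    rng = random.Random(seed)
    truth = [rng.gauss(0.0, 5.0) for _ in range(n)]
    return [[t + rng.gauss(0.0, s) for t in truth] for s in sigmas]


# Hypothetical error standard deviations 1, 2 and 3 (variances 1, 4 and 9).
x, y, z = simulate(20000, sigmas=(1.0, 2.0, 3.0))
est = three_cornered_hat(x, y, z)
print([round(v, 2) for v in est])
```

Reducing `n` in `simulate` is a simple way to see the sample-size sensitivity that the abstract refers to: the estimates scatter more widely around the true variances and can even become negative.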


2018 ◽  
Vol 11 (7) ◽  
pp. 4309-4325 ◽  
Author(s):  
Therese Rieckh ◽  
Richard Anthes

Abstract. In this paper we compare two different methods of estimating the error variances of two or more independent data sets. One method, called the “three-cornered hat” (3CH) method, requires three data sets. Another method, which we call the “two-cornered hat” (2CH) method, requires only two data sets. Both methods have been used in previous studies to estimate the error variances associated with a number of physical and geophysical data sets. A key assumption in both methods is that the errors of the data sets are not correlated, although some studies have considered the effect of the partial correlation of representativeness errors in two or more of the data sets. We compare the 3CH and 2CH methods using a simple model to simulate three and two data sets with various error correlations and biases. With this model, we know the exact error variances and covariances, which we use to assess the accuracy of the 3CH and 2CH estimates. We examine the sensitivity of the estimated error variances to the degree of error correlation between two of the data sets as well as the sample size. We find that the 3CH method is less sensitive to these factors than the 2CH method and hence is more accurate. We also find that biases in one of the data sets have a minimal effect on the 3CH method, but can produce large errors in the 2CH method.
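The role of the uncorrelated-error assumption can be made explicit with a short derivation. The notation here is an assumption introduced for illustration and is not taken from the paper: each data set is written as a common truth $T$ plus an error, with error variances $\sigma_X^2, \sigma_Y^2, \sigma_Z^2$ and error covariances $c_{XY}, c_{XZ}, c_{YZ}$.

```latex
% Data sets as truth plus error (illustrative notation)
X = T + e_X, \qquad Y = T + e_Y, \qquad Z = T + e_Z

% Variance of a difference between two data sets
\operatorname{Var}(X - Y) = \sigma_X^2 + \sigma_Y^2 - 2\,c_{XY}

% 3CH estimate for X and its expected value when the errors are correlated
\hat{\sigma}_X^2
  = \tfrac{1}{2}\bigl[\operatorname{Var}(X - Y) + \operatorname{Var}(X - Z) - \operatorname{Var}(Y - Z)\bigr]
  = \sigma_X^2 - c_{XY} - c_{XZ} + c_{YZ}
```

When all three covariances vanish, the estimate reduces to $\sigma_X^2$; otherwise the covariance terms appear directly as an error in the estimate. If variances of the differences are used, as written here, a constant bias in any one data set drops out, because adding a constant to a series does not change its variance.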


Entropy ◽  
2020 ◽  
Vol 22 (9) ◽  
pp. 949
Author(s):  
Jiangyi Wang ◽  
Min Liu ◽  
Xinwu Zeng ◽  
Xiaoqiang Hua

Convolutional neural networks perform strongly in many visual tasks because of their hierarchical structures and powerful feature extraction capabilities. The SPD (symmetric positive definite) matrix has attracted attention in visual classification because of its excellent ability to learn a proper statistical representation and to distinguish samples carrying different information. In this paper, a deep neural network signal detection method based on spectral convolution features is proposed. In this method, local features extracted from a convolutional neural network are used to construct the SPD matrix, and a deep learning algorithm for the SPD matrix is used to detect target signals. Feature maps extracted by two kinds of convolutional neural network models are applied in this study. With this method, signal detection becomes a binary classification problem over the signals in the samples. To demonstrate the effectiveness and superiority of this method, simulated and semi-physical simulated data sets are used. The results show that, under low SCR (signal-to-clutter ratio), compared with the spectral signal detection method based on the deep neural network, this method can obtain a gain of 0.5–2 dB on simulated data sets and semi-physical simulated data sets.


2018 ◽  
Author(s):  
Michael Nute ◽  
Ehsan Saleh ◽  
Tandy Warnow

Abstract The estimation of multiple sequence alignments of protein sequences is a basic step in many bioinformatics pipelines, including protein structure prediction, protein family identification, and phylogeny estimation. Statistical co-estimation of alignments and trees under stochastic models of sequence evolution has long been considered the most rigorous technique for estimating alignments and trees, but little is known about the accuracy of such methods on biological benchmarks. We report the results of an extensive study evaluating the most popular protein alignment methods as well as the statistical co-estimation method BAli-Phy on 1192 protein data sets from established benchmarks as well as on 120 simulated data sets. Our study (which used more than 230 CPU years for the BAli-Phy analyses alone) shows that BAli-Phy is dramatically more accurate than the other alignment methods on the simulated data sets, but is among the least accurate on the biological benchmarks. There are several potential causes for this discordance, including model misspecification, errors in the reference alignments, and conflicts between structural alignment and evolutionary alignments; future research is needed to understand the most likely explanation for our observations.

Keywords: multiple sequence alignment, BAli-Phy, protein sequences, structural alignment, homology


2015 ◽  
Vol 11 (A29A) ◽  
pp. 205-207
Author(s):  
Philip C. Gregory

Abstract A new apodized Keplerian model is proposed for the analysis of precision radial velocity (RV) data to model both planetary and stellar activity (SA) induced RV signals. A symmetrical Gaussian apodization function with unknown width and center can distinguish planetary signals from SA signals on the basis of the width of the apodization function. The general model for m apodized Keplerian signals also includes a linear regression term between RV and the stellar activity diagnostic ln(R'HK), as well as an extra Gaussian noise term with unknown standard deviation. The model parameters are explored using a Bayesian fusion MCMC code. A differential version of the Generalized Lomb-Scargle periodogram provides an additional way of distinguishing SA signals and helps guide the choice of new periods. Sample results are reported for a recent international RV blind challenge which included multiple state-of-the-art simulated data sets supported by a variety of stellar activity diagnostics.

