HacDivSel: Two new methods (haplotype-based and outlier-based) for the detection of divergent selection in pairs of populations

2015
Author(s): A. Carvajal-Rodríguez

Abstract: The detection of genomic regions involved in local adaptation is an important topic in current population genetics. Several detection strategies are available, depending on the kind of genetic and demographic information at hand. A common drawback is the high risk of false positives. In this study we introduce two complementary methods for the detection of divergent selection in populations connected by migration. Both methods were developed with the aim of being robust to false positives. The first method combines haplotype information with inter-population differentiation (FST). Divergent selection is inferred only when both the haplotype pattern and the FST value support it. The second method is designed for independently segregating markers, i.e., cases with no haplotype information. In this case, the power to detect selection is attained by a new outlier test based on detecting a bimodal distribution. The test computes the FST outliers and then assumes that those of interest would have a different mode. We demonstrate the utility of the two methods through simulations and the analysis of real data. The simulations showed power ranging from 60% to 95% in several of the scenarios, while the false positive rate was controlled below the nominal level. The real data consisted of phased data from the HapMap project and unphased data from intertidal marine snail ecotypes. The results illustrate that the proposed methods could be useful for detecting locally adapted polymorphisms. The software HacDivSel implements the methods explained in this manuscript.

2015
Author(s): David M Rocke, Luyao Ruan, Yilun Zhang, J. Jared Gossett, Blythe Durbin-Johnson, ...

Motivation: An important property of a valid method for testing for differential expression is that the false positive rate should at least roughly correspond to the p-value cutoff, so that if 10,000 genes are tested at a p-value cutoff of 10⁻⁴, and if all the null hypotheses are true, then there should be only about 1 gene declared to be significantly differentially expressed. We tested this by resampling from existing RNA-Seq data sets and also by matched negative binomial simulations. Results: Methods we examined, which rely strongly on a negative binomial model, such as edgeR, DESeq, and DESeq2, show large numbers of false positives in both the resampled real-data case and in the simulated negative binomial case. This also occurs with a negative binomial generalized linear model function in R. Methods that use only the variance function, such as limma-voom, do not show excessive false positives, as is also the case with a variance stabilizing transformation followed by linear model analysis with limma. The excess false positives are likely caused by apparently small biases in estimation of negative binomial dispersion and, perhaps surprisingly, occur mostly when the mean and/or the dispersion is high, rather than for low-count genes.
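The calibration criterion stated in the Motivation can be illustrated with a small simulation. The sketch below is not the authors' resampling procedure: it assumes p-values that are exactly uniform under the null, and all function names are hypothetical.

```python
import random


def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of rejections when every null hypothesis is true."""
    return n_tests * alpha


def count_false_positives(n_tests: int, alpha: float, rng: random.Random) -> int:
    """Draw one uniform p-value per test and count those at or below alpha."""
    return sum(1 for _ in range(n_tests) if rng.random() <= alpha)


def mean_false_positives(n_tests: int, alpha: float, n_reps: int, seed: int = 0) -> float:
    """Average false-positive count over repeated all-null experiments."""
    rng = random.Random(seed)
    total = sum(count_false_positives(n_tests, alpha, rng) for _ in range(n_reps))
    return total / n_reps
```

For 10,000 tests at a cutoff of 10⁻⁴, a calibrated method should average about one false positive per experiment; a mean well above that would indicate the kind of excess the abstract reports for the negative-binomial-based methods.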


2014, Vol 644-650, pp. 3338-3341
Author(s): Guang Feng Guo

Throughout the 30-year development of intrusion detection systems, problems such as a high false-positive rate have plagued users. The ontology and context verification based intrusion detection model (OCVIDM) was therefore proposed to link the description of attack signatures with context effectively. OCVIDM builds an intrusion detection ontology knowledge base that serves as the center of an efficient platform for filtering false alerts. The platform validates alarms automatically and judges real attacks without manual intervention, so as to filter out irrelevant positive alerts and reduce false positives.


2020, Vol 30 (12), pp. 1851-1855
Author(s): Sruti Rao, M. B. Goens, Orrin B. Myers, Emilie A. Sebesta

Abstract. Aim: To determine the false-positive rate of pulse oximetry screening at moderate altitude, presumed to be elevated compared with sea level values and assess change in false-positive rate with time. Methods: We retrospectively analysed 3548 infants in the newborn nursery in Albuquerque, New Mexico, (elevation 5400 ft) from July 2012 to October 2013. Universal pulse oximetry screening guidelines were employed after 24 hours of life but before discharge. Newborn babies between 36 and 36 6/7 weeks of gestation, weighing >2 kg and babies >37 weeks weighing >1.7 kg were included in the study. Log-binomial regression was used to assess change in the probability of false positives over time. Results: Of the 3548 patients analysed, there was one true positive with a posteriorly-malaligned ventricular septal defect and an interrupted aortic arch. Of the 93 false positives, the mean pre- and post-ductal saturations were lower, 92 and 90%, respectively. The false-positive rate before April 2013 was 3.5% and after April 2013, decreased to 1.5%. There was a significant decrease in false-positive rate (p = 0.003, slope coefficient = −0.082, standard error of coefficient = 0.023) with the relative risk of a false positive decreasing at 0.92 (95% CI 0.88–0.97) per month. Conclusion: This is the first study in Albuquerque, New Mexico, reporting a high false-positive rate of 1.5% at moderate altitude at the end of the study in comparison to the false-positive rate of 0.035% at sea level. Implementation of the nationally recommended universal pulse oximetry screening was associated with a high false-positive rate in the initial period, thought to be from the combination of both learning curve and altitude. After the initial decline, it remained steadily elevated above sea level, indicating the dominant effect of moderate altitude.


2019, Vol 9 (1)
Author(s): Ginette Lafit, Francis Tuerlinckx, Inez Myin-Germeys, Eva Ceulemans

Abstract: Gaussian Graphical Models (GGMs) are extensively used in many research areas, such as genomics, proteomics, neuroimaging, and psychology, to study the partial correlation structure of a set of variables. This structure is visualized by drawing an undirected network, in which the variables constitute the nodes and the partial correlations the edges. In many applications, it makes sense to impose sparsity (i.e., some of the partial correlations are forced to zero) because sparsity is theoretically meaningful and/or because it improves the predictive accuracy of the fitted model. However, as we will show by means of extensive simulations, state-of-the-art estimation approaches for imposing sparsity on GGMs, such as the Graphical lasso, ℓ1 regularized nodewise regression, and joint sparse regression, fall short because they often yield too many false positives (i.e., partial correlations that are not properly set to zero). In this paper we present a new estimation approach that allows better control of the false positive rate. Our approach consists of two steps: First, we estimate an undirected network using one of the three state-of-the-art estimation approaches. Second, we try to detect the false positives by flagging the partial correlations that are smaller in absolute value than a given threshold, which is determined through cross-validation; the flagged correlations are set to zero. Applying this new approach to the same simulated data shows that it indeed performs better. We also illustrate our approach by using it to estimate (1) a gene regulatory network for breast cancer data, (2) a symptom network of patients with a diagnosis within the nonaffective psychotic spectrum and (3) a symptom network of patients with PTSD.
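The second step of the approach, zeroing small partial correlations, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a precision matrix has already been estimated in the first step, uses the standard relation between precision entries and partial correlations, and takes the threshold as a given number instead of selecting it by cross-validation. The function names are hypothetical.

```python
import math


def partial_correlations(precision):
    """Convert a precision matrix (list of lists) into partial correlations.

    Uses rho_ij = -theta_ij / sqrt(theta_ii * theta_jj) for i != j.
    """
    p = len(precision)
    rho = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i == j:
                rho[i][j] = 1.0
            else:
                scale = math.sqrt(precision[i][i] * precision[j][j])
                rho[i][j] = -precision[i][j] / scale
    return rho


def threshold_partial_correlations(rho, threshold):
    """Set off-diagonal entries smaller in absolute value than `threshold` to zero."""
    p = len(rho)
    return [
        [rho[i][j] if i == j or abs(rho[i][j]) >= threshold else 0.0 for j in range(p)]
        for i in range(p)
    ]
```

In the abstract the threshold is chosen through cross-validation; here it would simply be passed in, for example `threshold_partial_correlations(rho, 0.1)`.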


2012, Vol 11, pp. CIN.S9048
Author(s): Shuhei Kaneko, Akihiro Hirakawa, Chikuma Hamada

Mining gene expression data to identify genes associated with patient survival is an ongoing problem in cancer prognostic studies using microarrays; the aim is to use such genes to achieve more accurate prognoses. The least absolute shrinkage and selection operator (lasso) is often used for gene selection and parameter estimation in high-dimensional microarray data. The lasso shrinks some of the coefficients to zero, and the amount of shrinkage is determined by the tuning parameter, which is often chosen by cross-validation. The model selected by cross-validation contains many false positives, that is, genes whose true coefficients are zero. We propose a method for estimating the false positive rate (FPR) for lasso estimates in a high-dimensional Cox model. We performed a simulation study to examine the precision of the FPR estimate by the proposed method. We applied the proposed method to real data and illustrated the identification of false positive genes.


2018
Author(s): Cox Lwaka Tamba, Yuan-Ming Zhang

Abstract. Background: Recent developments in technology result in the generation of big data. In genome-wide association studies (GWAS), we can obtain tens of millions of SNPs that need to be tested for association with a trait of interest. This poses a great computational challenge, so there is a need for fast algorithms in GWAS methodologies. These algorithms must ensure high power in QTN detection, high accuracy in QTN estimation and a low false positive rate. Results: Here, we accelerated the mrMLM algorithm by using the GEMMA idea, matrix transformations and identities. The target functions and derivatives in vector/matrix forms for each marker scanning are transformed into some simple forms that are easy and efficient to evaluate during each optimization step. All potentially associated QTNs with P-values ≤ 0.01 are evaluated in a multi-locus model by the LARS algorithm and/or EM-Empirical Bayes. We call the algorithm FASTmrMLM. Numerical simulation studies and real data analysis validated FASTmrMLM. FASTmrMLM reduces the running time of mrMLM by more than 50%. FASTmrMLM also shows high statistical power in QTN detection, high accuracy in QTN estimation and a low false positive rate as compared to GEMMA, FarmCPU and mrMLM. Real data analysis shows that FASTmrMLM was able to detect more previously reported genes than all the other methods: GEMMA/EMMA, FarmCPU and mrMLM. Conclusions: FASTmrMLM is a fast and reliable algorithm for multi-locus GWAS and ensures high statistical power, high accuracy of estimates and a low false positive rate. Author Summary: The current developments in technology result in the generation of a vast amount of data. In genome-wide association studies, we can obtain tens of millions of markers that need to be tested for association with a trait of interest. Because of this computational challenge, we developed a fast algorithm for genome-wide association studies. Our approach is a two-stage method. In the first step, we used matrix transformations and identities to speed up the testing of each random marker effect. The target functions and derivatives, which are in vector/matrix forms for each marker scanning, are transformed into some simple forms that are easy and efficient to evaluate during each optimization step. In the second step, we selected all potentially associated SNPs and evaluated them in a multi-locus model. In simulation studies, our algorithm significantly reduces the computing time. The new method also shows high statistical power in detecting significant markers, high accuracy in marker effect estimation and a low false positive rate. We also used the new method to identify relevant genes in real data analysis. We recommend our approach as a fast and reliable method for carrying out a multi-locus genome-wide association study.


2017, Vol 2017, pp. 1-8
Author(s): Virpi Tunninen, Pekka Varjo, Tomi Kauppinen, Anu Holm, Hannu Eskola, ...

Objectives. This retrospective study evaluated whether the use of additional anterior Tc99m-sestamibi/123I pinhole imaging improves the outcome of Tc99m-sestamibi/123I subtraction SPECT/CT in parathyroid scintigraphy (PS). Materials and Methods. PS using simultaneous dual-isotope subtraction methods and an acquisition protocol combining SPECT/CT and planar pinhole imaging was performed for 175 patients with primary or secondary hyperparathyroidism. All patients who proceeded to surgery with complete postsurgery laboratory findings were included in this study (n=94). SPECT/CT images alone and combined with pinhole images were evaluated. Results. There were 111 enlarged parathyroid glands of which 104 and 108 glands were correctly visualized by SPECT/CT (seven false positives) or SPECT/CT with pinhole (three false positives), respectively. Both sensitivity and specificity were higher with combined SPECT/CT with pinhole than with SPECT/CT alone (97% versus 94% and 99% versus 98%, resp., not significant). The false-positive rate was 6% with SPECT/CT and decreased to 3% using combined SPECT/CT with pinhole. Conclusion. Tc99m-sestamibi/123I subtraction SPECT/CT is a highly sensitive and specific protocol for PS. The use of additional anterior pinhole imaging increases both sensitivity and specificity of PS, although this increase is not statistically significant.


1981, Vol 74 (1), pp. 41-43
Author(s): I G Barrison, E R Littlewood, J Primavesi, A Sharpies, I T Gilmore, ...

Stools have been tested for occult gastrointestinal bleeding in 278 outpatients and 170 hospital inpatients using the Haemoccult and Haemastix methods. Seventeen outpatients (6.1%) and 42 inpatients (24.7%) were positive with the Haemoccult technique. Thirty-three outpatients (11.9%) and 93 inpatients (54.7%) were positive with the Haemastix test. Following investigation of the Haemoccult-positive patients, only 2 cases (3.4%) were considered false positives. However, the false positive rate with Haemastix was 22.9% which is unacceptable in a screening test. Haemoccult may be useful as a screening test for asymptomatic general practice patients, but a test of greater sensitivity is needed for hospital patients.
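The percentages quoted in this abstract can be reproduced from the stated counts. The short sketch below is only an arithmetic check; the helper name `percent` is hypothetical, and the reading that the 3.4% figure is 2 of the 59 Haemoccult-positive patients (17 outpatients plus 42 inpatients) is an inference from the numbers given.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Counts taken from the abstract.
OUTPATIENTS, INPATIENTS = 278, 170
haemoccult_pos = {"outpatients": 17, "inpatients": 42}
haemastix_pos = {"outpatients": 33, "inpatients": 93}

# Positivity rates by test and patient group.
haemoccult_out = percent(haemoccult_pos["outpatients"], OUTPATIENTS)
haemoccult_in = percent(haemoccult_pos["inpatients"], INPATIENTS)
haemastix_out = percent(haemastix_pos["outpatients"], OUTPATIENTS)
haemastix_in = percent(haemastix_pos["inpatients"], INPATIENTS)

# Share of Haemoccult-positive patients judged false positive after investigation.
haemoccult_total_pos = sum(haemoccult_pos.values())
haemoccult_false_share = percent(2, haemoccult_total_pos)
```

Note that a share computed this way is taken among test-positive patients, which differs from the usual definition of a false-positive rate as false positives divided by all patients without the condition.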


2018, pp. 1-10
Author(s): Luke T. Lavallée, Rodney H. Breau, Dean Fergusson, Cynthia Walsh, Carl van Walraven

Purpose: Administrative health data can be a valuable resource for health research. Because these data are not collected for research purposes, it is imperative that the accuracy of codes used to identify patients, exposures, and outcomes is measured. Patients and Methods: Code sensitivity was determined by identifying a cohort of men with histologically confirmed prostate cancer in the Ontario Cancer Registry and linking them to the Ontario Health Insurance Plan (OHIP) to determine whether a prostate biopsy code had been claimed. Code specificity was estimated using a random sample of patients at The Ottawa Hospital for whom a prostate biopsy code was submitted to OHIP. A simulation model, which varied the code false-positive rate, true-negative rate, and proportion of code positives in the population, was created to determine specificity under a range of combinations of these parameters. Results: Between 1991 and 2012, 97,369 of 148,669 men with histologically confirmed prostate cancer in the Ontario Cancer Registry had a prostate biopsy code in OHIP within 1 week of their diagnosis (code sensitivity, 86.0%). This increased significantly over time (63.8% in 1991 to 87.9% in 2012). The false-positive rate of the code for index prostate biopsies was 1.9%. The simulation model found that the code specificity exceeded 95% for first prostate biopsy but was lower for secondary biopsies because of more false positives. False positives primarily were related to placement of fiducial markers for patients who received radiotherapy. Conclusion: Administrative data in Ontario can accurately identify men who receive a prostate biopsy. The code is less accurate for secondary biopsy procedures and their sequelae.


2014, Vol 687-691, pp. 2611-2617
Author(s): Hong Hai Zhou, Pei Bin Liu, Zhi Hao Jin

This paper proposes a new method for network troubleshooting, named DRNFD, in which an "abnormal degree" is defined by the vector of probability and belief functions in a privileged process. A new formula based on the Dempster rule is presented to decrease false positives. DRNFD can effectively reduce the false positive rate and the non-response rate and can be applied to real-time fault diagnosis. An operational prototype system demonstrates the feasibility and effectiveness of the method for real-time fault diagnosis.
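For orientation, the standard Dempster rule of combination on which the paper's formula is said to be based can be sketched as below. This is the textbook rule, not the new DRNFD formula, which the abstract does not specify; the function name and the fault hypotheses in the usage note are hypothetical.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        intersection = b & c
        if intersection:
            # Agreeing evidence: the product mass goes to the intersection.
            combined[intersection] = combined.get(intersection, 0.0) + mass_b * mass_c
        else:
            # Contradictory evidence: accumulate the conflict mass K.
            conflict += mass_b * mass_c
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Normalize by 1 - K so the combined masses sum to one.
    return {focal: mass / (1.0 - conflict) for focal, mass in combined.items()}
```

As a usage example, two monitors might assign masses over the hypotheses `{"link_fault"}`, `{"node_fault"}` and their union; combining them concentrates belief on the hypotheses both sources support while discounting the conflicting mass.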

