Matched filtering with non-Gaussian noise for planet transit detections

Author(s):  
Jakob Robnik ◽  
Uroš Seljak

Abstract We develop a method for planet detection in transit data that is based on the Matched Filter technique combined with Gaussianization of the noise outliers. The method is based on Fourier transforms and is as fast as the existing methods for planet searches. The Gaussianized Matched Filter (GMF) method significantly outperforms the standard baseline methods in terms of the false positive rate, enabling planet detections at up to 30% lower transit amplitudes. Moreover, the method extracts all the main planet transit parameters: amplitude, period, phase, and duration. By comparison with the state-of-the-art Gaussian Process methods on both simulations and real data, we show that all the transit parameters are determined with optimal accuracy (no bias and minimum variance), meaning that the GMF method can be used both for the initial planet detection and for the follow-up planet parameter analysis.
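As a rough illustration of the matched-filter idea in this abstract, the sketch below slides a box-shaped transit template along a light curve and reports a signal-to-noise ratio at each shift. It is not the GMF method itself: it uses a direct time-domain correlation instead of Fourier transforms, assumes white Gaussian noise with a known standard deviation, and omits the Gaussianization step. The function name, the box template, and all numerical values are assumptions made for the example.

```python
import math
import random


def matched_filter_snr(flux, template, sigma):
    """Return the matched-filter SNR for every shift of `template` along `flux`.

    Assumes white Gaussian noise with known standard deviation `sigma`
    (an assumption of this sketch, not of the paper).
    """
    norm = sigma * math.sqrt(sum(t * t for t in template))
    n, m = len(flux), len(template)
    return [
        sum(flux[i + j] * template[j] for j in range(m)) / norm
        for i in range(n - m + 1)
    ]


# Hypothetical light curve: unit-variance noise plus one box-shaped dip.
random.seed(0)
sigma, depth, width, start = 1.0, 2.0, 20, 300
flux = [random.gauss(0.0, sigma) for _ in range(1000)]
for k in range(start, start + width):
    flux[k] -= depth

# A negative box template matches a dip in flux.
template = [-1.0] * width
snr = matched_filter_snr(flux, template, sigma)
best_shift = max(range(len(snr)), key=snr.__getitem__)
print(best_shift, round(snr[best_shift], 2))
```

With a noiseless dip, the SNR at the true shift equals depth times the square root of the transit width divided by sigma, which is why longer or deeper transits are easier to detect.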

2012 ◽  
Vol 11 ◽  
pp. CIN.S9048 ◽  
Author(s):  
Shuhei Kaneko ◽  
Akihiro Hirakawa ◽  
Chikuma Hamada

Mining gene expression data to identify genes associated with patient survival is an ongoing problem in cancer prognostic studies using microarrays; the goal is to use such genes to achieve more accurate prognoses. The least absolute shrinkage and selection operator (lasso) is often used for gene selection and parameter estimation in high-dimensional microarray data. The lasso shrinks some of the coefficients to zero, and the amount of shrinkage is determined by the tuning parameter, which is often chosen by cross validation. The model selected by this cross validation contains many false positives, that is, genes whose true coefficients are zero. We propose a method for estimating the false positive rate (FPR) for lasso estimates in a high-dimensional Cox model. We performed a simulation study to examine the precision of the FPR estimate obtained by the proposed method. We applied the proposed method to real data and illustrated the identification of false positive genes.


2018 ◽  
Author(s):  
Cox Lwaka Tamba ◽  
Yuan-Ming Zhang

Abstract Background: Recent developments in technology result in the generation of big data. In genome-wide association studies (GWAS), we can get tens of millions of SNPs that need to be tested for association with a trait of interest. This poses a great computational challenge, and there is a need for fast algorithms in GWAS methodologies. These algorithms must ensure high power in QTN detection, high accuracy in QTN estimation and a low false positive rate. Results: Here, we accelerated the mrMLM algorithm by using the GEMMA idea, matrix transformations and identities. The target functions and derivatives in vector/matrix forms for each marker scanning are transformed into some simple forms that are easy and efficient to evaluate during each optimization step. All potentially associated QTNs with P-values ≤ 0.01 are evaluated in a multi-locus model by the LARS algorithm and/or EM-Empirical Bayes. We call the algorithm FASTmrMLM. Numerical simulation studies and real data analysis validated FASTmrMLM. FASTmrMLM reduces the running time of mrMLM by more than 50%. FASTmrMLM also shows high statistical power in QTN detection, high accuracy in QTN estimation and a low false positive rate as compared to GEMMA, FarmCPU and mrMLM. Real data analysis shows that FASTmrMLM was able to detect more previously reported genes than all the other methods: GEMMA/EMMA, FarmCPU and mrMLM. Conclusions: FASTmrMLM is a fast and reliable algorithm in multi-locus GWAS and ensures high statistical power, high accuracy of estimates and a low false positive rate. Author Summary: The current developments in technology result in the generation of a vast amount of data. In genome-wide association studies, we can get tens of millions of markers that need to be tested for association with a trait of interest. Due to the computational challenge faced, we developed a fast algorithm for genome-wide association studies. Our approach is a two-stage method. In the first step, we used matrix transformations and identities to quicken the testing of each random marker effect. The target functions and derivatives, which are in vector/matrix forms for each marker scanning, are transformed into some simple forms that are easy and efficient to evaluate during each optimization step. In the second step, we selected all potentially associated SNPs and evaluated them in a multi-locus model. In simulation studies, our algorithm significantly reduces the computing time. The new method also shows high statistical power in detecting significant markers, high accuracy in marker effect estimation and a low false positive rate. We also used the new method to identify relevant genes in real data analysis. We recommend our approach as a fast and reliable method for carrying out a multi-locus genome-wide association study.


2015 ◽  
Author(s):  
David M Rocke ◽  
Luyao Ruan ◽  
Yilun Zhang ◽  
J. Jared Gossett ◽  
Blythe Durbin-Johnson ◽  
...  

Motivation: An important property of a valid method for testing for differential expression is that the false positive rate should at least roughly correspond to the p-value cutoff, so that if 10,000 genes are tested at a p-value cutoff of 10⁻⁴, and if all the null hypotheses are true, then there should be only about 1 gene declared to be significantly differentially expressed. We tested this by resampling from existing RNA-Seq data sets and also by matched negative binomial simulations. Results: Methods we examined, which rely strongly on a negative binomial model, such as edgeR, DESeq, and DESeq2, show large numbers of false positives in both the resampled real-data case and in the simulated negative binomial case. This also occurs with a negative binomial generalized linear model function in R. Methods that use only the variance function, such as limma-voom, do not show excessive false positives, as is also the case with a variance stabilizing transformation followed by linear model analysis with limma. The excess false positives are likely caused by apparently small biases in estimation of negative binomial dispersion and, perhaps surprisingly, occur mostly when the mean and/or the dispersion is high, rather than for low-count genes.
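The arithmetic behind the "about 1 gene" statement can be sketched as follows: when every null hypothesis is true and the test is valid, p-values are uniform on [0, 1], so the expected number of false positives is the number of tests times the cutoff. The short simulation below checks this with uniform random p-values. It does not reproduce the resampling or negative binomial simulations of the study, and the function names are hypothetical.

```python
import random


def expected_false_positives(n_tests, alpha):
    """Expected number of rejections when all nulls are true and the test is valid."""
    return n_tests * alpha


def simulate_null_rejections(n_tests, alpha, rng):
    """Count rejections among uniform p-values, the ideal all-null behavior."""
    return sum(1 for _ in range(n_tests) if rng.random() < alpha)


rng = random.Random(1)
counts = [simulate_null_rejections(10_000, 1e-4, rng) for _ in range(100)]
mean_count = sum(counts) / len(counts)

print(expected_false_positives(10_000, 1e-4))  # 10,000 tests at cutoff 1e-4
print(mean_count)  # average over 100 simulated experiments
```

A method whose simulated or resampled null data yield far more rejections than this expectation is not controlling its false positive rate at the nominal cutoff.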


2015 ◽  
Author(s):  
A. Carvajal-Rodríguez

Abstract The detection of genomic regions involved in local adaptation is an important topic in current population genetics. There are several detection strategies available depending on the kind of genetic and demographic information at hand. A common drawback is the high risk of false positives. In this study we introduce two complementary methods for the detection of divergent selection from populations connected by migration. Both methods have been developed with the aim of being robust to false positives. The first method combines haplotype information with inter-population differentiation (FST). Evidence of divergent selection is concluded only when both the haplotype pattern and the FST value support it. The second method is developed for independently segregating markers, i.e. when there is no haplotype information. In this case, the power to detect selection is attained by developing a new outlier test based on detecting a bimodal distribution. The test computes the FST outliers and then assumes that those of interest would have a different mode. We demonstrate the utility of the two methods through simulations and the analysis of real data. The simulation results showed power ranging from 60-95% in several of the scenarios whilst the false positive rate was controlled below the nominal level. The analysis of real samples consisted of phased data from the HapMap project and unphased data from intertidal marine snail ecotypes. The results illustrate that the proposed methods could be useful for detecting locally adapted polymorphisms. The software HacDivSel implements the methods explained in this manuscript.


2021 ◽  
Vol 2 (4) ◽  
pp. 1209-1224
Author(s):  
Cameron Bertossa ◽  
Peter Hitchcock ◽  
Arthur DeGaetano ◽  
Riwal Plougonven

Abstract. Bimodality and other types of non-Gaussianity arise in ensemble forecasts of the atmosphere as a result of nonlinear spread across ensemble members. In this paper, bimodality in 50-member ECMWF ENS-extended ensemble forecasts is identified and characterized. Forecasts of 2 m temperature are found to exhibit widespread bimodality well over a derived false-positive rate. In some regions bimodality occurs in excess of 30 % of forecasts, with the largest rates occurring during lead times of 2 to 3 weeks. Bimodality occurs more frequently in the winter hemisphere with indications of baroclinicity being a factor to its development. Additionally, bimodality is more common over the ocean, especially the polar oceans, which may indicate development caused by boundary conditions (such as sea ice). Near the equatorial region, bimodality remains common during either season and follows similar patterns to the Intertropical Convergence Zone (ITCZ), suggesting convection as a possible source for its development. Over some continental regions the modes of the forecasts are separated by up to 15 °C. The probability density for the modes can be up to 4 times greater than at the minimum between the modes, which lies near the ensemble mean. The widespread presence of such bimodality has potentially important implications for decision makers acting on these forecasts. Bimodality also has implications for assessing forecast skill and for statistical postprocessing: several commonly used skill-scoring methods and ensemble dressing methods are found to perform poorly in the presence of bimodality, suggesting the need for improvements in how non-Gaussian ensemble forecasts are evaluated.


2021 ◽  
Vol 13 (19) ◽  
pp. 10696
Author(s):  
Netzah Calamaro ◽  
Yuval Beck ◽  
Ran Ben Melech ◽  
Doron Shmilovitz

Energy fraud detection bears significantly on urban ecology. Reduced losses and power consumption would affect carbon dioxide emissions and reduce thermal pollution. Fraud detection also provides another layer of urban socio-economic correlation heatmapping and improves city energy distribution. This paper describes a novel algorithm for energy fraud detection that builds specialized knowledge of energy and energy consumption into an AI front end. The proposed algorithm improves the accuracy of fraud detection, reduces the false positive rate, and reduces the size of the preliminary training dataset required. The paper also introduces a holistic algorithm, specifying the major phenomena that disguise themselves as energy fraud or affect it. Consequently, a mathematical foundation for energy fraud detection for the proposed algorithm is presented. The results show that a unique pattern is obtained during fraud, which is independent of a reference non-fraud pattern of the same customer. The theory is implemented on real data taken from smart metering systems and validated in real-life scenarios.


2018 ◽  
Author(s):  
Supriya Balaji Ramachandran

Electrochemical microelectrodes can detect single-vesicle release events as "spikes" of amperometric current. We developed a template-based "matched-filter" approach that performs a least squares fit of a library of templates to the data and identifies a spike when a detection criterion score, given by the ratio of the amplitude to its standard error, exceeds a minimum threshold. This method outperformed existing approaches and detected >95% of true spikes for a mere 2% false positive rate, as evidenced by receiver operating characteristic plots of sensitivity vs specificity. The next step is estimation of spike parameters such as peak amplitude (Imax), half-maximal width (t50) and area under the curve (Q), which inform maximal flux, flux duration and charge, respectively. Closely successive overlapping spikes are ambiguous to estimate, as they may not decay back to baseline, and should be rejected. The matched-filter approach provided not only robust spike detection but also parameter seed values, which were used to reject overlapping spikes and to perform iterative curve fitting of spikes. The remaining well-separated spikes were iteratively fit in two phases, first by fitting the rising and decaying phases separately and second by fitting the entire time course using seed values from the matched-filter template parameters. Using curve-fit parameters, Imax, t50 and Q were calculated. Histograms of these parameters had bimodal Gaussian distributions with centers and spreads within 12% and 4% of histograms created using manually analyzed data. The pre-spike baseline was estimated using a novel application of the matched-filter criterion scores, and the estimation of pre-spike foot signal parameters such as charge (Qfoot) and duration (tfoot) yielded means and medians within 10% of manually computed parameters.
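The detection criterion described above, the ratio of a fitted amplitude to its standard error, can be sketched for a single template as follows. This is a minimal illustration rather than the author's implementation: it fits one template with no baseline offset instead of a template library, and the exponential-decay template, the perturbation, and the threshold value are all assumptions made for the example.

```python
import math


def template_fit_score(segment, template):
    """Least squares fit of `segment` ~ amplitude * `template`.

    Returns (amplitude, standard_error, score), where score is
    amplitude / standard_error.
    """
    if len(segment) != len(template):
        raise ValueError("segment and template must have the same length")
    tt = sum(t * t for t in template)
    amplitude = sum(d * t for d, t in zip(segment, template)) / tt
    rss = sum((d - amplitude * t) ** 2 for d, t in zip(segment, template))
    dof = len(segment) - 1  # one fitted parameter
    standard_error = math.sqrt(rss / dof / tt)
    if standard_error == 0.0:
        score = math.inf if amplitude > 0 else 0.0
    else:
        score = amplitude / standard_error
    return amplitude, standard_error, score


# Hypothetical template: an exponentially decaying spike shape.
template = [math.exp(-k / 5.0) for k in range(20)]
# Small deterministic perturbation standing in for noise.
wiggle = [0.05 * (-1) ** k for k in range(20)]

spike_segment = [3.0 * t + w for t, w in zip(template, wiggle)]
noise_segment = list(wiggle)

THRESHOLD = 5.0  # assumed value, not taken from the study
_, _, spike_score = template_fit_score(spike_segment, template)
_, _, noise_score = template_fit_score(noise_segment, template)
print(spike_score > THRESHOLD, noise_score > THRESHOLD)
```

Dividing by the standard error makes the score dimensionless, so one threshold can be applied across recordings with different noise levels.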


Author(s):  
ZIQIANG SHI ◽  
BOYANG GAO ◽  
TIERAN ZHENG ◽  
JIQING HAN

In this paper, a novel method based on recognizing pornographic sounds is proposed to detect adult video sequences automatically; it may serve as a verification step, a supplementary method, or an independent detector. A feature analysis is given that addresses the specific characteristics of erotic sound. Building on popular features, histograms and contours are introduced as new sets of features. In addition, because of the complexity of outside data, a general framework called in-class clustering is proposed, which selects the most representative subclass for training and classification. All these efforts increase the recall rate and decrease the false positive rate. Experiments on real data from the Internet indicate that the proposed method yields superior performance, achieving an 89.17% recall rate and a 10.78% false positive rate.


2002 ◽  
Vol 41 (01) ◽  
pp. 37-41 ◽  
Author(s):  
S. Shung-Shung ◽  
S. Yu-Chien ◽  
Y. Mei-Due ◽  
W. Hwei-Chung ◽  
A. Kao

Summary Aim: Even with careful observation, the overall false-positive rate of laparotomy remains 10-15% when acute appendicitis was suspected. Therefore, the clinical efficacy of Tc-99m HMPAO labeled leukocyte (TC-WBC) scan for the diagnosis of acute appendicitis in patients presenting with atypical clinical findings is assessed. Patients and Methods: Eighty patients presenting with acute abdominal pain and possible acute appendicitis but atypical findings were included in this study. After intravenous injection of TC-WBC, serial anterior abdominal/pelvic images at 30, 60, 120 and 240 min with 800k counts were obtained with a gamma camera. Any abnormal localization of radioactivity in the right lower quadrant of the abdomen, equal to or greater than bone marrow activity, was considered as a positive scan. Results: 36 out of 49 patients showing positive TC-WBC scans received appendectomy. They all proved to have positive pathological findings. Five positive TC-WBC were not related to acute appendicitis, because of other pathological lesions. Eight patients were not operated and clinical follow-up after one month revealed no acute abdominal condition. Three of 31 patients with negative TC-WBC scans received appendectomy. They also presented positive pathological findings. The remaining 28 patients did not receive operations and revealed no evidence of appendicitis after at least one month of follow-up. The overall sensitivity, specificity, accuracy, positive and negative predictive values for TC-WBC scan to diagnose acute appendicitis were 92, 78, 86, 82, and 90%, respectively. Conclusion: TC-WBC scan provides a rapid and highly accurate method for the diagnosis of acute appendicitis in patients with equivocal clinical examination. It proved useful in reducing the false-positive rate of laparotomy and shortens the time necessary for clinical observation.


1993 ◽  
Vol 32 (02) ◽  
pp. 175-179 ◽  
Author(s):  
B. Brambati ◽  
T. Chard ◽  
J. G. Grudzinskas ◽  
M. C. M. Macintosh

Abstract: The analysis of the clinical efficiency of a biochemical parameter in the prediction of chromosome anomalies is described, using a database of 475 cases including 30 abnormalities. A comparison was made of two different approaches to the statistical analysis: the use of Gaussian frequency distributions and likelihood ratios, and logistic regression. Both methods computed that for a 5% false-positive rate approximately 60% of anomalies are detected on the basis of maternal age and serum PAPP-A. The logistic regression analysis is appropriate where the outcome variable (chromosome anomaly) is binary and the detection rates refer to the original data only. The likelihood ratio method is used to predict the outcome in the general population. The latter method depends on the data or some transformation of the data fitting a known frequency distribution (Gaussian in this case). The precision of the predicted detection rates is limited by the small sample of abnormals (30 cases). Varying the means and standard deviations (to the limits of their 95% confidence intervals) of the fitted log Gaussian distributions resulted in a detection rate varying between 42% and 79% for a 5% false-positive rate. Thus, although the likelihood ratio method is potentially the better method in determining the usefulness of a test in the general population, larger numbers of abnormal cases are required to stabilise the means and standard deviations of the fitted log Gaussian distributions.
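How a detection rate at a fixed false-positive rate follows from fitted Gaussian distributions can be sketched as below. This is a simplified single-marker illustration, not the study's analysis: it ignores maternal age, assumes equal standard deviations in both groups (so that a cutoff on the marker is equivalent to a cutoff on the likelihood ratio), assumes low marker values indicate an anomaly, and uses made-up means and standard deviations on a log scale.

```python
from statistics import NormalDist


def detection_rate_at_fpr(unaffected, affected, fpr):
    """Return (cutoff, detection_rate) when values below `cutoff` are flagged.

    The cutoff is placed so that a fraction `fpr` of the unaffected
    distribution falls below it.
    """
    cutoff = unaffected.inv_cdf(fpr)
    return cutoff, affected.cdf(cutoff)


# Hypothetical log-scale marker distributions (not values from the study).
unaffected = NormalDist(mu=0.0, sigma=0.25)
affected = NormalDist(mu=-0.4, sigma=0.25)

cutoff, detection_rate = detection_rate_at_fpr(unaffected, affected, 0.05)
print(round(cutoff, 3), round(detection_rate, 3))

# Shifting the affected mean shows how sensitive the predicted detection
# rate is to the fitted parameters, the issue raised for small samples.
for mu in (-0.3, -0.4, -0.5):
    _, dr = detection_rate_at_fpr(unaffected, NormalDist(mu, 0.25), 0.05)
    print(mu, round(dr, 3))
```

Because the detection rate depends directly on the fitted means and standard deviations, uncertainty in those parameters from a small abnormal sample carries straight through to the predicted rate.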

