Matched filtering with non-Gaussian noise for planet transit detections

Monthly Notices of the Royal Astronomical Society ◽

10.1093/mnras/stab1178 ◽

2021 ◽

Author(s):

Jakob Robnik ◽

Uroš Seljak

Keyword(s):

Fourier Transforms ◽

Matched Filter ◽

False Positive Rate ◽

Real Data ◽

Minimum Variance ◽

Parameter Analysis ◽

Planet Detection ◽

Positive Rate ◽

Optimal Accuracy ◽

Non Gaussian

Abstract: We develop a method for planet detection in transit data that is based on the matched filter technique combined with Gaussianization of the noise outliers. The method is based on Fourier transforms and is as fast as existing methods for planet searches. The Gaussianized Matched Filter (GMF) method significantly outperforms the standard baseline methods in terms of the false positive rate, enabling planet detections at up to 30% lower transit amplitudes. Moreover, the method extracts all the main planet transit parameters: amplitude, period, phase, and duration. By comparison with state-of-the-art Gaussian Process methods on both simulations and real data, we show that all the transit parameters are determined with optimal accuracy (no bias and minimum variance), meaning that the GMF method can be used both for initial planet detection and for follow-up planet parameter analysis.
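As a rough illustration of the Fourier-based matched filter idea mentioned in this abstract, the sketch below correlates a box-shaped transit template with a noisy light curve using FFTs. It is not the authors' GMF implementation: the Gaussianization step is omitted, the noise is assumed white and Gaussian, only a single transit is searched for, and every name and number (`matched_filter_snr`, `depth`, `duration`, `t0`) is a hypothetical choice made for the example.

```python
import numpy as np

def matched_filter_snr(flux, template, sigma):
    """Matched-filter SNR at every circular shift, assuming white noise of std `sigma`."""
    n = len(flux)
    # Circular cross-correlation via FFT: corr[k] = sum_t flux[t + k] * template[t]
    corr = np.fft.irfft(np.fft.rfft(flux) * np.conj(np.fft.rfft(template)), n)
    return corr / (sigma * np.sqrt(np.sum(template ** 2)))

# Hypothetical toy light curve: one box-shaped dip in unit-variance Gaussian noise.
rng = np.random.default_rng(0)
n, duration, depth, sigma, t0 = 4096, 20, 3.0, 1.0, 1500
flux = rng.normal(0.0, sigma, n)
flux[t0:t0 + duration] -= depth

# Box template placed at index 0; negative because a transit dims the star.
template = np.zeros(n)
template[:duration] = -1.0

snr = matched_filter_snr(flux, template, sigma)
best_shift = int(np.argmax(snr))
print(best_shift, snr[best_shift])
```

The FFT evaluates the correlation at all shifts in O(n log n) time, which is the reason a Fourier formulation can be as fast as the abstract states. A periodic search would additionally fold over trial periods, which this sketch does not attempt.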


Gene Selection using a High-Dimensional Regression Model with Microarrays in Cancer Prognostic Studies

Cancer Informatics ◽

10.4137/cin.s9048 ◽

2012 ◽

Vol 11 ◽

pp. CIN.S9048 ◽

Cited By ~ 4

Author(s):

Shuhei Kaneko ◽

Akihiro Hirakawa ◽

Chikuma Hamada

Keyword(s):

False Positive ◽

Cross Validation ◽

Gene Selection ◽

Cox Model ◽

False Positive Rate ◽

Real Data ◽

Tuning Parameter ◽

High Dimensional ◽

Positive Rate ◽

Selection Operator

Mining gene expression data to identify genes associated with patient survival, so that such genes can be used to achieve more accurate prognoses, is an ongoing problem in cancer prognostic studies using microarrays. The least absolute shrinkage and selection operator (lasso) is often used for gene selection and parameter estimation in high-dimensional microarray data. The lasso shrinks some of the coefficients to zero, and the amount of shrinkage is set by a tuning parameter that is often chosen by cross validation. The model chosen by this cross validation contains many false positives, that is, selected genes whose coefficients are actually zero. We propose a method for estimating the false positive rate (FPR) for lasso estimates in a high-dimensional Cox model. We performed a simulation study to examine the precision of the FPR estimate obtained by the proposed method. We applied the proposed method to real data and illustrated the identification of false positive genes.


A fast mrMLM algorithm for multi-locus genome-wide association studies

10.1101/341784 ◽

2018 ◽

Cited By ~ 23

Author(s):

Cox Lwaka Tamba ◽

Yuan-Ming Zhang

Keyword(s):

False Positive ◽

Statistical Power ◽

Association Studies ◽

False Positive Rate ◽

Real Data ◽

High Accuracy ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Positive Rate

Abstract. Background: Recent developments in technology result in the generation of big data. In genome-wide association studies (GWAS), we can get tens of millions of SNPs that need to be tested for association with a trait of interest. This poses a great computational challenge, so there is a need for fast algorithms in GWAS methodologies. These algorithms must ensure high power in QTN detection, high accuracy in QTN estimation and a low false positive rate. Results: Here, we accelerated the mrMLM algorithm by using the GEMMA idea, matrix transformations and identities. The target functions and derivatives in vector/matrix forms for each marker scanning are transformed into some simple forms that are easy and efficient to evaluate during each optimization step. All potentially associated QTNs with P-values ≤ 0.01 are evaluated in a multi-locus model by the LARS algorithm and/or EM-Empirical Bayes. We call the algorithm FASTmrMLM. Numerical simulation studies and real data analysis validated FASTmrMLM. FASTmrMLM reduces the running time of mrMLM by more than 50%. FASTmrMLM also shows high statistical power in QTN detection, high accuracy in QTN estimation and a low false positive rate as compared to GEMMA, FarmCPU and mrMLM. Real data analysis shows that FASTmrMLM was able to detect more previously reported genes than all the other methods: GEMMA/EMMA, FarmCPU and mrMLM. Conclusions: FASTmrMLM is a fast and reliable algorithm in multi-locus GWAS and ensures high statistical power, high accuracy of estimates and a low false positive rate.

Author Summary: The current developments in technology result in the generation of a vast amount of data. In genome-wide association studies, we can get tens of millions of markers that need to be tested for association with a trait of interest. Because of the computational challenge this poses, we developed a fast algorithm for genome-wide association studies. Our approach is a two-stage method. In the first step, we used matrix transformations and identities to speed up the testing of each random marker effect. The target functions and derivatives, which are in vector/matrix forms for each marker scanning, are transformed into some simple forms that are easy and efficient to evaluate during each optimization step. In the second step, we selected all potentially associated SNPs and evaluated them in a multi-locus model. In simulation studies, our algorithm significantly reduces the computing time. The new method also shows high statistical power in detecting significant markers, high accuracy in marker effect estimation and a low false positive rate. We also used the new method to identify relevant genes in real data analysis. We recommend our approach as a fast and reliable method for carrying out a multi-locus genome-wide association study.


Excess False Positive Rates in Methods for Differential Gene Expression Analysis using RNA-Seq Data

10.1101/020784 ◽

2015 ◽

Cited By ~ 7

Author(s):

David M Rocke ◽

Luyao Ruan ◽

Yilun Zhang ◽

J. Jared Gossett ◽

Blythe Durbin-Johnson ◽

...

Keyword(s):

Linear Model ◽

False Positive ◽

Negative Binomial ◽

False Positive Rate ◽

Real Data ◽

False Positives ◽

P Value ◽

Data Sets ◽

Rna Seq ◽

Positive Rate

Motivation: An important property of a valid method for testing for differential expression is that the false positive rate should at least roughly correspond to the p-value cutoff, so that if 10,000 genes are tested at a p-value cutoff of 10⁻⁴, and if all the null hypotheses are true, then there should be only about 1 gene declared to be significantly differentially expressed. We tested this by resampling from existing RNA-Seq data sets and also by matched negative binomial simulations. Results: Methods we examined, which rely strongly on a negative binomial model, such as edgeR, DESeq, and DESeq2, show large numbers of false positives in both the resampled real-data case and in the simulated negative binomial case. This also occurs with a negative binomial generalized linear model function in R. Methods that use only the variance function, such as limma-voom, do not show excessive false positives, as is also the case with a variance stabilizing transformation followed by linear model analysis with limma. The excess false positives are likely caused by apparently small biases in estimation of negative binomial dispersion and, perhaps surprisingly, occur mostly when the mean and/or the dispersion is high, rather than for low-count genes.
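The calibration argument in the Motivation can be checked with a small simulation. The sketch below assumes an idealized valid test, whose p-values are uniform on [0, 1] under the null, and counts how many of 10,000 null genes fall below the cutoff. The replicate count and variable names are choices made for this illustration and do not come from the paper.

```python
import numpy as np

n_genes = 10_000
alpha = 1e-4
# Expected number of false positives when every null hypothesis is true.
expected_false_positives = n_genes * alpha

rng = np.random.default_rng(1)
n_replicates = 500  # hypothetical number of simulated experiments
# Under a true null, a valid test yields p-values uniform on [0, 1].
p_values = rng.uniform(size=(n_replicates, n_genes))
false_positives = (p_values < alpha).sum(axis=1)
mean_false_positives = false_positives.mean()

print(expected_false_positives, mean_false_positives)
```

A method whose average count in a null resampling sits well above `n_genes * alpha` has the kind of excess false positive rate the abstract describes.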


HacDivSel: Two new methods (haplotype-based and outlier-based) for the detection of divergent selection in pairs of populations

10.1101/026369 ◽

2015 ◽

Author(s):

A. Carvajal-Rodríguez

Keyword(s):

False Positive Rate ◽

Real Data ◽

Bimodal Distribution ◽

False Positives ◽

Divergent Selection ◽

Outlier Test ◽

Positive Rate ◽

Genomic Regions ◽

Detection Strategies ◽

Haplotype Information

Abstract: The detection of genomic regions involved in local adaptation is an important topic in current population genetics. There are several detection strategies available depending on the kind of genetic and demographic information at hand. A common drawback is the high risk of false positives. In this study we introduce two complementary methods for the detection of divergent selection from populations connected by migration. Both methods have been developed with the aim of being robust to false positives. The first method combines haplotype information with inter-population differentiation (FST). Evidence of divergent selection is concluded only when both the haplotype pattern and the FST value support it. The second method is developed for independently segregating markers, i.e., there is no haplotype information. In this case, the power to detect selection is attained by developing a new outlier test based on detecting a bimodal distribution. The test computes the FST outliers and then assumes that those of interest would have a different mode. We demonstrate the utility of the two methods through simulations and the analysis of real data. The simulation results showed power ranging from 60-95% in several of the scenarios whilst the false positive rate was controlled below the nominal level. The analysis of real samples consisted of phased data from the HapMap project and unphased data from intertidal marine snail ecotypes. The results illustrate that the proposed methods could be useful for detecting locally adapted polymorphisms. The software HacDivSel implements the methods explained in this manuscript.


Bimodality in ensemble forecasts of 2 m temperature: identification

Weather and Climate Dynamics ◽

10.5194/wcd-2-1209-2021 ◽

2021 ◽

Vol 2 (4) ◽

pp. 1209-1224

Author(s):

Cameron Bertossa ◽

Peter Hitchcock ◽

Arthur DeGaetano ◽

Riwal Plougonven

Keyword(s):

False Positive Rate ◽

Equatorial Region ◽

Decision Makers ◽

Lead Times ◽

Scoring Methods ◽

Ensemble Forecasts ◽

Ensemble Mean ◽

Positive Rate ◽

Non Gaussian ◽

Winter Hemisphere

Abstract. Bimodality and other types of non-Gaussianity arise in ensemble forecasts of the atmosphere as a result of nonlinear spread across ensemble members. In this paper, bimodality in 50-member ECMWF ENS-extended ensemble forecasts is identified and characterized. Forecasts of 2 m temperature are found to exhibit widespread bimodality well over a derived false-positive rate. In some regions bimodality occurs in excess of 30 % of forecasts, with the largest rates occurring during lead times of 2 to 3 weeks. Bimodality occurs more frequently in the winter hemisphere with indications of baroclinicity being a factor to its development. Additionally, bimodality is more common over the ocean, especially the polar oceans, which may indicate development caused by boundary conditions (such as sea ice). Near the equatorial region, bimodality remains common during either season and follows similar patterns to the Intertropical Convergence Zone (ITCZ), suggesting convection as a possible source for its development. Over some continental regions the modes of the forecasts are separated by up to 15 °C. The probability density for the modes can be up to 4 times greater than at the minimum between the modes, which lies near the ensemble mean. The widespread presence of such bimodality has potentially important implications for decision makers acting on these forecasts. Bimodality also has implications for assessing forecast skill and for statistical postprocessing: several commonly used skill-scoring methods and ensemble dressing methods are found to perform poorly in the presence of bimodality, suggesting the need for improvements in how non-Gaussian ensemble forecasts are evaluated.


An Energy-Fraud Detection-System Capable of Distinguishing Frauds from Other Energy Flow Anomalies in an Urban Environment

Sustainability ◽

10.3390/su131910696 ◽

2021 ◽

Vol 13 (19) ◽

pp. 10696

Author(s):

Netzah Calamaro ◽

Yuval Beck ◽

Ran Ben Melech ◽

Doron Shmilovitz

Keyword(s):

Energy Flow ◽

Carbon Dioxide Emissions ◽

Detection System ◽

False Positive Rate ◽

Real Life ◽

Fraud Detection ◽

Real Data ◽

Training Dataset ◽

Specialized Knowledge ◽

Positive Rate

Energy fraud detection bears significantly on urban ecology. Reduced losses and power consumption would affect carbon dioxide emissions and reduce thermal pollution. Fraud detection also provides another layer of urban socio-economic correlation heatmapping and improves city energy distribution. This paper describes a novel algorithm for energy fraud detection that feeds specialized knowledge of energy and energy consumption into an AI front-end. The proposed algorithm improves the accuracy of fraud detection, reduces the false positive rate, and reduces the size of the training dataset required beforehand. The paper also introduces a holistic algorithm that specifies the major phenomena that masquerade as energy fraud or affect it. Consequently, a mathematical foundation for energy fraud detection with the proposed algorithm is presented. The results show that a unique pattern is obtained during fraud, which is independent of a reference non-fraud pattern of the same customer. The theory is implemented on real data taken from smart metering systems and validated in real-life scenarios.


Automated detection of amperometric spikes resulting from quantal exocytosis and estimation of spike and pre-spike foot signal parameters

10.32469/10355/68885 ◽

2018 ◽

Author(s):


Supriya Balaji Ramachandran

Keyword(s):

Time Course ◽

Matched Filter ◽

False Positive Rate ◽

Area Under The Curve ◽

Signal Parameters ◽

Curve Fit ◽

Least Squares Fit ◽

Positive Rate ◽

Filter Approach ◽

Two Phases

Electrochemical microelectrodes can detect single-vesicle release events as "spikes" of amperometric current. We developed a template-based "matched-filter" approach that performs a least squares fit of a library of templates to the data and identifies a spike when a detection criterion score, given by the ratio of the amplitude to its standard error, exceeds a minimum threshold. This method outperformed existing approaches and detected >95% of true spikes for a mere 2% false positive rate, as evidenced by receiver operating characteristic plots of sensitivity vs specificity. The next step is estimation of spike parameters such as peak amplitude (Imax), half-maximal width (t50) and area under the curve (Q), which inform maximal flux, flux duration and charge, respectively. Closely successive overlapping spikes are ambiguous to estimate, as they may not decay back to baseline, and should be rejected. The matched-filter approach provided not only robust spike detection but also parameter seed values, which were used both to reject overlapping spikes and to perform iterative curve fitting of spikes. The remaining well-separated spikes were iteratively fit in two phases, first by fitting the rising and decaying phases separately and second by fitting the entire time course using seed values from the matched-filter template parameters. Using the curve-fit parameters, Imax, t50 and Q were calculated. Histograms of these parameters had bimodal Gaussian distributions with centers and spreads within 12% and 4% of those of histograms created using manually analyzed data. The pre-spike baseline was estimated using a novel application of the matched-filter criterion scores, and the estimation of pre-spike foot signal parameters such as charge (Qfoot) and duration (tfoot) yielded means and medians within 10% of manually computed parameters.
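The detection criterion described in this abstract, a least squares template amplitude divided by its standard error, can be sketched for a single template as follows. This is an illustrative reading of the abstract and not the author's code: it assumes a zero baseline, uses one hypothetical exponential-decay template in place of a template library, and the threshold, noise level and function name `template_score` are invented for the example.

```python
import numpy as np

def template_score(window, template):
    """Least-squares amplitude of `template` in `window`, and amplitude / standard error."""
    tt = np.dot(template, template)
    amplitude = np.dot(template, window) / tt
    residual = window - amplitude * template
    dof = len(window) - 1  # one fitted parameter (the amplitude)
    std_err = np.sqrt(np.dot(residual, residual) / dof / tt)
    return amplitude, amplitude / std_err

rng = np.random.default_rng(2)
t = np.arange(50)
template = np.exp(-t / 10.0)  # hypothetical unit-amplitude spike shape
noise_std, true_amplitude = 0.5, 5.0

spike_window = true_amplitude * template + rng.normal(0.0, noise_std, t.size)
noise_window = rng.normal(0.0, noise_std, t.size)

amp_spike, score_spike = template_score(spike_window, template)
amp_noise, score_noise = template_score(noise_window, template)

threshold = 6.0  # hypothetical minimum criterion score
detected = [bool(score > threshold) for score in (score_spike, score_noise)]
print(amp_spike, score_spike, score_noise, detected)
```

Sliding this score along a recording and repeating it for each template in a library would give the kind of scan the abstract describes. The fitted amplitude also serves as a seed value for later curve fitting.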


STUDY ON THE RECOGNITION OF OBJECTIONABLE AUDIO

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001410008238 ◽

2010 ◽

Vol 24 (06) ◽

pp. 981-994 ◽

Cited By ~ 1

Author(s):

ZIQIANG SHI ◽

BOYANG GAO ◽

TIERAN ZHENG ◽

JIQING HAN

Keyword(s):

False Positive ◽

False Positive Rate ◽

Real Data ◽

Recall Rate ◽

Point Of View ◽

Superior Performance ◽

Video Sequences ◽

Supplementary Method ◽

Positive Rate ◽

Novel Method

In this paper, a novel method based on the recognition of pornographic sounds is proposed to detect adult video sequences automatically; it may serve as a verification step, a supplementary method or an independent detector. A feature analysis specific to erotic sound is given. Building on commonly used features, histograms and contours are introduced as new sets of features. At the same time, because of the complexity of outside data, a general framework called in-class clustering is proposed, which selects the most representative subclass for training and classification. All these efforts increase the recall rate and decrease the false positive rate. Experiments on real data from the Internet indicate that the proposed method yields superior performance, achieving an 89.17% recall rate and a 10.78% false positive rate.


Improving diagnosis of acute appendicitis with atypical findings by Tc-99m HMPAO leukocyte scan

Nuklearmedizin ◽

10.1055/s-0038-1623994 ◽

2002 ◽

Vol 41 (01) ◽

pp. 37-41 ◽

Cited By ~ 3

Author(s):

S. Shung-Shung ◽

S. Yu-Chien ◽

Y. Mei-Due ◽

W. Hwei-Chung ◽

A. Kao

Keyword(s):

Acute Appendicitis ◽

False Positive ◽

False Positive Rate ◽

Accurate Method ◽

Clinical Findings ◽

Pathological Findings ◽

Lower Quadrant ◽

Predictive Values ◽

Positive Rate

Summary Aim: Even with careful observation, the overall false-positive rate of laparotomy remains 10-15% when acute appendicitis is suspected. Therefore, the clinical efficacy of the Tc-99m HMPAO labeled leukocyte (TC-WBC) scan for the diagnosis of acute appendicitis in patients presenting with atypical clinical findings is assessed. Patients and Methods: Eighty patients presenting with acute abdominal pain and possible acute appendicitis but atypical findings were included in this study. After intravenous injection of TC-WBC, serial anterior abdominal/pelvic images at 30, 60, 120 and 240 min with 800k counts were obtained with a gamma camera. Any abnormal localization of radioactivity in the right lower quadrant of the abdomen, equal to or greater than bone marrow activity, was considered a positive scan. Results: 36 out of 49 patients showing positive TC-WBC scans received appendectomy. They all proved to have positive pathological findings. Five positive TC-WBC scans were not related to acute appendicitis but were caused by other pathological lesions. Eight patients were not operated on, and clinical follow-up after one month revealed no acute abdominal condition. Three of 31 patients with negative TC-WBC scans received appendectomy. They also presented positive pathological findings. The remaining 28 patients did not receive operations and showed no evidence of appendicitis after at least one month of follow-up. The overall sensitivity, specificity, accuracy, positive and negative predictive values for the TC-WBC scan to diagnose acute appendicitis were 92, 78, 86, 82, and 90%, respectively. Conclusion: The TC-WBC scan provides a rapid and highly accurate method for the diagnosis of acute appendicitis in patients with equivocal clinical examination. It proved useful in reducing the false-positive rate of laparotomy and in shortening the time necessary for clinical observation.
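The reported percentages follow from a 2×2 table of scan result against final diagnosis. The sketch below shows the standard formulas under one possible reading of the counts, which is an assumption and not something the abstract states: the 36 operated positives are true positives, the 8 unoperated positives with an uneventful follow-up are false positives, the 3 operated negatives are false negatives, the 28 remaining negatives are true negatives, and the 5 positives caused by other lesions are left out. The function name `diagnostic_metrics` is likewise hypothetical.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard diagnostic accuracy measures from a 2x2 confusion table."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # 1 - false positive rate
        "accuracy": (tp + tn) / total,
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }

# Assumed tabulation (see the lead-in); the 5 "other lesion" positives are excluded.
metrics = diagnostic_metrics(tp=36, fp=8, fn=3, tn=28)
percent = {name: round(100 * value) for name, value in metrics.items()}
print(percent)
```

Under this reading, sensitivity, specificity and both predictive values round to the reported 92, 78, 82 and 90%, while accuracy comes out near 85% against the reported 86%, so the authors' exact tabulation may differ slightly from the one assumed here.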


Predicting Fetal Chromosome Anomalies in the First Trimester Using Pregnancy Associated Plasma Protein-A: A Comparison of Statistical Methods

Methods of Information in Medicine ◽

10.1055/s-0038-1634910 ◽

1993 ◽

Vol 32 (02) ◽

pp. 175-179 ◽

Cited By ~ 7

Author(s):

B. Brambati ◽

T. Chard ◽

J. G. Grudzinskas ◽

M. C. M. Macintosh

Keyword(s):

Logistic Regression ◽

General Population ◽

Likelihood Ratio ◽

False Positive ◽

False Positive Rate ◽

Ratio Method ◽

Detection Rates ◽

Gaussian Distributions ◽

Positive Rate ◽

Likelihood Ratio Method

Abstract: The analysis of the clinical efficiency of a biochemical parameter in the prediction of chromosome anomalies is described, using a database of 475 cases including 30 abnormalities. A comparison was made of two different approaches to the statistical analysis: the use of Gaussian frequency distributions and likelihood ratios, and logistic regression. Both methods computed that for a 5% false-positive rate approximately 60% of anomalies are detected on the basis of maternal age and serum PAPP-A. The logistic regression analysis is appropriate where the outcome variable (chromosome anomaly) is binary and the detection rates refer to the original data only. The likelihood ratio method is used to predict the outcome in the general population. The latter method depends on the data or some transformation of the data fitting a known frequency distribution (Gaussian in this case). The precision of the predicted detection rates is limited by the small sample of abnormals (30 cases). Varying the means and standard deviations (to the limits of their 95% confidence intervals) of the fitted log Gaussian distributions resulted in a detection rate varying between 42% and 79% for a 5% false-positive rate. Thus, although the likelihood ratio method is potentially the better method in determining the usefulness of a test in the general population, larger numbers of abnormal cases are required to stabilise the means and standard deviations of the fitted log Gaussian distributions.
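To make the Gaussian likelihood ratio approach concrete, the sketch below computes the detection rate at a fixed 5% false-positive rate for a single log-transformed marker. The means and standard deviations are hypothetical placeholders and are not the fitted values from the study. Maternal age is left out, and lower marker values in affected cases are assumed purely for illustration.

```python
from statistics import NormalDist

# Hypothetical fitted distributions of the log-transformed marker.
unaffected = NormalDist(mu=0.0, sigma=0.25)
affected = NormalDist(mu=-0.4, sigma=0.25)

def likelihood_ratio(x):
    """How much more likely a marker value x is in affected than in unaffected cases."""
    return affected.pdf(x) / unaffected.pdf(x)

def detection_rate(false_positive_rate):
    """Fraction of affected cases below the cutoff that flags the given share of unaffected cases."""
    # With equal standard deviations the likelihood ratio is monotone in x,
    # so a cutoff on the marker is equivalent to a cutoff on the ratio.
    cutoff = unaffected.inv_cdf(false_positive_rate)
    return affected.cdf(cutoff)

rate_at_5_percent = detection_rate(0.05)
print(rate_at_5_percent, likelihood_ratio(-0.5))
```

Re-running `detection_rate` with the means and standard deviations moved to the ends of their confidence intervals is one way to reproduce the kind of sensitivity analysis described in the abstract, where the detection rate ranged from 42% to 79%.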
