Outlier Detection Based on Residual Histogram Preference for Geometric Multi-Model Fitting

Xi Zhao; Yun Zhang; Shoulie Xie; Qianqing Qin; Shiqian Wu; Bin Luo

doi:10.3390/s20113037

Outlier Detection Based on Residual Histogram Preference for Geometric Multi-Model Fitting

Sensors ◽

10.3390/s20113037 ◽

2020 ◽

Vol 20 (11) ◽

pp. 3037

Author(s):

Xi Zhao ◽

Yun Zhang ◽

Shoulie Xie ◽

Qianqing Qin ◽

Shiqian Wu ◽

...

Keyword(s):

Outlier Detection ◽

State Of The Art ◽

Geometric Model ◽

Model Fitting ◽

Distribution Model ◽

Data Sets ◽

Analysis Method ◽

Detection Scheme ◽

The Impact ◽

Better Than

Geometric model fitting is a fundamental issue in computer vision, and the fitting accuracy is affected by outliers. In order to eliminate the impact of the outliers, the inlier threshold or scale estimator is usually adopted. However, a single inlier threshold cannot satisfy multiple models in the data, and scale estimators with a certain noise distribution model work poorly in geometric model fitting. It can be observed that the residuals of outliers are big for all true models in the data, which makes the consensus of the outliers. Based on this observation, we propose a preference analysis method based on residual histograms to study the outlier consensus for outlier detection in this paper. We have found that the outlier consensus makes the outliers gather away from the inliers on the designed residual histogram preference space, which is quite convenient to separate outliers from inliers through linkage clustering. After the outliers are detected and removed, a linkage clustering with permutation preference is introduced to segment the inliers. In addition, in order to make the linkage clustering process stable and robust, an alternative sampling and clustering framework is proposed in both the outlier detection and inlier segmentation processes. The experimental results also show that the outlier detection scheme based on residual histogram preference can detect most of the outliers in the data sets, and the fitting results are better than most of the state-of-the-art methods in geometric multi-model fitting.

Download Full-text

The Three-Cornered Hat Method for Estimating Error Variances of Three or More Atmospheric Data Sets – Part II: Evaluating Radio Occultation and Radiosonde Observations, Global Model Forecasts, and Reanalyses

Journal of Atmospheric and Oceanic Technology ◽

10.1175/jtech-d-20-0209.1 ◽

2021 ◽

Author(s):

Therese Rieckh ◽

Jeremiah P. Sjoberg ◽

Richard A. Anthes

Keyword(s):

Radio Occultation ◽

State Of The Art ◽

Specific Humidity ◽

Data Sets ◽

Error Growth ◽

Atmospheric Conditions ◽

Error Statistics ◽

Weather And Climate ◽

Atmospheric Data ◽

The Impact

AbstractWe apply the three-cornered hat (3CH) method to estimate refractivity, bending angle, and specific humidity error variances for a number of data sets widely used in research and/or operations: radiosondes, radio occultation (COSMIC, COSMIC-2), NCEP global forecasts, and nine reanalyses. We use a large number and combinations of data sets to obtain insights into the impact of the error correlations among different data sets that affect 3CH estimates. Error correlations may be caused by actual correlations of errors, representativeness differences, or imperfect co-location of the data sets. We show that the 3CH method discriminates among the data sets and how error statistics of observations compare to state-of-the-art reanalyses and forecasts, as well as reanalyses that do not assimilate satellite data. We explore results for October and November 2006 and 2019 over different latitudinal regions and show error growth of the NCEP forecasts with time. Because of the importance of tropospheric water vapor to weather and climate, we compare error estimates of refractivity for dry and moist atmospheric conditions.

Download Full-text

Estimating probability of banking crises using random forest

IAES International Journal of Artificial Intelligence (IJ-AI) ◽

10.11591/ijai.v10.i2.pp407-413 ◽

2021 ◽

Vol 10 (2) ◽

pp. 407

Author(s):

Sri Hartini ◽

Zuherman Rustam ◽

Glori Stephani Saragih ◽

María Jesús Segovia Vargas

Keyword(s):

Random Forest ◽

State Of The Art ◽

Banking Crises ◽

Training Data ◽

Annual Data ◽

Systemic Crisis ◽

Classification And Regression ◽

Systemic Crises ◽

The Impact ◽

Better Than

<span id="docs-internal-guid-4935b5ce-7fff-d9fa-75c7-0c6a5aa1f9a6"><span>Banks have a crucial role in the financial system. When many banks suffer from the crisis, it can lead to financial instability. According to the impact of the crises, the banking crisis can be divided into two categories, namely systemic and non-systemic crisis. When systemic crises happen, it may cause even stable banks bankrupt. Hence, this paper proposed a random forest for estimating the probability of banking crises as prevention action. Random forest is well-known as a robust technique both in classification and regression, which is far from the intervention of outliers and overfitting. The experiments were then constructed using the financial crisis database, containing a sample of 79 countries in the period 1981-1999 (annual data). This dataset has 521 samples consisting of 164 crisis samples and 357 non-crisis cases. From the experiments, it was concluded that utilizing 90 percent of training data would deliver 0.98 accuracy, 0.92 sensitivity, 1.00 precision, and 0.96 F1-Score as the highest score than other percentages of training data. These results are also better than state-of-the-art methods used in the same dataset. Therefore, the proposed method is shown promising results to predict the probability of banking crises.</span></span>

Download Full-text

barry and the BAO model comparison

Monthly Notices of the Royal Astronomical Society ◽

10.1093/mnras/staa361 ◽

2020 ◽

Vol 493 (3) ◽

pp. 4078-4093 ◽

Cited By ~ 5

Author(s):

Samuel R Hinton ◽

Cullan Howlett ◽

Tamara M Davis

Keyword(s):

Model Comparison ◽

State Of The Art ◽

Model Fitting ◽

Power Spectra ◽

Acoustic Oscillation ◽

Effective Field ◽

Sloan Digital Sky Survey ◽

Data Sets ◽

New Public ◽

Sky Survey

ABSTRACT We compare the performance of four state-of-the-art models for extracting isotropic measurements of the baryon acoustic oscillation (BAO) scale. To do this, we created a new, public, modular code barry, which contains data sets, model fitting tools, and model implementations incorporating different descriptions of non-linear physics and algorithms for isolating the BAO feature. These are then evaluated for bias, correlation, and fitting strength using mock power spectra and correlation functions developed for the Sloan Digital Sky Survey Data Release 12. Our main findings are as follows: (1) all of the models can recover unbiased constraints when fit to the pre- and post-reconstruction simulations. (2) Models that provide physical descriptions of the damping of the BAO feature (using e.g. standard perturbation or effective-field theory arguments) report smaller errors on average, although the distribution of mock χ2 values indicates these are underestimated. (3) Allowing the BAO damping scale to vary can provide tighter constraints for some mocks, but is an artificial improvement that only arises when noise randomly sharpens the BAO peak. (4) Unlike recent claims in the literature when utilizing a BAO Extractor technique, we find no improvement in the accuracy of the recovered BAO scale. (5) We implement a procedure for combining all models into a single consensus result that improves over the standard method without obviously underestimating the uncertainties. Overall, barry provides a framework for performing the cosmological analyses for upcoming surveys, and for rapidly testing and validating new models.

Download Full-text

FAST-ODT: A Lightweight Outlier Detection Scheme for Categorical Data Sets

IEEE Transactions on Network Science and Engineering ◽

10.1109/tnse.2020.3022869 ◽

2020 ◽

pp. 1-1

Author(s):

Hongwei Du ◽

Qiang Ye ◽

Zhipeng Sun ◽

Chuang Liu ◽

Wen Xu

Keyword(s):

Outlier Detection ◽

Categorical Data ◽

Data Sets ◽

Detection Scheme

Download Full-text

An Approach to Determine Important Attributes for Engineering Change Evaluation

Journal of Mechanical Design ◽

10.1115/1.4023551 ◽

2013 ◽

Vol 135 (4) ◽

Cited By ~ 7

Author(s):

Chandresh Mehta ◽

Lalit Patil ◽

Debasish Dutta

Keyword(s):

Optimization Problem ◽

State Of The Art ◽

Evaluation Criteria ◽

Engineering Change ◽

Knowledge Based ◽

Detailed Evaluation ◽

Change Impact ◽

Search Approach ◽

The Impact ◽

Better Than

Enterprises plan detailed evaluation of only those engineering change (EC) effects that might have a significant impact. Using past EC knowledge can prove effective in determining whether a proposed EC effect has significant impact. In order to utilize past EC knowledge, it is essential to identify important attributes that should be compared to compute similarity between ECs. This paper presents a knowledge-based approach for determining important EC attributes that should be compared to retrieve similar past ECs so that the impact of proposed EC effect can be evaluated. The problem of determining important EC attributes is formulated as the multi-objective optimization problem. Measures are defined to quantify importance of an attribute set. The knowledge in change database and the domain rules among attribute values are combined for computing the measures. An ant colony optimization (ACO)-based search approach is used for efficiently locating the set of important attributes. An example EC knowledge-base is created and used for evaluating the measures and the overall approach. The evaluation results show that our measures perform better than state-of-the-art evaluation criteria. Our overall approach is evaluated based on manual observations. The results show that our approach correctly evaluates the value of proposed change impact with a success rate of 83.33%.

Download Full-text

Kernel-phase analysis: Aperture modeling prescriptions that minimize calibration errors

Astronomy and Astrophysics ◽

10.1051/0004-6361/201936981 ◽

2020 ◽

Vol 636 ◽

pp. A72

Author(s):

Frantz Martinache ◽

Alban Ceau ◽

Romain Laugier ◽

Jens Kammerer ◽

Mamadou N’Diaye ◽

...

Keyword(s):

Data Analysis ◽

Projection Operator ◽

Detection Limits ◽

Systematic Errors ◽

Transmission Model ◽

Data Sets ◽

High Contrast ◽

Analysis Method ◽

Contrast Detection ◽

The Impact

Context. Kernel phase is a data analysis method based on a generalization of the notion of closure phase, which was invented in the context of interferometry, but it applies to well corrected diffraction dominated images produced by an arbitrary aperture. The linear model upon which it relies theoretically leads to the formation of observable quantities robust against residual aberrations. Aims. In practice, the detection limits that have been reported thus far seem to be dominated by systematic errors induced by calibration biases that were not sufficiently filtered out by the kernel projection operator. This paper focuses on the impact the initial modeling of the aperture has on these errors and introduces a strategy to mitigate them, using a more accurate aperture transmission model. Methods. The paper first uses idealized monochromatic simulations of a nontrivial aperture to illustrate the impact modeling choices have on calibration errors. It then applies the outlined prescription to two distinct data sets of images whose analysis has previously been published. Results. The use of a transmission model to describe the aperture results is a significant improvement over the previous type of analysis. The thus reprocessed data sets generally lead to more accurate results, which are less affected by systematic errors. Conclusions. As kernel-phase observing programs are becoming more ambitious, accuracy in the aperture description is becoming paramount to avoid situations where contrast detection limits are dominated by systematic errors. The prescriptions outlined in this paper will benefit from any attempt at exploiting kernel phase for high-contrast detection.

Download Full-text

Constructing a Lightweight Key-Value Store Based on the Windows Native Features

Applied Sciences ◽

10.3390/app9183801 ◽

2019 ◽

Vol 9 (18) ◽

pp. 3801 ◽

Cited By ~ 1

Author(s):

Hyuk-Yoon Kwon

Keyword(s):

State Of The Art ◽

Main Idea ◽

Real Data ◽

Data Sets ◽

Parameter Setting ◽

Data Set ◽

Multi Level ◽

Windows Registry ◽

Best Parameter ◽

Better Than

In this paper, we propose a method to construct a lightweight key-value store based on the Windows native features. The main idea is providing a thin wrapper for the key-value store on top of a built-in storage in Windows, called Windows registry. First, we define a mapping of the components in the key-value store onto the components in the Windows registry. Then, we present a hash-based multi-level registry index so as to distribute the key-value data balanced and to efficiently access them. Third, we implement basic operations of the key-value store (i.e., Get, Put, and Delete) by manipulating the Windows registry using the Windows native APIs. We call the proposed key-value store WR-Store. Finally, we propose an efficient ETL (Extract-Transform-Load) method to migrate data stored in WR-Store into any other environments that support existing key-value stores. Because the performance of the Windows registry has not been studied much, we perform the empirical study to understand the characteristics of WR-Store, and then, tune the performance of WR-Store to find the best parameter setting. Through extensive experiments using synthetic and real data sets, we show that the performance of WR-Store is comparable to or even better than the state-of-the-art systems (i.e., RocksDB, BerkeleyDB, and LevelDB). Especially, we show the scalability of WR-Store. That is, WR-Store becomes much more efficient than the other key-value stores as the size of data set increases. In addition, we show that the performance of WR-Store is maintained even in the case of intensive registry workloads where 1000 processes accessing to the registry actively are concurrently running.

Download Full-text

Hypergraph Optimization for Multi-Structural Geometric Model Fitting

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33018730 ◽

2019 ◽

Vol 33 ◽

pp. 8730-8737

Author(s):

Shuyuan Lin ◽

Guobao Xiao ◽

Yan Yan ◽

David Suter ◽

Hanzi Wang

Keyword(s):

Spectral Clustering ◽

Input Data ◽

State Of The Art ◽

Geometric Model ◽

Model Fitting ◽

Synthetic Data ◽

Estimation Algorithm ◽

Sampling Efficiency ◽

Data Points ◽

Fitting In

Recently, some hypergraph-based methods have been proposed to deal with the problem of model fitting in computer vision, mainly due to the superior capability of hypergraph to represent the complex relationship between data points. However, a hypergraph becomes extremely complicated when the input data include a large number of data points (usually contaminated with noises and outliers), which will significantly increase the computational burden. In order to overcome the above problem, we propose a novel hypergraph optimization based model fitting (HOMF) method to construct a simple but effective hypergraph. Specifically, HOMF includes two main parts: an adaptive inlier estimation algorithm for vertex optimization and an iterative hyperedge optimization algorithm for hyperedge optimization. The proposed method is highly efficient, and it can obtain accurate model fitting results within a few iterations. Moreover, HOMF can then directly apply spectral clustering, to achieve good fitting performance. Extensive experimental results show that HOMF outperforms several state-of-the-art model fitting methods on both synthetic data and real images, especially in sampling efficiency and in handling data with severe outliers.

Download Full-text

An Input-aware Factorization Machine for Sparse Prediction

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/203 ◽

2019 ◽

Cited By ~ 2

Author(s):

Yantao Yu ◽

Zhen Wang ◽

Bo Yuan

Keyword(s):

Neural Network ◽

Deep Learning ◽

Real World ◽

State Of The Art ◽

Overall Performance ◽

Factorization Machine ◽

The Impact ◽

Novel Model ◽

Individual Input ◽

Better Than

Factorization machines (FMs) are a class of general predictors working effectively with sparse data, which represents features using factorized parameters and weights. However, the accuracy of FMs can be adversely affected by the fixed representation trained for each feature, as the same feature is usually not equally predictive and useful in different instances. In fact, the inaccurate representation of features may even introduce noise and degrade the overall performance. In this work, we improve FMs by explicitly considering the impact of individual input upon the representation of features. We propose a novel model named \textit{Input-aware Factorization Machine} (IFM), which learns a unique input-aware factor for the same feature in different instances via a neural network. Comprehensive experiments on three real-world recommendation datasets are used to demonstrate the effectiveness and mechanism of IFM. Empirical results indicate that IFM is significantly better than the standard FM model and consistently outperforms four state-of-the-art deep learning based methods.

Download Full-text

Semi-Supervised Outlier Detection with Only Positive and Unlabeled Data Based on Fuzzy Clustering

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213015500037 ◽

2015 ◽

Vol 24 (03) ◽

pp. 1550003 ◽

Cited By ~ 1

Author(s):

Armin Daneshpazhouh ◽

Ashkan Sami

Keyword(s):

Intrusion Detection ◽

Outlier Detection ◽

Fuzzy Clustering ◽

Real World ◽

State Of The Art ◽

Real Data ◽

Experimental Results ◽

The Other ◽

Data Sets ◽

Real World Applications

The task of semi-supervised outlier detection is to find the instances that are exceptional from other data, using some labeled examples. In many applications such as fraud detection and intrusion detection, this issue becomes more important. Most existing techniques are unsupervised. On the other hand, semi-supervised approaches use both negative and positive instances to detect outliers. However, in many real world applications, very few positive labeled examples are available. This paper proposes an innovative approach to address this problem. The proposed method works as follows. First, some reliable negative instances are extracted by a kNN-based algorithm. Afterwards, fuzzy clustering using both negative and positive examples is utilized to detect outliers. Experimental results on real data sets demonstrate that the proposed approach outperforms the previous unsupervised state-of-the-art methods in detecting outliers.

Download Full-text