A Comparison of Autometrics and Penalization Techniques under Various Error Distributions: Evidence from Monte Carlo Simulation

Complexity ◽

10.1155/2021/9223763 ◽

2021 ◽

Vol 2021 ◽

pp. 1-8

Author(s):

Faridoon Khan ◽

Amena Urooj ◽

Kalim Ullah ◽

Badr Alnssyan ◽

Zahra Almaspoor

Keyword(s):

Absolute Error ◽

Training Data ◽

Absolute Deviation ◽

Shrinkage Methods ◽

Forecasting Performance ◽

Testing Data ◽

Error Distributions ◽

Penalization Techniques ◽

Improved Performance ◽

Minimax Concave Penalty

This work compares Autometrics with two penalization techniques, the minimax concave penalty (MCP) and the smoothly clipped absolute deviation (SCAD), under asymmetric error distributions (exponential, gamma, and Frechet) with varying sample sizes and numbers of predictors. Comprehensive simulations across a wide variety of scenarios reveal that all methods considered improve as the sample size increases. Under low multicollinearity, the methods perform well in terms of potency, but the shrinkage methods collapse in terms of gauge, and the higher gauge leads to overspecified models. High levels of multicollinearity adversely affect the performance of Autometrics. In contrast, shrinkage methods remain robust in terms of potency in the presence of high multicollinearity, but they tend to select a large set of irrelevant variables. Moreover, we find that expanding the data rapidly mitigates the adverse impact of high multicollinearity on Autometrics and gradually corrects the gauge of the shrinkage methods. For the empirical application, we use gold price data spanning 1981 to 2020. To compare the forecasting performance of the selected methods, we divide the data into two parts: data over 1981–2010 are taken as training data, and those over 2011–2020 are used as testing data. All methods are trained on the training data and then assessed on the testing data. Based on root-mean-square error and mean absolute error, Autometrics remains the best at capturing the gold price trend and produces better forecasts than MCP and SCAD.
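
The abstract above reports results in terms of potency and gauge without defining them. In the model-selection literature, potency is usually taken as the share of truly relevant variables that a method retains, and gauge as the share of irrelevant candidates it retains. The sketch below computes both for a single selected model under those definitions; the function name and the variable lists are hypothetical, and a Monte Carlo study would average these values over many replications.

```python
def potency_and_gauge(selected, relevant, candidates):
    """Return (potency, gauge) for one selected model.

    potency: share of truly relevant variables that were retained.
    gauge:   share of truly irrelevant candidates that were retained.
    Assumes at least one relevant and one irrelevant candidate.
    """
    selected, relevant, candidates = set(selected), set(relevant), set(candidates)
    irrelevant = candidates - relevant
    potency = len(selected & relevant) / len(relevant)
    gauge = len(selected & irrelevant) / len(irrelevant)
    return potency, gauge


# Hypothetical example: 10 candidate predictors, of which x1..x3 generate the data.
candidates = [f"x{i}" for i in range(1, 11)]
relevant = ["x1", "x2", "x3"]
selected = ["x1", "x2", "x5", "x7"]  # what some selection method kept

potency, gauge = potency_and_gauge(selected, relevant, candidates)
print(f"potency={potency:.3f}, gauge={gauge:.3f}")
```

A method that keeps everything scores a potency of 1 but also a gauge of 1, which is the overspecification pattern the abstract attributes to the shrinkage methods.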

Comparing the Forecast Performance of Advanced Statistical and Machine Learning Techniques Using Huge Big Data: Evidence from Monte Carlo Experiments

Complexity ◽

10.1155/2021/6117513 ◽

2021 ◽

Vol 2021 ◽

pp. 1-11

Author(s):

Faridoon Khan ◽

Amena Urooj ◽

Saud Ahmed Khan ◽

Abdelaziz Alsubie ◽

Zahra Almaspoor ◽

...

Keyword(s):

Sample Size ◽

Factor Model ◽

Principal Component ◽

Small Sample ◽

Training Data ◽

Machine Learning Techniques ◽

Absolute Deviation ◽

Forecast Performance ◽

Forecasting Performance ◽

Sample Case

This research compares factor models based on principal component analysis (PCA) and partial least squares (PLS) with Autometrics, elastic smoothly clipped absolute deviation (E-SCAD), and minimax concave penalty (MCP) under simulated schemes featuring multicollinearity, heteroscedasticity, and autocorrelation. The comparison is made with varying sample sizes and numbers of covariates. We found that in the presence of low and moderate multicollinearity, MCP often produces superior forecasts, except in the small-sample case, where E-SCAD remains better. In the case of high multicollinearity, the PLS-based factor model remained dominant, but asymptotically the prediction accuracy of E-SCAD improves significantly relative to the other methods. Under heteroscedasticity, MCP performs very well and beats the rival methods most of the time. In some circumstances under large samples, Autometrics provides forecasts similar to those of MCP. In the presence of low and moderate autocorrelation, MCP shows outstanding forecasting performance except in the small-sample case, where E-SCAD produces a remarkable forecast. In the case of extreme autocorrelation, E-SCAD outperforms the rival techniques under both small and medium samples, but further increases in sample size make the MCP forecast comparatively more accurate. To compare the predictive ability of all methods, we split the data into two parts (i.e., data over 1973–2007 as training data and data over 2008–2020 as testing data). Based on the root mean square error and mean absolute error, the PLS-based factor model outperforms the competitor models in terms of forecasting performance.

Scalable Approach to High Coverages on Oxides via Iterative Training of a Machine-Learning Algorithm

10.26434/chemrxiv.10288514.v1 ◽

2019 ◽

Author(s):

Andrew Medford ◽

Shengchun Yang ◽

Fuzhu Liu

Keyword(s):

Machine Learning ◽

Chemical Potential ◽

Learning Algorithm ◽

Absolute Error ◽

Low Energy ◽

Training Data ◽

High Coverage ◽

Metal Compounds ◽

Adsorption Energies ◽

The Stability

Understanding the interaction of multiple types of adsorbate molecules on solid surfaces is crucial to establishing the stability of catalysts under various chemical environments. Computational studies on the high coverage and mixed coverages of reaction intermediates are still challenging, especially for transition-metal compounds. In this work, we present a framework to predict differential adsorption energies and identify low-energy structures under high- and mixed-adsorbate coverages on oxide materials. The approach uses Gaussian process machine-learning models with quantified uncertainty in conjunction with an iterative training algorithm to actively identify the training set. The framework is demonstrated for the mixed adsorption of CHx, NHx and OHx species on the oxygen vacancy and pristine rutile TiO2(110) surface sites. The results indicate that the proposed algorithm is highly efficient at identifying the most valuable training data, and is able to predict differential adsorption energies with a mean absolute error of ~0.3 eV based on <25% of the total DFT data. The algorithm is also used to identify 76% of the low-energy structures based on <30% of the total DFT data, enabling construction of surface phase diagrams that account for high and mixed coverage as a function of the chemical potential of C, H, O, and N. Furthermore, the computational scaling indicates the algorithm scales nearly linearly (N^1.12) as the number of adsorbates increases. This framework can be directly extended to metals, metal oxides, and other materials, providing a practical route toward the investigation of the behavior of catalysts under high-coverage conditions.

An Analog Circuit Fault Diagnosis Approach Based on Wavelet-based fractal analysis and Multiple Kernel SVM

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813666201207154641 ◽

2020 ◽

Vol 13 ◽

Author(s):

Jianfeng Jiang

Keyword(s):

Fault Diagnosis ◽

Fractal Analysis ◽

Analog Circuit ◽

Training Data ◽

Support Vector ◽

Pass Filter ◽

Multiple Kernel ◽

Testing Data ◽

Circuit Fault Diagnosis ◽

Diagnosis Approach

Objective: To diagnose analog circuit faults correctly, this paper presents an analog circuit fault diagnosis approach based on wavelet-based fractal analysis and a multiple kernel support vector machine (MKSVM). Methods: Time responses of the circuit under different faults are measured, and wavelet-based fractal analysis is then used to process the collected time responses and generate features for the signals. Kernel principal component analysis (KPCA) is applied to reduce the dimensionality of the features. Afterwards, the features are divided into training data and testing data. An MKSVM, with its multiple parameters optimized by the chaos particle swarm optimization (CPSO) algorithm, is used to construct an analog circuit fault diagnosis model based on the testing data. Results: The proposed diagnosis approach is demonstrated by a fault diagnosis simulation of a four-opamp biquad high-pass filter. Conclusion: The approach outperforms other commonly used methods in the comparisons.

Improving Pattern Recognition Rate by Gaussian Hopfield Neural Network

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.189-193.2042 ◽

2011 ◽

Vol 189-193 ◽

pp. 2042-2045 ◽

Cited By ~ 1

Author(s):

Shang Jen Chuang ◽

Chiung Hsing Chen ◽

Chien Chih Kao ◽

Fang Tsung Liu

Keyword(s):

Neural Network ◽

Pattern Recognition ◽

Gaussian Distribution ◽

Test Pattern ◽

Recognition Rate ◽

Hopfield Neural Network ◽

Training Data ◽

New Method ◽

Gaussian Filter ◽

Testing Data

The Hopfield Neural Network cannot recognize English letters that contain more than 50% noise. This paper proposes a new method to improve the recognition rate of the Hopfield Neural Network by adding a Gaussian distribution feature to it. A Gaussian filter is added to eliminate noise and improve the network's recognition rate. We use the English letters 'A' to 'Z' as training data, and noise from 0% to 100% is generated randomly for the testing data. We first use the Gaussian filter to eliminate noise and then recognize the test pattern with the Hopfield Neural Network. We found that letters containing between 50% and 53% noise either exhibit a reversal phenomenon or cannot be recognized [6]. In this paper, we propose using multiple filters to improve the recognition rate when letters contain between 50% and 53% noise.
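
The 50% noise threshold mentioned above follows from a general property of Hopfield networks: with Hebbian weights, the negation of every stored pattern is also a stable state, so an input with more than half of its pixels flipped is closer to the inverse pattern. The sketch below illustrates this with a single stored pattern and plain Hebbian learning. It is not the authors' Gaussian-filter method, and the 10-pixel pattern and the helper names are hypothetical.

```python
def train_hopfield(patterns):
    """Hebbian weight matrix for a list of +1/-1 patterns (zero diagonal)."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j]
    return weights


def recall(weights, state, max_sweeps=10):
    """Apply synchronous sign updates until the state stops changing."""
    state = list(state)
    for _ in range(max_sweeps):
        new_state = []
        for i, row in enumerate(weights):
            field = sum(w * s for w, s in zip(row, state))
            # Keep the current value when the local field is exactly zero.
            new_state.append(state[i] if field == 0 else (1 if field > 0 else -1))
        if new_state == state:
            break
        state = new_state
    return state


def flip_first(pattern, k):
    """Deterministic stand-in for noise: flip the first k entries."""
    return [-v if i < k else v for i, v in enumerate(pattern)]


pattern = [1, -1, 1, 1, -1, -1, 1, -1, 1, 1]  # hypothetical 10-pixel pattern
weights = train_hopfield([pattern])

low_noise = recall(weights, flip_first(pattern, 3))   # 30% of pixels flipped
high_noise = recall(weights, flip_first(pattern, 6))  # 60% of pixels flipped

print(low_noise == pattern)
print(high_noise == [-v for v in pattern])
```

Both comparisons print `True`: the 30% case is restored to the stored pattern, while the 60% case settles on its pixel-wise inverse.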

Klasifikasi Gerakan Jatuh Berbasis Accelerometer dan Gyroscope Menggunakan K-Nearest Neighbors

Journal of Applied Electrical Engineering ◽

10.30871/jaee.v4i2.1300 ◽

2020 ◽

Vol 4 (2) ◽

pp. 24-29

Author(s):

Adlian Jefiza ◽

Indra Daulay ◽

Jhon Hericson Purba

Keyword(s):

Real Time ◽

Root Mean Square ◽

Nearest Neighbors ◽

Training Data ◽

Mean Square ◽

K Nearest Neighbors ◽

Testing Data ◽

Total Data

The main problem addressed in this study is the declining physical resilience of the elderly, which calls for a system that monitors their activity in real time. To detect the activities of the elderly, a monitoring device with a 3-axis accelerometer and a 3-axis gyroscope was designed. Sensor data were obtained from five participants. Each participant performed five movements: falling, sitting, sleeping, bowing (rukuk), and prostrating (sujud). The movements were chosen because they resemble a fall. A total of 75 data samples were obtained from the participants and divided into training data and testing data. The study uses the wavelet transform to extract features from the movements and the K-nearest neighbors (KNN) method to classify each movement. Classification with five classes yields a root mean square value of 0.0074 with 100% accuracy.
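
As a rough illustration of the classification step described above, the following sketch implements a plain K-nearest-neighbors vote with Euclidean distance. The two features (peak acceleration and peak angular rate), their values, and the three class labels are invented for illustration; the study itself uses wavelet-derived features and five classes.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Majority vote among the k training points closest to `query`."""
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical (peak acceleration in g, peak angular rate in deg/s) samples.
train_features = [
    (2.9, 310.0), (3.1, 290.0), (2.7, 330.0),  # fall
    (1.1, 40.0), (1.0, 55.0), (1.2, 35.0),     # sit
    (1.4, 120.0), (1.5, 110.0), (1.3, 130.0),  # sleep
]
train_labels = ["fall"] * 3 + ["sit"] * 3 + ["sleep"] * 3

print(knn_predict(train_features, train_labels, (2.8, 300.0)))
print(knn_predict(train_features, train_labels, (1.05, 50.0)))
```

Because the features here are unscaled, the angular rate dominates the distance; in practice the features would normally be standardized before the vote.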

An Investigation of the Characteristics of a Bayesian Military Impulse Noise Classifier

ASME 2008 Noise Control and Acoustics Division Conference ◽

10.1115/ncad2008-73046 ◽

2008 ◽

Cited By ~ 1

Author(s):

Brian Bucci ◽

Jeffrey Vipperman

Keyword(s):

Neural Network ◽

Artificial Neural Network ◽

Bayesian Methods ◽

False Positive ◽

Impulse Noise ◽

False Negative ◽

Training Data ◽

Maximum Accuracy ◽

First Case ◽

Testing Data

Extending previous methods that identify military impulse noise in civilian environmental noise monitoring by means of a set of computed scalar metrics input to artificial neural network structures, Bayesian methods are investigated to classify the same dataset. Four cases of interest are identified and analyzed: A) maximum accuracy achieved on training data, B) maximum overall accuracy on blind testing data, C) maximum accuracy on testing data with zero false positive detections, and D) maximum accuracy on testing data with zero false negative rejections. The first case is used as an illustrative example, and the latter three represent actual monitoring modes. All of the cases are compared and contrasted to illuminate their respective strengths and weaknesses. Overall accuracies of up to 99.8% are observed with no false negative rejections, and accuracies of up to 98.4% are achieved with no false positive detections.

Deteksi Gulma Berdasarkan Warna HSV dan Fitur Bentuk Menggunakan Jaringan Syaraf Tiruan

Jurnal Teknologi Informasi dan Ilmu Komputer ◽

10.25126/jtiik.2021854719 ◽

2021 ◽

Vol 8 (5) ◽

pp. 929

Author(s):

Hurriyatul Fitriyah ◽

Rizal Maulana

Keyword(s):

Learning Algorithm ◽

Color Space ◽

Outdoor Environment ◽

Training Data ◽

Distance Transform ◽

Validation Data ◽

Hsv Color Space ◽

Testing Data ◽

Artificial Neural Network Ann ◽

Center Distance

Weeds are nuisance plants on agricultural land, and herbicides are an effective means of killing them. Herbicide must be sprayed precisely on the weeds only, without hitting the crop. This study builds a system that automatically detects weeds among crop plants on real farmland. The system uses images of real farmland in which whole plants are visible (possibly with more than one leaf), taken with a camera positioned vertically and facing downward. The algorithm uses segmentation based on green color in the HSV color space to detect leaves, of both weeds and crop plants, under varied lighting. Three spatial-domain shape features are used to distinguish weeds from crop plants, whose leaves have different shape characteristics: Rectangularity, the Edge-to-Center distances function, and the Distance Transform function. Weeds and crop plants are classified with an artificial neural network (ANN), which can be trained offline. With 149 detected plants, split into 70% training data, 15% validation data, and 15% test data, the testing accuracy was 95.46%.
Abstract: Weeds are a major challenge in crop plantations. Herbicide is the most effective substance for killing this unwanted vegetation, but it must be sprayed carefully so that it targets the weeds only. In this research, we develop an algorithm that detects weeds among crop plants based on the shape of their leaves. Detection is based on images acquired with a camera. The leaves of weeds and plants are detected by their green color using segmentation in the HSV color space, which is more effective for detecting objects under varied illumination. Three shape features are extracted: Rectangularity, the Edge-to-Center distance function, and the Distance Transform function. These features are fed into a learning algorithm, an Artificial Neural Network (ANN), to classify each detection as crop plant or weed. Testing of the weed classification in a real outdoor environment showed 95.46% accuracy using a total of 149 detected plants (70% as training data, 15% as validation data, and 15% as testing data).
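
Of the three shape features named above, rectangularity is the simplest to illustrate. It is commonly defined as the area of a region divided by the area of its bounding rectangle. The sketch below assumes an axis-aligned bounding box and a binary mask given as nested lists; the paper's exact definition may differ (for example, it may use a minimum-area rotated rectangle), and the two toy masks are hypothetical.

```python
def rectangularity(mask):
    """Region area divided by the area of its axis-aligned bounding box.

    `mask` is a list of rows of 0/1 values; 1 marks a leaf pixel.
    """
    pixels = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not pixels:
        raise ValueError("mask contains no foreground pixels")
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    box_area = (max(rows) - min(rows) + 1) * (max(cols) - min(cols) + 1)
    return len(pixels) / box_area


# Hypothetical segmented leaves: a broad blob and a thin diagonal blade.
broad_leaf = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]
narrow_leaf = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

print(rectangularity(broad_leaf))   # 12 foreground pixels in a 4x4 box
print(rectangularity(narrow_leaf))  # 4 foreground pixels in a 4x4 box
```

The broad shape fills most of its box while the thin diagonal one does not, which is the kind of contrast a classifier can use to separate leaf shapes.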

miRmedon: confident detection of microRNA editing

10.1101/774661 ◽

2019 ◽

Author(s):

Amitai Mordechai ◽

Alal Eran

Keyword(s):

Cancer Progression ◽

Large Scale ◽

Complex Processes ◽

Mirna Genes ◽

Multiple Loci ◽

Rnaseq Data ◽

Error Distributions ◽

Improved Performance ◽

Mirna Editing ◽

Small Rnaseq

Summary: microRNA (miRNA), key regulators of gene expression, are prime targets for adenosine deaminase acting on RNA (ADAR) enzymes. Although ADAR-mediated A-to-I miRNA editing has been shown to be essential for orchestrating complex processes, including neurodevelopment and cancer progression, only a few human miRNA editing sites have been reported. Several computational approaches have been developed for the detection of miRNA editing in small RNAseq data, all based on the identification of systematic mismatches of ‘G’ at primary adenosine sites in known miRNA sequences. However, these methods have several limitations, including their ability to detect only one editing site per sequence (although editing of multiple sites per miRNA has been reproducibly validated), their focus on uniquely mapping reads (although 20% of human miRNA are transcribed from multiple loci), and their inability to detect editing in miRNA genes harboring genomic variants (although 73% of human miRNA loci include a reported SNP or indel). To overcome these limitations, we developed miRmedon, which leverages large scale human variation data, a combination of local and global alignments, and a comparison of the inferred editing and error distributions, for a confident detection of miRNA editing in small RNAseq data. We demonstrate its improved performance as compared to currently available methods and describe its advantages. Availability and implementation: Python source code is available at https://github.com/Amitai88/[email protected]

Forecasting Wheat Production in Pakistan

THE LAHORE JOURNAL OF ECONOMICS ◽

10.35536/lje.2008.v13.i1.a3 ◽

2008 ◽

Vol 13 (1) ◽

pp. 57-85 ◽

Cited By ~ 3

Author(s):

Falak Sher ◽

Eatzaz Ahmad

Keyword(s):

Production Function ◽

Mean Absolute Error ◽

Absolute Error ◽

Forecasting Model ◽

Future Prospects ◽

Wheat Production ◽

Steady Growth ◽

Forecasting Performance ◽

The Future ◽

Out Of Sample

This study analyzes the future prospects of wheat production in Pakistan. Parameters of the forecasting model are obtained by estimating a Cobb-Douglas production function for wheat, while future values of the various inputs are obtained as dynamic forecasts from separate ARIMA estimates for each input and each province. The input forecasts and the parameters of the wheat production function are then used to generate wheat forecasts. The results show that the most important variables for predicting wheat production per hectare are, in order of importance, lagged output, labor force, use of tractors, and the sum of rainfall in the months of November to March. The null hypotheses of common coefficients across provinces cannot be rejected for most of the variables, implying that these variables play the same role in wheat production in all four provinces. The forecasting performance of the model, based on out-of-sample forecasts for the period 2005-06, is highly satisfactory, with a mean absolute error of 1.81%. The forecasts for the period 2007-15 show steady growth of 1.6%, indicating that Pakistan will face a slight shortage of wheat output in the future.
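
For readers unfamiliar with the functional form named above, a generic Cobb-Douglas production function with K inputs and its log-linear version, which is what makes estimation by linear regression possible, can be written as follows. The symbols are illustrative assumptions; the study's exact set of regressors and its treatment of lagged output are not reproduced here.

```latex
Y_t = A \prod_{k=1}^{K} X_{k,t}^{\beta_k} \, e^{\varepsilon_t}
\quad\Longrightarrow\quad
\ln Y_t = \ln A + \sum_{k=1}^{K} \beta_k \ln X_{k,t} + \varepsilon_t
```

Here Y_t is output, X_{k,t} is the k-th input, and each β_k is the elasticity of output with respect to that input.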

Forecasting Malaysian overnight islamic interbank rate using the Box-Jenkins model

Data Analytics and Applied Mathematics (DAAM) ◽

10.15282/daam.v2i1.6837 ◽

2021 ◽

Vol 2 (1) ◽

pp. 38-51

Author(s):

N.S.M. Radzi ◽

S.R. Yaziz

Keyword(s):

Monetary Policy ◽

Mean Absolute Error ◽

Absolute Error ◽

Prediction Intervals ◽

Percentage Error ◽

Mean Square ◽

Islamic Banks ◽

Financial Instruments ◽

Policy Makers ◽

Forecasting Performance

Modelling the overnight Islamic interbank rate (IIR) is imperative for characterizing IIR performance, as it would help Islamic banks adjust their funding costs effectively and help policy makers set a comprehensive monetary policy in Malaysia. The IIR framework, which is regulated by Bank Negara Malaysia under the dual banking and financial system, has been overlooked in most previous studies that model financial instrument rates. Therefore, it is vital to select an appropriate model that matches the features of the IIR. The study assesses the forecasting performance of the Box-Jenkins model for the overnight IIR. The suggested Box-Jenkins model is applied to the Malaysian overnight IIR (in percentage) from 02/01/2001 to 31/12/2020. The empirical results show that ARIMA (0,1,1) is the most appropriate model for forecasting the overnight IIR, as it yields the smallest Mean Absolute Error (MAE), Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE). In multistep-ahead forecasting, the ARIMA (0,1,1) model is able to track the actual trend of the daily Malaysian overnight IIR up to 5 days ahead within 95% prediction intervals.
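
The three accuracy measures used above to rank the candidate models have standard definitions, sketched below in plain Python. The function name and the sample rates are hypothetical and are not taken from the study.

```python
import math


def forecast_errors(actual, predicted):
    """Return MAE, RMSE and MAPE (in percent) for paired observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    # MAPE is undefined when an actual value is zero.
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / len(errors)
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape}


# Hypothetical rates in percent, not data from the study.
actual = [2.0, 2.5, 4.0, 5.0]
predicted = [2.2, 2.5, 3.6, 5.5]
metrics = forecast_errors(actual, predicted)
print(metrics)
```

RMSE penalizes large misses more heavily than MAE, and MAPE expresses the error relative to the level of the series, which is why studies often report all three.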
