Interclass Interference Suppression in Multi-Class Problems

Jinfu Liu; Mingliang Bai; Na Jiang; Ran Cheng; Xianling Li; Yifang Wang; Daren Yu

doi:10.3390/app11010450

Applied Sciences ◽

10.3390/app11010450 ◽

2021 ◽

Vol 11 (1) ◽

pp. 450

Author(s):

Jinfu Liu ◽

Mingliang Bai ◽

Na Jiang ◽

Ran Cheng ◽

Xianling Li ◽

...

Keyword(s):

Classification Accuracy ◽

Cross Validation ◽

Selection Process ◽

Interference Suppression ◽

Generalization Ability ◽

Suppression Effect ◽

Binary Classifiers ◽

The One ◽

Fold Cross Validation ◽

Validation Experiments

Multi-classifiers are widely applied in many practical problems. However, features that can significantly discriminate a certain class from the others are often deleted in the feature selection process of multi-classifiers, which seriously decreases generalization ability. This paper refers to this phenomenon as interclass interference in multi-class problems and analyzes its cause in detail. The paper then summarizes three interclass interference suppression methods, based on all features, one-class classifiers and binary classifiers, and compares their effects on interclass interference via 10-fold cross-validation experiments on 14 UCI datasets. Experiments show that the method based on binary classifiers suppresses interclass interference efficiently and obtains the best classification accuracy among the three methods. Further experiments compare the suppression effect of two methods based on binary classifiers: the one-versus-one method and the one-versus-all method. Results show that the one-versus-one method obtains a better suppression effect on interclass interference and better classification accuracy. By proposing the concept of interclass interference and studying its suppression methods, this paper significantly improves the generalization ability of multi-classifiers.
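As a rough illustration of the two binary decompositions named in the abstract, the sketch below enumerates the binary subproblems that one-versus-one and one-versus-all create for a set of class labels, and combines pairwise decisions by majority vote. The function names and the tie-breaking rule are assumptions made for this example and are not taken from the paper; in practice each binary subproblem could run its own feature selection, which is presumably where the suppression effect comes from.

```python
from collections import Counter
from itertools import combinations


def one_versus_one_tasks(classes):
    """Return every unordered pair of classes: K * (K - 1) / 2 binary tasks."""
    return list(combinations(sorted(classes), 2))


def one_versus_all_tasks(classes):
    """Return (positive class, all remaining classes) for each class: K tasks."""
    ordered = sorted(classes)
    return [(c, tuple(o for o in ordered if o != c)) for c in ordered]


def majority_vote(pairwise_winners):
    """Pick the class that won the most pairwise contests.

    Ties are broken by taking the smallest label (an arbitrary,
    hypothetical rule chosen only to keep the example deterministic).
    """
    counts = Counter(pairwise_winners)
    best = max(counts.values())
    return min(c for c, n in counts.items() if n == best)
```

For K classes, one-versus-one trains more classifiers than one-versus-all once K exceeds 3, but each one only has to separate two classes.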


Transfer-to-Transfer Learning Approach for Computer Aided Detection of COVID-19 in Chest Radiographs

AI ◽

10.3390/ai1040032 ◽

2020 ◽

Vol 1 (4) ◽

pp. 539-557 ◽

Cited By ~ 1

Author(s):

Barath Narayanan ◽

Russell Hardie ◽

Vignesh Krishnaraja ◽

Christina Karam ◽

Venkata Davuluru

Keyword(s):

Transfer Learning ◽

Cross Validation ◽

Class Imbalance ◽

Chest Radiographs ◽

Training Dataset ◽

Learning Approach ◽

Computer Aided Detection ◽

Computer Aided ◽

Fold Cross Validation ◽

Validation Experiments

The coronavirus disease 2019 (COVID-19) global pandemic has severely impacted lives across the globe. Respiratory disorders in COVID-19 patients are caused by lung opacities similar to viral pneumonia. A Computer-Aided Detection (CAD) system for the detection of COVID-19 using chest radiographs would provide a second opinion for radiologists. For this research, we utilize publicly available datasets that have been marked by radiologists into two classes (COVID-19 and non-COVID-19). We address the class imbalance problem associated with the training dataset by proposing a novel transfer-to-transfer learning approach, where we break a highly imbalanced training dataset into a group of balanced mini-sets and apply transfer learning between these. We demonstrate the efficacy of the method using well-established deep convolutional neural networks. Our proposed training mechanism is more robust to limited training data and class imbalance. We study the performance of our algorithm(s) based on 10-fold cross validation and two hold-out validation experiments to demonstrate its efficacy. We achieved an overall sensitivity of 0.94 for the hold-out validation experiments containing 2265 and 2139 marked as COVID-19 chest radiographs, respectively. For the 10-fold cross validation experiment, we achieve an overall Area under the Receiver Operating Characteristic curve (AUC) value of 0.996 for COVID-19 detection. This paper serves as a proof-of-concept that an automated detection approach can be developed with a limited set of COVID-19 images, and in areas with scarcity of trained radiologists.
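The abstract does not say how the balanced mini-sets are formed, so the following is only one plausible reading: the majority class is cut into chunks the size of the minority class, and each chunk is paired with the full minority class. The function name `balanced_minisets` is hypothetical, and the transfer learning step between mini-sets is not shown.

```python
def balanced_minisets(majority, minority):
    """Split an imbalanced two-class dataset into roughly balanced mini-sets.

    Assumption for this sketch: each mini-set reuses ALL minority samples
    and takes a distinct chunk of majority samples of the same size.
    The final chunk may be smaller if the sizes do not divide evenly.
    """
    size = len(minority)
    if size == 0:
        raise ValueError("minority class must contain at least one sample")
    return [
        (majority[start:start + size], list(minority))
        for start in range(0, len(majority), size)
    ]
```

Under this reading, a model would be trained on the first mini-set and then fine-tuned on each following one in turn.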


High Accurate and a Variant of k-fold Cross Validation Technique for Predicting the Decision Tree Classifier Accuracy

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.c8403.0110321 ◽

2021 ◽

Vol 10 (2) ◽

pp. 105-110

Author(s):

D. Mabuni ◽

S. Aquter Babu

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Classification Accuracy ◽

Cross Validation ◽

Training Dataset ◽

Decision Tree Classification ◽

Testing Dataset ◽

Tree Classifier ◽

Validation Technique ◽

Fold Cross Validation

In machine learning, data usage is a more important criterion than the logic of the program. With very large and moderate-sized datasets it is possible to obtain robust and high classification accuracies, but not with small and very small datasets. In particular, only large training datasets have the potential to produce robust decision tree classification results. Classification results obtained from only one training and one testing dataset pair are not reliable. The cross-validation technique uses many random folds of the same dataset for training and validation. To obtain reliable and statistically sound classification results, the same algorithm needs to be applied to different pairs of training and validation datasets. To overcome the problem of using only a single training dataset and a single testing dataset, the existing k-fold cross-validation technique uses a cross-validation plan to obtain improved decision tree classification accuracy results. In this paper, a new cross-validation technique called prime fold is proposed, tested experimentally and verified using many benchmark UCI machine learning datasets. The prime-fold-based decision tree classification accuracy results obtained in the experiments are reported to be far better than those of existing techniques for estimating decision tree classification accuracy.
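For readers unfamiliar with the baseline being varied, here is a minimal sketch of standard k-fold index splitting using only the Python standard library. It does not implement the prime fold variant, which the abstract does not describe; the function name and the fixed default seed are assumptions of this example.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Return k (train_indices, validation_indices) pairs.

    Indices are shuffled once, then dealt into k folds; each fold serves
    as the validation set exactly once while the rest form the training set.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, validation))
    return splits
```

Averaging the accuracy over the k validation folds gives the cross-validated estimate that a single train/test pair cannot provide.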


Detection of Financial Statement Fraud Using Evolutionary Algorithms

Journal of Emerging Technologies in Accounting ◽

10.2308/jeta-50390 ◽

2012 ◽

Vol 9 (1) ◽

pp. 71-94 ◽

Cited By ~ 9

Author(s):

Matthew E. Alden ◽

Daniel M. Bryan ◽

Brenton J. Lessley ◽

Arindam Tripathy

Keyword(s):

Evolutionary Algorithms ◽

Classification Accuracy ◽

Cross Validation ◽

Fuzzy Rule ◽

Financial Statement ◽

Financial Statement Fraud ◽

Rule Based ◽

Estimation Of Distribution ◽

Accuracy Rates ◽

Fold Cross Validation

In this paper, we use a Genetic Algorithm (GA) and MARLEDA, a modern Estimation of Distribution Algorithm (EDA), to evolve and train several fuzzy rule-based classifiers (FRBCs) to detect patterns of financial statement fraud. We find that both GA and MARLEDA demonstrate a better ability to classify unseen corporate data observations than a traditional logistic regression model, and provide validity for detecting financial statement fraud with Evolutionary Algorithms (EAs) and FRBCs. Using ten-fold cross-validation, the GA and MARLEDA yield average training classification accuracy rates of 75.47 percent and 74.26 percent, respectively, and average validation accuracy rates of 63.75 percent and 64.46 percent, respectively.


Visual Stimulus Background Effects on SSVEP-Based BCI Towards a Practical Robot Car Control

International Journal of Humanoid Robotics ◽

10.1142/s0219843615500140 ◽

2015 ◽

Vol 12 (02) ◽

pp. 1550014 ◽

Cited By ~ 2

Author(s):

Xiaokang Shu ◽

Lin Yao ◽

Jianjun Meng ◽

Xinjun Sheng ◽

Xiangyang Zhu

Keyword(s):

Steady State ◽

Visual Evoked Potentials ◽

Visual Stimulus ◽

Classification Accuracy ◽

Cross Validation ◽

Brain Computer Interface ◽

Video Feedback ◽

Computer Interface ◽

Dynamic Scene ◽

Fold Cross Validation

A flickering source is an indispensable component of steady-state visual evoked potential (SSVEP)-based brain–computer interfaces (BCIs), and its background severely influences the potentials evoked by the repetitive stimuli. In this paper, we investigated the problem under three different backgrounds in the context of SSVEP-BCI-based robot car control: a black screen, a static scene and a dynamic scene of the environment. In an experiment with ten subjects, we found a significant decrease in SSVEP amplitude in the dynamic scene condition compared to the black screen reference condition (p < 0.05), which resulted in a decrease in classification accuracy as evaluated by 10-fold cross validation. However, our proposed experimental paradigm showed that training under the static scene or dynamic scene condition could compensate well for this performance drop and improve online robot car control with real-time video feedback. The problem addressed in our application provides some valuable suggestions for translating SSVEP-BCIs from laboratory exploration into practical use.


A Two-Step Feature Selection Method to Predict Cancerlectins by Multiview Features and Synthetic Minority Oversampling Technique

BioMed Research International ◽

10.1155/2018/9364182 ◽

2018 ◽

Vol 2018 ◽

pp. 1-10 ◽

Cited By ~ 4

Author(s):

Runtao Yang ◽

Chengjin Zhang ◽

Lina Zhang ◽

Rui Gao

Keyword(s):

Feature Selection ◽

Molecular Mechanisms ◽

Cross Validation ◽

Selection Process ◽

Feature Selection Method ◽

Imbalanced Data ◽

Computational Method ◽

Accurate Identification ◽

Comparison Results ◽

Fold Cross Validation

Cancerlectins have an inhibitory effect on the growth of cancer cells and are currently being employed as therapeutic agents. The accurate identification of cancerlectins should provide insight into the molecular mechanisms of cancers. In this study, a new computational method based on the RF (Random Forest) algorithm is proposed for further improving the performance of identifying cancerlectins. A hybrid feature space is developed before feature selection by combining different individual feature spaces: CTD (Composition, Transition, and Distribution), PseAAC (Pseudo Amino Acid Composition), PSSM (Position-Specific Scoring Matrix), and disorder. The SMOTE (Synthetic Minority Oversampling Technique) is applied to solve the imbalanced data problem. To reduce feature redundancy and computational complexity, we propose a two-step feature selection process to select informative features. A 5-fold cross-validation technique is used for the evaluation of various prediction strategies. The proposed method achieves a sensitivity of 0.779, a specificity of 0.717, an accuracy of 0.748, and an MCC (Matthews Correlation Coefficient) of 0.497. The prediction results are also compared with other existing methods on the same dataset using 5-fold cross-validation. The comparison results demonstrate the high effectiveness of our method for predicting cancerlectins.
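The four reported metrics all derive from a binary confusion matrix. The sketch below shows the standard definitions; the function name and any counts passed to it are illustrative and are not the paper's data.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Compute sensitivity, specificity, accuracy and MCC from raw counts.

    tp/fn: positives predicted correctly/incorrectly.
    tn/fp: negatives predicted correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }
```

When the two classes are the same size, accuracy equals the mean of sensitivity and specificity; the reported values (0.779, 0.717 and 0.748) fit that relationship, which would be consistent with a balanced evaluation set.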


Combination of Support Vector Machine and K-Fold cross-validation for prediction of long-term degradation of the compressive strength of marine concrete

International Journal of Computational Physics Series ◽

10.29167/a1i1p120-130 ◽

2018 ◽

Vol 1 (1) ◽

pp. 120-130 ◽

Cited By ~ 1

Author(s):

Chunxiang Qian ◽

Wence Kang ◽

Hao Ling ◽

Hua Dong ◽

Chengyao Liang ◽

...

Keyword(s):

Support Vector Machine ◽

Environmental Factors ◽

Cross Validation ◽

Concrete Strength ◽

Simulation Method ◽

Support Vector ◽

Svm Model ◽

Artificial Neural Network Ann ◽

Influence Degree ◽

Fold Cross Validation

A Support Vector Machine (SVM) model optimized by K-Fold cross-validation was built to predict and evaluate the degradation of concrete strength in a complicated marine environment. Several other models, such as an Artificial Neural Network (ANN) and a Decision Tree (DT), were also built and compared with the SVM to determine which could make the most accurate predictions. Both the material factors and the environmental factors that influence the results were considered. The material factors mainly involved the original concrete strength and the amount of cement replaced by fly ash and slag. The environmental factors consisted of the concentrations of Mg²⁺, SO₄²⁻ and Cl⁻, temperature and exposure time. The prediction results indicated that the optimized SVM model performed better than the other models in predicting the concrete strength. Based on the SVM model, a simulation method of variable limitation was used to determine the sensitivity of the various factors and their degree of influence on the degradation of concrete strength.


Rancang Bangun Sistem Informasi Untuk Menentukan Kapabilitas Konsumen Dalam Mengambil Pinjaman KPR

Jurnal ULTIMA InfoSys ◽

10.31937/si.v7i2.543 ◽

2016 ◽

Vol 7 (2) ◽

pp. 75-80

Author(s):

Adhi Kusnadi ◽

Risyad Ananda Putra

Keyword(s):

Data Mining ◽

Low Income ◽

Cross Validation ◽

Classification Tree ◽

Large Population ◽

Housing Development ◽

Good Precision ◽

Index Terms ◽

The Government ◽

Fold Cross Validation

Indonesia is a country with a relatively large population. Over a five-year period, the government runs an annual program to procure 1 million FLPP house units. The program is intended to provide decent homes for low-income people. FLPP housing development requires good precision and speed on the part of the developer, but this is often hampered by the bank process, because the outcome and speed of data processing at the bank are difficult to predict. Knowing whether consumers are able to obtain subsidized credit has many advantages: among others, developers can plan cash flow better and can replace consumers who would be rejected before they enter the bank process. For that reason, a system was built to help developers. Many methods can be used to create such an application; one of them is data mining with a classification tree. In 10-fold cross-validation, the application achieved an accuracy of 92%. Index Terms: Data Mining, Classification Tree, Housing, FLPP, 10-fold Cross Validation, Consumer Capability


Klasifikasi Berita Kriminal Menggunakan Naïve Bayes Classifier (NBC) dengan Pengujian K-Fold Cross Validation

Jurnal Sains dan Informatika ◽

10.34128/jsi.v5i2.177 ◽

2019 ◽

Vol 5 (2) ◽

pp. 108-117

Author(s):

Herfia Rhomadhona ◽

Jaka Permadi

Keyword(s):

Cross Validation ◽

Online Media ◽

Bayes Classifier ◽

Naïve Bayes ◽

Fold Cross Validation

Crime news is always a trending topic in the mass media, especially online mass media. Online media outlets provide several facilities that make it easier for the public to find news by topic, and they label each news item by category. However, they do not assign subcategories to the news. For example, if a user opens the crime category, all types of crime news are displayed without any specific information about the type of crime. This problem can be addressed by classifying crime news by subcategory. This study uses the Naïve Bayes Classifier (NBC) to classify news by subcategory. The subcategories are divided into five categories: corruption, narcotics, theft, rape and murder. The study aims to determine the ability of NBC to classify news by testing it with the K-Fold Cross Validation technique with K values from 3 to 10. The test results show that NBC is capable of classifying crime news, with a precision of 98.53%, a recall of 98.44% and an accuracy of 99.38%.


Prediction of K562 Cells Functional Inhibitors Based on Machine Learning Approaches

Current Pharmaceutical Design ◽

10.2174/1381612825666191107092214 ◽

2020 ◽

Vol 25 (40) ◽

pp. 4296-4302 ◽

Cited By ~ 2

Author(s):

Yuan Zhang ◽

Zhenyan Han ◽

Qian Gao ◽

Xiaoyi Bai ◽

Chi Zhang ◽

...

Keyword(s):

Machine Learning ◽

Inclusion Bodies ◽

Cross Validation ◽

Independent Set ◽

K562 Cells ◽

Machine Learning Algorithms ◽

Learning Approaches ◽

Validation Test ◽

Excess Number ◽

Fold Cross Validation

Background: β-thalassemia is a common monogenic genetic disease that is very harmful to human health. The disease is due to the deletion of or defects in β-globin, which reduces synthesis of the β-globin chain, resulting in a relative excess of α-chains. The formation of inclusion bodies deposited on the cell membrane causes a decrease in the ability of red blood cells to deform and a group of hereditary haemolytic diseases caused by massive destruction in the spleen. Methods: In this work, machine learning algorithms were employed to build a prediction model for inhibitors against K562 based on 117 inhibitors and 190 non-inhibitors. Results: The overall accuracy (ACC) of a 10-fold cross-validation test and an independent set test using Adaboost were 83.1% and 78.0%, respectively, surpassing Bayes Net, Random Forest, Random Tree, C4.5, SVM, KNN and Bagging. Conclusion: This study indicated that Adaboost could be applied to build a learning model for the prediction of inhibitors against K562 cells.


Towards computer-aided severity assessment via deep neural networks for geographic and opacity extent scoring of SARS-CoV-2 chest X-rays

Scientific Reports ◽

10.1038/s41598-021-88538-4 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

A. Wong ◽

Z. Q. Lin ◽

L. Wang ◽

A. G. Chung ◽

B. Shen ◽

...

Keyword(s):

Neural Networks ◽

Monte Carlo ◽

Lung Disease ◽

Disease Severity ◽

Deep Neural Networks ◽

Cross Validation ◽

X Rays ◽

Computer Aided ◽

Monte Carlo Cross Validation ◽

Validation Experiments

A critical step in effective care and treatment planning for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the cause for the coronavirus disease 2019 (COVID-19) pandemic, is the assessment of the severity of disease progression. Chest x-rays (CXRs) are often used to assess SARS-CoV-2 severity, with two important assessment metrics being extent of lung involvement and degree of opacity. In this proof-of-concept study, we assess the feasibility of computer-aided scoring of CXRs of SARS-CoV-2 lung disease severity using a deep learning system. Data consisted of 396 CXRs from SARS-CoV-2 positive patient cases. Geographic extent and opacity extent were scored by two board-certified expert chest radiologists (with 20+ years of experience) and a 2nd-year radiology resident. The deep neural networks used in this study, which we name COVID-Net S, are based on a COVID-Net network architecture. 100 versions of the network were independently learned (50 to perform geographic extent scoring and 50 to perform opacity extent scoring) using random subsets of CXRs from the study, and we evaluated the networks using stratified Monte Carlo cross-validation experiments. The COVID-Net S deep neural networks yielded R² of 0.664 ± 0.032 and 0.635 ± 0.044 between predicted scores and radiologist scores for geographic extent and opacity extent, respectively, in stratified Monte Carlo cross-validation experiments. The best performing COVID-Net S networks achieved R² of 0.739 and 0.741 between predicted scores and radiologist scores for geographic extent and opacity extent, respectively. The results are promising and suggest that the use of deep neural networks on CXRs could be an effective tool for computer-aided assessment of SARS-CoV-2 lung disease severity, although additional studies are needed before adoption for routine clinical use.
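Monte Carlo cross-validation differs from k-fold in that each repetition draws a fresh random train/test split, so test sets may overlap across repetitions. The sketch below shows the plain, non-stratified form; the abstract's experiments were stratified, and the test fraction, seed and function name here are assumptions of this example rather than details from the paper.

```python
import random


def monte_carlo_splits(n_samples, n_repeats, test_fraction=0.2, seed=0):
    """Return n_repeats independent random (train, test) index splits.

    Unlike k-fold, a sample can appear in the test set of several
    repetitions or of none. Stratification is NOT implemented here.
    """
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    splits = []
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        splits.append((sorted(indices[n_test:]), sorted(indices[:n_test])))
    return splits
```

Training one model per split and reporting the mean and standard deviation of the score across repetitions yields figures of the same form as the "± " values quoted above.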
