A Framework for the Selection of Binarization Techniques on Palm Leaf Manuscripts Using Support Vector Machine

Advances in Decision Sciences ◽

10.1155/2015/925935 ◽

2015 ◽

Vol 2015 ◽

pp. 1-7 ◽

Cited By ~ 2

Author(s):

Rapeeporn Chamchong ◽

Chun Che Fung

Keyword(s):

Selection Process ◽

Text Processing ◽

Imbalanced Data ◽

Support Vector ◽

Local Contrast ◽

Text Documents ◽

Image Characteristics ◽

Vector Machines ◽

High Degree ◽

Palm Leaf

Challenges for text processing in ancient document images are mainly due to the high degree of variation in foreground and background. Image binarization is an image segmentation technique that separates an image into text and background components. Although several techniques for binarizing text documents have been proposed, their performance varies with the image characteristics. Selecting an appropriate binarization technique for each image is therefore key to achieving improved results. This paper proposes a framework for selecting binarization techniques for palm leaf manuscripts using Support Vector Machines (SVMs). The overall process is divided into three steps: (i) feature extraction: feature patterns are extracted from grayscale images based on global intensity, local contrast, and intensity; (ii) treatment of imbalanced data: the imbalanced dataset is balanced using the Synthetic Minority Oversampling Technique to improve prediction performance; and (iii) selection: an SVM is applied to select the appropriate binarization technique. The proposed framework was evaluated on palm leaf manuscript images and benchmark datasets from the DIBCO series, and prediction performance was compared between the imbalanced and balanced datasets. Experimental results showed that the proposed framework can be used as an integral part of an automatic selection process.
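The oversampling step (ii) can be illustrated with a minimal sketch of the general SMOTE idea: each synthetic sample is placed on the line segment between a minority sample and one of its nearest minority neighbours. This is not the authors' implementation. The function name `smote`, the parameter `k`, and the two-dimensional feature vectors are all assumptions made for illustration.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Sort the other minority points by squared distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Hypothetical 2-D feature vectors (not data from the paper).
majority = [(0.1 * i, 0.5) for i in range(10)]
minority = [(0.2, 0.9), (0.3, 0.8), (0.25, 0.95), (0.35, 0.85)]

# Add enough synthetic minority samples to match the majority class size.
new_points = smote(minority, len(majority) - len(minority))
balanced_minority = minority + new_points
```

After this step both classes have the same number of samples, and every synthetic point lies inside the region already spanned by the minority class, which is the property that distinguishes SMOTE from simple duplication.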


Imbalanced data classification via support vector machines and genetic algorithms

Connection Science ◽

10.1080/09540091.2014.924902 ◽

2014 ◽

Vol 26 (4) ◽

pp. 335-348 ◽

Cited By ~ 7

Author(s):

Jair Cervantes ◽

Xiaoou Li ◽

Wen Yu

Keyword(s):

Genetic Algorithms ◽

Support Vector Machines ◽

Imbalanced Data ◽

Data Classification ◽

Support Vector ◽

Imbalanced Data Classification ◽

Vector Machines


A new adaptive weighted imbalanced data classifier via improved support vector machines with high-dimension nature

Knowledge-Based Systems ◽

10.1016/j.knosys.2019.104933 ◽

2019 ◽

Vol 185 ◽

pp. 104933 ◽

Cited By ~ 2

Author(s):

Kai Qi ◽

Hu Yang ◽

Qingyu Hu ◽

Dongjun Yang

Keyword(s):

Support Vector Machines ◽

High Dimension ◽

Imbalanced Data ◽

Support Vector ◽

Vector Machines


The Application of Problems Concerning the Imbalanced Data Classification by Means of Support Vector Machines

2011 Fourth International Symposium on Knowledge Acquisition and Modeling ◽

10.1109/kam.2011.84 ◽

2011 ◽

Author(s):

Chen Qing

Keyword(s):

Support Vector Machines ◽

Imbalanced Data ◽

Data Classification ◽

Support Vector ◽

Imbalanced Data Classification ◽

Vector Machines


A new classification strategy for human activity recognition using cost sensitive support vector machines for imbalanced data

Kybernetes ◽

10.1108/k-07-2014-0138 ◽

2014 ◽

Vol 43 (8) ◽

pp. 1150-1164 ◽

Cited By ~ 9

Author(s):

Bilal M’hamed Abidine ◽

Belkacem Fergani ◽

Mourad Oussalah ◽

Lamya Fergani

Keyword(s):

Support Vector Machines ◽

Probabilistic Models ◽

Conditional Random Fields ◽

Performance Metrics ◽

Imbalanced Data ◽

Sampling Technique ◽

Support Vector ◽

Data Set ◽

Content Type ◽

Vector Machines

Purpose – Identifying activity classes from sensor information in a smart home is very challenging because of the imbalanced nature of such data sets, in which some activities occur more frequently than others. Probabilistic models such as the Hidden Markov Model (HMM) and Conditional Random Fields (CRF) are commonly employed for this purpose. The paper aims to discuss these issues. Design/methodology/approach – In this work, the authors propose a robust strategy combining the Synthetic Minority Over-sampling Technique (SMOTE) with Cost Sensitive Support Vector Machines (CS-SVM), with adaptive tuning of the cost parameter, to handle the imbalanced data problem. Findings – The results demonstrate the usefulness of the approach through comparison with state-of-the-art approaches including HMM, CRF, the traditional C-Support Vector Machines (C-SVM) and the Cost-Sensitive SVM (CS-SVM) for classifying activities using binary and ubiquitous sensors. Originality/value – Performance metrics in the experiment/simulation include Accuracy, Precision/Recall and F measure.
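The cost-sensitive idea can be sketched as follows: errors on the rare class are penalized more heavily than errors on the frequent class. The abstract does not specify how the cost parameter is tuned, so the inverse-frequency heuristic below is an assumption, as are the names `class_costs` and `weighted_hinge_loss` and the binary +1/-1 labels.

```python
from collections import Counter

def class_costs(labels):
    """Assumed heuristic: cost inversely proportional to class frequency."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}

def weighted_hinge_loss(scores, labels, costs):
    """Mean hinge loss with each sample's error scaled by its class cost."""
    total = sum(costs[y] * max(0.0, 1.0 - y * s) for s, y in zip(scores, labels))
    return total / len(labels)

# Hypothetical imbalanced labels: 90 frequent-class and 10 rare-class samples.
labels = [-1] * 90 + [1] * 10
costs = class_costs(labels)

# An uninformative classifier that outputs a score of 0 for every sample.
loss = weighted_hinge_loss([0.0] * len(labels), labels, costs)
```

With these weights each class contributes equally to the total loss regardless of its size, so a classifier cannot reduce the loss simply by ignoring the rare class.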


Expert guided natural language processing using one-class classification

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocv010 ◽

2015 ◽

Vol 22 (5) ◽

pp. 962-966 ◽

Cited By ~ 5

Author(s):

Erel Joffe ◽

Emily J Pettigrew ◽

Jorge R Herskovic ◽

Charles F Bearden ◽

Elmer V Bernstam

Keyword(s):

Breast Cancer ◽

Language Processing ◽

Text Processing ◽

Binary Classification ◽

Model Performance ◽

Imbalanced Data ◽

Superior Performance ◽

Support Vector ◽

Free Text ◽

One Class Classification

Abstract. Introduction: Automatically identifying specific phenotypes in free-text clinical notes is critically important for the reuse of clinical data. In this study, the authors combine expert-guided feature (text) selection with one-class classification for text processing. Objectives: To compare the performance of one-class classification to traditional binary classification; to evaluate the utility of feature selection based on expert-selected salient text (snippets); and to determine the robustness of these models with respect to irrelevant surrounding text. Methods: The authors trained one-class support vector machines (1C-SVMs) and two-class SVMs (2C-SVMs) to identify notes discussing breast cancer. Manually annotated visit summary notes (88 positive and 88 negative for breast cancer) were used to compare the performance of models trained on whole notes labeled as positive or negative to models trained on expert-selected text sections (snippets) relevant to breast cancer status. Model performance was evaluated using a 70:30 split for 20 iterations and on a realistic dataset of 10 000 records with a breast cancer prevalence of 1.4%. Results: When tested on a balanced experimental dataset, 1C-SVMs trained on snippets had comparable results to 2C-SVMs trained on whole notes (F = 0.92 for both approaches). When evaluated on a realistic imbalanced dataset, 1C-SVMs had considerably superior performance (F = 0.61 vs. F = 0.17 for the best performing model), attributable mainly to improved precision (p = .88 vs. p = .09 for the best performing model). Conclusions: 1C-SVMs trained on expert-selected relevant text sections perform better than 2C-SVM classifiers trained on either snippets or whole notes when applied to realistically imbalanced data with low prevalence of the positive class.
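The drop in precision at low prevalence follows from simple arithmetic, which the sketch below works through. The sensitivity and specificity values are hypothetical and are not taken from the study; only the 1.4% prevalence comes from the abstract. The function names are likewise illustrative.

```python
def precision_at_prevalence(sensitivity, specificity, prevalence):
    """Expected precision of a classifier applied at a given class prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    return 2.0 * precision * recall / (precision + recall)

# Hypothetical classifier: 95% sensitivity and 90% specificity.
sens, spec = 0.95, 0.90

balanced = precision_at_prevalence(sens, spec, 0.5)   # balanced test set
rare = precision_at_prevalence(sens, spec, 0.014)     # 1.4% prevalence

f_balanced = f_measure(balanced, sens)
f_rare = f_measure(rare, sens)
```

The same classifier that looks precise on a balanced test set produces mostly false positives when the positive class is rare, because even a small false-positive rate applies to a much larger pool of negatives.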


Prediction of human breast and colon cancers from imbalanced data using nearest neighbor and support vector machines

Computer Methods and Programs in Biomedicine ◽

10.1016/j.cmpb.2014.01.001 ◽

2014 ◽

Vol 113 (3) ◽

pp. 792-808 ◽

Cited By ~ 48

Author(s):

Abdul Majid ◽

Safdar Ali ◽

Mubashar Iqbal ◽

Nabeela Kausar

Keyword(s):

Support Vector Machines ◽

Nearest Neighbor ◽

Imbalanced Data ◽

Support Vector ◽

Human Breast ◽

Colon Cancers ◽

Vector Machines


Recursive Cluster Elimination based Rank Function (SVM-RCE-R) implemented in KNIME

F1000Research ◽

10.12688/f1000research.26880.1 ◽

2020 ◽

Vol 9 ◽

pp. 1255 ◽

Cited By ~ 1

Author(s):

Malik Yousef ◽

Burcu Bakir-Gungor ◽

Amhar Jabeer ◽

Gokhan Goy ◽

Rehman Qureshi ◽

...

Keyword(s):

Feature Selection ◽

Simple Structure ◽

Selection Process ◽

Ranking Function ◽

Support Vector ◽

Scientific Publications ◽

Vector Machines ◽

Feature Selection Approach ◽

Sensitivity Specificity ◽

Excel File

In our earlier study, we proposed a novel feature selection approach, Recursive Cluster Elimination with Support Vector Machines (SVM-RCE), and implemented this approach in Matlab. Interest in this approach has grown over time and several researchers have incorporated SVM-RCE into their studies, resulting in a substantial number of scientific publications. This increased interest encouraged us to reconsider how feature selection, particularly in biological datasets, can benefit from considering the relationships of those genes in the selection process. This led to our development of SVM-RCE-R. The usefulness of SVM-RCE-R is further supported by the development of the maTE tool, which uses a similar approach to identify microRNA (miRNA) targets. We have now implemented the SVM-RCE-R algorithm in Knime in order to make it easier to apply and more accessible to the biomedical community. The use of SVM-RCE-R in Knime is simple and intuitive, allowing researchers to immediately begin their data analysis without having to consult an information technology specialist. The input for the Knime tool is an EXCEL file (or text or CSV) with a simple structure and the output is also an EXCEL file. The Knime version also incorporates new features not available in the previous version. One of these features is a user-specific ranking function that enables the user to provide the weights of the accuracy, sensitivity, specificity, f-measure, area under curve and precision in the ranking function, allowing the user to select for greater sensitivity or greater specificity as needed. The results show that the ranking function has an impact on the performance of SVM-RCE-R. Some of the clusters that achieve high scores for a specified ranking can also have high scores in other metrics. This finding motivates future studies to suggest the optimal ranking function.
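A user-weighted ranking function of this kind can be sketched as a weighted average over the six metrics named above. The abstract does not give the exact formula, so the normalized weighted sum below is an assumption, and the cluster names and metric values are invented for illustration.

```python
METRICS = ("accuracy", "sensitivity", "specificity", "f_measure", "auc", "precision")

def rank_score(metrics, weights):
    """Weighted average of the metrics; metrics missing from weights count as 0."""
    total = sum(weights.get(m, 0.0) for m in METRICS)
    if total == 0:
        raise ValueError("at least one weight must be non-zero")
    return sum(weights.get(m, 0.0) * metrics[m] for m in METRICS) / total

def best_cluster(clusters, weights):
    """Name of the cluster with the highest score under the given weights."""
    return max(clusters, key=lambda name: rank_score(clusters[name], weights))

# Hypothetical per-cluster evaluation results.
clusters = {
    "cluster_a": {"accuracy": 0.80, "sensitivity": 0.95, "specificity": 0.65,
                  "f_measure": 0.82, "auc": 0.85, "precision": 0.73},
    "cluster_b": {"accuracy": 0.80, "sensitivity": 0.65, "specificity": 0.95,
                  "f_measure": 0.76, "auc": 0.85, "precision": 0.93},
}

prefer_sensitivity = best_cluster(clusters, {"sensitivity": 1.0})
prefer_specificity = best_cluster(clusters, {"specificity": 1.0})
```

Changing only the weights changes which cluster ranks first, which is the flexibility the abstract describes: the two hypothetical clusters have equal accuracy and AUC but trade sensitivity against specificity.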


Imbalanced Data Classification with Deep Support Vector Machines

Lecture Notes in Electrical Engineering - Artificial Intelligence in China ◽

10.1007/978-981-15-0187-6_10 ◽

2020 ◽

pp. 87-95

Author(s):

Li Zhang ◽

Wei Wang ◽

Mengjun Zhang ◽

Zhixiong Wang

Keyword(s):

Support Vector Machines ◽

Imbalanced Data ◽

Data Classification ◽

Support Vector ◽

Imbalanced Data Classification ◽

Vector Machines ◽

Deep Support


Modeling using support vector machines on imbalanced data: A case study on the prediction of the sightings of Irrawaddy dolphins

10.1063/1.4915644 ◽

2015 ◽

Author(s):

Liew Chin Ying ◽

Jane Labadin ◽

Wang Yin Chai ◽

Andrew Alek Tuen ◽

Cindy Peter

Keyword(s):

Support Vector Machines ◽

Imbalanced Data ◽

Support Vector ◽

Vector Machines


Recursive Cluster Elimination based Rank Function (SVM-RCE-R) implemented in KNIME

F1000Research ◽

10.12688/f1000research.26880.2 ◽

2021 ◽

Vol 9 ◽

pp. 1255

Author(s):

Malik Yousef ◽

Burcu Bakir-Gungor ◽

Amhar Jabeer ◽

Gokhan Goy ◽

Rehman Qureshi ◽

...

Keyword(s):

Feature Selection ◽

Selection Process ◽

Area Under The Curve ◽

Ranking Function ◽

Support Vector ◽

Scientific Publications ◽

Vector Machines ◽

Feature Selection Approach ◽

Sensitivity Specificity ◽

Excel File

In our earlier study, we proposed a novel feature selection approach, Recursive Cluster Elimination with Support Vector Machines (SVM-RCE), and implemented this approach in Matlab. Interest in this approach has grown over time and several researchers have incorporated SVM-RCE into their studies, resulting in a substantial number of scientific publications. This increased interest encouraged us to reconsider how feature selection, particularly in biological datasets, can benefit from considering the relationships of those genes in the selection process. This led to our development of SVM-RCE-R. SVM-RCE-R further enhances the capabilities of SVM-RCE by the addition of a novel user-specified ranking function. This ranking function enables the user to stipulate the weights of the accuracy, sensitivity, specificity, f-measure, area under the curve and the precision in the ranking function. This flexibility allows the user to select for greater sensitivity or greater specificity as needed for a specific project. The usefulness of SVM-RCE-R is further supported by the development of the maTE tool, which uses a similar approach to identify microRNA (miRNA) targets. We have also now implemented the SVM-RCE-R algorithm in Knime in order to make it easier to apply. The use of SVM-RCE-R in Knime is simple and intuitive and allows researchers to immediately begin their analysis without having to consult an information technology specialist. The input for the Knime-implemented tool is an EXCEL file (or text or CSV) with a simple structure and the output is also an EXCEL file. The Knime version also incorporates new features not available in SVM-RCE. The results show that the inclusion of the ranking function has a significant impact on the performance of SVM-RCE-R. Some of the clusters that achieve high scores for a specified ranking can also have high scores in other metrics.
