Protein Subnuclear Localization Based on Radius-SMOTE and Kernel Linear Discriminant Analysis Combined with Random Forest

Liwen Wu; Shanshan Huang; Feng Wu; Qian Jiang; Shaowen Yao; Xin Jin

doi:10.3390/electronics9101566

Electronics ◽

10.3390/electronics9101566 ◽

2020 ◽

Vol 9 (10) ◽

pp. 1566

Author(s):

Liwen Wu ◽

Shanshan Huang ◽

Feng Wu ◽

Qian Jiang ◽

Shaowen Yao ◽

...

Keyword(s):

Discriminant Analysis ◽

Random Forest ◽

Linear Discriminant Analysis ◽

Linear Discriminant ◽

Jackknife Test ◽

Subnuclear Localization ◽

Feature Vectors ◽

Benchmark Datasets ◽

Novel Method ◽

Protein Datasets

Protein subnuclear localization plays an important role in proteomics and helps researchers understand the biological functions of the nucleus. To date, most protein datasets used in such studies are unbalanced, which reduces the prediction accuracy of protein subnuclear localization, especially for the minority classes. This work therefore proposes a novel method for predicting protein subnuclear localization on unbalanced datasets. First, the position-specific scoring matrix is used to extract feature vectors from two benchmark datasets, and the useful features are then selected by kernel linear discriminant analysis. Second, Radius-SMOTE is used to expand the samples of the minority classes to address the class imbalance. Finally, the optimal feature vectors of the expanded datasets are classified by random forest. To evaluate the performance of the proposed method, four evaluation indices are calculated using the jackknife test. The results indicate that the proposed method performs better than other conventional methods and effectively improves accuracy for both majority and minority classes.
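
The oversampling step in the abstract above can be illustrated with a minimal sketch. Note that this is classic SMOTE-style interpolation, not the Radius-SMOTE variant used in the paper, whose details the abstract does not give. The function name `smote_like_oversample`, its parameters, and the toy data are all hypothetical and chosen only for illustration.

```python
import random
from typing import List, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_like_oversample(
    minority: List[Vector], n_new: int, k: int = 3, seed: int = 0
) -> List[Vector]:
    """Create n_new synthetic minority samples (hypothetical helper).

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    This is the basic SMOTE idea, not Radius-SMOTE.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    synthetic: List[Vector] = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by distance to the base sample.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: squared_distance(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Toy data: 4 minority samples to be expanded to match 10 majority samples.
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    majority_size = 10
    new_points = smote_like_oversample(minority, majority_size - len(minority))
    print(len(minority) + len(new_points))
```

Because every synthetic point is an interpolation between two existing minority samples, the expanded class stays inside the region already covered by that class. In a leave-one-out (jackknife) evaluation, oversampling would normally be applied to the training portion only, so that the held-out sample does not influence the synthetic points.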

Classification of Breast Cancer versus Normal Samples from Mass Spectrometry Profiles Using Linear Discriminant Analysis of Important Features Selected by Random Forest

Statistical Applications in Genetics and Molecular Biology ◽

10.2202/1544-6115.1345 ◽

2008 ◽

Vol 7 (2) ◽

Cited By ~ 15

Author(s):

Somnath Datta

Keyword(s):

Breast Cancer ◽

Mass Spectrometry ◽

Discriminant Analysis ◽

Random Forest ◽

Linear Discriminant Analysis ◽

Linear Discriminant

Protein subnuclear localization based on a new effective representation and intelligent kernel linear discriminant analysis by dichotomous greedy genetic algorithm

PLoS ONE ◽

10.1371/journal.pone.0195636 ◽

2018 ◽

Vol 13 (4) ◽

pp. e0195636 ◽

Cited By ~ 7

Author(s):

Shunfang Wang ◽

Yaoting Yue

Keyword(s):

Genetic Algorithm ◽

Discriminant Analysis ◽

Linear Discriminant Analysis ◽

Linear Discriminant ◽

Subnuclear Localization ◽

Effective Representation

Equivalence between LDA/QR and Direct LDA

Cognitive Informatics for Revealing Human Cognition ◽

10.4018/978-1-4666-2476-4.ch021 ◽

2012 ◽

pp. 338-353

Author(s):

Rong-Hua Li ◽

Shuang Liang ◽

George Baciu ◽

Eddie Chan

Keyword(s):

Discriminant Analysis ◽

Dimension Reduction ◽

Linear Discriminant Analysis ◽

Classification Accuracy ◽

QR Decomposition ◽

Linear Discriminant ◽

QR Algorithm ◽

Special Cases ◽

Benchmark Datasets ◽

Pseudo Inverse

Singularity problems of the scatter matrices in Linear Discriminant Analysis (LDA) are challenging and have received attention during the last decade. LDA via QR decomposition (LDA/QR) and Direct LDA (DLDA) are two popular algorithms for solving the singularity problem. This paper establishes the equivalence between LDA/QR and DLDA: both can be regarded as special cases of pseudo-inverse LDA. Like LDA/QR, DLDA can also be considered a two-stage LDA method, and its first stage can act as a dimension reduction algorithm. The experiments compare the LDA/QR and DLDA algorithms in terms of classification accuracy and computational complexity on several benchmark datasets, and also compare their first stages. The results confirm the established equivalence and verify the dimension reduction capabilities of both algorithms.
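
As background for the singularity problem mentioned above, the standard LDA setup can be written out briefly. The notation here ($S_b$, $S_w$, $W$, $n$, $d$, $c$) is the usual textbook convention and is an assumption, since the abstract does not fix any symbols; the pseudo-inverse form shown is one common formulation, not necessarily the exact one analysed in the paper.

```latex
% Between-class and within-class scatter for c classes C_1, ..., C_c,
% with class means \mu_i, class sizes n_i, and global mean \mu.
S_b = \sum_{i=1}^{c} n_i \, (\mu_i - \mu)(\mu_i - \mu)^{\top},
\qquad
S_w = \sum_{i=1}^{c} \sum_{x \in C_i} (x - \mu_i)(x - \mu_i)^{\top}

% Classical LDA seeks a projection W maximizing
J(W) = \operatorname{tr}\!\left( (W^{\top} S_w W)^{-1} \, W^{\top} S_b W \right),
% solved by the leading eigenvectors of S_w^{-1} S_b.

% With n samples in d dimensions, rank(S_w) <= n - c, so S_w is singular
% whenever d > n - c and S_w^{-1} does not exist.

% Pseudo-inverse LDA replaces the inverse with the Moore-Penrose
% pseudo-inverse and uses the leading eigenvectors of
S_w^{+} S_b .
```

The rank bound explains why the problem is common in practice: high-dimensional feature vectors with few labelled samples, as in many of the applications listed on this page, easily satisfy $d > n - c$.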

Equivalence Between LDA/QR and Direct LDA

International Journal of Cognitive Informatics and Natural Intelligence ◽

10.4018/jcini.2011010106 ◽

2011 ◽

Vol 5 (1) ◽

pp. 94-112 ◽

Cited By ~ 1

Author(s):

Rong-Hua Li ◽

Shuang Liang ◽

George Baciu ◽

Eddie Chan

Keyword(s):

Discriminant Analysis ◽

Dimension Reduction ◽

Linear Discriminant Analysis ◽

QR Decomposition ◽

Singularity Problem ◽

Linear Discriminant ◽

QR Algorithm ◽

Special Cases ◽

Benchmark Datasets ◽

Pseudo Inverse

Classification of HIV-1 protease crystal structures using Random Forest, linear discriminant analysis and logistic regression

2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology ◽

10.1109/cibcb.2010.5510465 ◽

2010 ◽

Author(s):

Gene M. Ko ◽

A. Srinivas Reddy ◽

Sunil Kumar ◽

Barbara A. Bailey ◽

Rajni Garg

Keyword(s):

Logistic Regression ◽

Discriminant Analysis ◽

Random Forest ◽

Linear Discriminant Analysis ◽

Crystal Structures ◽

Linear Discriminant ◽

HIV-1

Acoustic Models for the Automatic Identification of Prosodic Boundaries in Spontaneous Speech / Modelos acústicos para a identificação automática de fronteiras prosódicas na fala espontânea

Revista de Estudos da Linguagem ◽

10.17851/2237-2083.26.4.1455-1488 ◽

2018 ◽

Vol 26 (4) ◽

pp. 1455 ◽

Cited By ~ 1

Author(s):

Bárbara Helohá Falcão Teixeira ◽

Maryualê Malvessi Mittmann

Keyword(s):

Discriminant Analysis ◽

Random Forest ◽

Linear Discriminant Analysis ◽

Speech Rate ◽

Automatic Segmentation ◽

Spontaneous Speech ◽

Automatic Identification ◽

Acoustic Parameters ◽

Linear Discriminant ◽

Prosodic Boundaries

This work presents the results of an analysis of multiple acoustic parameters for building a model for the automatic segmentation of speech into tone units. Based on a literature review, we defined sets of acoustic parameters related to the signaling of terminal and non-terminal boundaries. For each parameter, we extracted a series of measurements: 6 for speech rate and rhythm; 34 for duration; 65 for fundamental frequency; 4 for intensity; and 2 related to pauses. These parameters were extracted from spontaneous speech fragments that had previously been segmented into tone units manually by 14 human annotators. We used two statistical classification methods, Random Forest (RF) and Linear Discriminant Analysis (LDA), to generate models for the identification of prosodic boundaries. After several phases of training and testing, both methods were relatively successful in identifying terminal and non-terminal boundaries. The LDA method achieved higher accuracy than the RF method in predicting terminal and non-terminal boundaries, so the model obtained with LDA was further refined. As a result, the terminal boundary model is based on 20 acoustic measurements and shows a convergence of 80% with the boundaries identified by the annotators in the speech sample. For non-terminal boundaries, we arrived at three models that, combined, showed a convergence of 98% with the boundaries identified by the annotators in the sample.

Keywords: speech segmentation; prosodic boundaries; spontaneous speech.

Use of Linear Discriminant Analysis (LDA), K Nearest Neighbours (KNN), Decision Tree (CART), Random Forest (RF), Gaussian Naive Bayes (NB), Support Vector Machines (SVM) to Predict Admission for Post Graduation Courses

SSRN Electronic Journal ◽

10.2139/ssrn.3683065 ◽

2020 ◽

Author(s):

Ivan Rodrigues ◽

Alitta Parayil ◽

Tarun Shetty ◽

Imran Mirza

Keyword(s):

Support Vector Machines ◽

Discriminant Analysis ◽

Random Forest ◽

Decision Tree ◽

Linear Discriminant Analysis ◽

Naive Bayes ◽

Support Vector ◽

Linear Discriminant ◽

Nearest Neighbours ◽

Vector Machines

Real Time Hand Gesture Recognition Using Random Forest and Linear Discriminant Analysis

Applicative 2015 on - Applicative 2015 ◽

10.1145/2814940.2814997 ◽

2015 ◽

Author(s):

Sangjun O. ◽

Rammohan Mallipeddi ◽

Minho Lee

Keyword(s):

Discriminant Analysis ◽

Random Forest ◽

Linear Discriminant Analysis ◽

Real Time ◽

Gesture Recognition ◽

Hand Gesture Recognition ◽

Hand Gesture ◽

Linear Discriminant

ECG Signal Classification using Support Vector Machine and Linear Discriminant Analysis

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v7i5.17201725 ◽

2019 ◽

Vol 7 (5) ◽

pp. 1720-1725

Author(s):

S. Grover ◽

Shailja .

Keyword(s):

Support Vector Machine ◽

Discriminant Analysis ◽

Linear Discriminant Analysis ◽

Signal Classification ◽

Support Vector ◽

ECG Signal ◽

Linear Discriminant

Feature selection based on linear discriminant analysis

Journal of Computer Applications ◽

10.3724/sp.j.1087.2009.02781 ◽

2009 ◽

Vol 29 (10) ◽

pp. 2781-2785

Author(s):

Zi-feng CUI ◽

Xiao-hua JI

Keyword(s):

Feature Selection ◽

Discriminant Analysis ◽

Linear Discriminant Analysis ◽

Linear Discriminant
