Protein Subnuclear Localization Based on Radius-SMOTE and Kernel Linear Discriminant Analysis Combined with Random Forest

Electronics ◽  
2020 ◽  
Vol 9 (10) ◽  
pp. 1566
Author(s):  
Liwen Wu ◽  
Shanshan Huang ◽  
Feng Wu ◽  
Qian Jiang ◽  
Shaowen Yao ◽  
...  

Protein subnuclear localization plays an important role in proteomics and can help researchers understand the biological functions of the nucleus. To date, most protein datasets used in studies are unbalanced, which reduces the prediction accuracy of protein subnuclear localization, especially for the minority classes. In this work, a novel method is therefore proposed to predict protein subnuclear localization on unbalanced datasets. First, the position-specific score matrix is used to extract the feature vectors of two benchmark datasets, and the useful features are then selected by kernel linear discriminant analysis. Second, Radius-SMOTE is used to expand the samples of the minority classes to address the imbalance in the datasets. Finally, the optimal feature vectors of the expanded datasets are classified by random forest. To evaluate the performance of the proposed method, four evaluation indices are calculated using the jackknife test. The results indicate that the proposed method performs better than other conventional methods and effectively improves the accuracy for both majority and minority classes.
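
The oversampling step described above can be illustrated with a minimal sketch. It implements plain SMOTE-style interpolation between minority-class neighbours, not the Radius-SMOTE variant used in the paper, whose details the abstract does not give. The function name `smote_like_oversample` and its parameters are hypothetical, and the sketch assumes at least two minority samples.

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples (hypothetical helper).

    Each synthetic sample is a random point on the segment between a
    minority sample and one of its k nearest minority neighbours.
    This is generic SMOTE-style interpolation, NOT Radius-SMOTE.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # cannot have more neighbours than other samples

    # Pairwise Euclidean distances within the minority class.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a base sample and one of its neighbours for every new point.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]

    # Interpolate with a random weight in [0, 1).
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])
```

In a pipeline like the one in the abstract, the synthetic rows would be appended to the training set before fitting the classifier, and only within each training fold so that the held-out sample is never used to generate them.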

Author(s):  
Rong-Hua Li ◽  
Shuang Liang ◽  
George Baciu ◽  
Eddie Chan

Singularity problems of scatter matrices in Linear Discriminant Analysis (LDA) are challenging and have received considerable attention during the last decade. Linear Discriminant Analysis via QR decomposition (LDA/QR) and Direct Linear Discriminant Analysis (DLDA) are two popular algorithms for solving the singularity problem. This paper establishes an equivalence relationship between LDA/QR and DLDA: both can be regarded as special cases of pseudo-inverse LDA. Like the LDA/QR algorithm, DLDA can also be considered a two-stage LDA method. Interestingly, the first stage of DLDA can act as a dimension reduction algorithm. The experiments compare the LDA/QR and DLDA algorithms in terms of classification accuracy and computational complexity on several benchmark datasets, and also compare their first stages. The results confirm the established equivalence relationship and verify the capabilities of both algorithms in dimension reduction.
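
To make the singularity problem concrete, the following is a minimal sketch of pseudo-inverse LDA, the general scheme the abstract names. It is a textbook formulation, not the paper's LDA/QR or DLDA algorithm, and the function name `pinv_lda` is hypothetical. The within-class scatter matrix is singular whenever a feature has no within-class variation or there are fewer samples than features, so the sketch replaces the inverse with the Moore-Penrose pseudo-inverse.

```python
import numpy as np

def pinv_lda(X, y, n_components):
    """Return LDA projection directions using a pseudo-inverse.

    Hypothetical helper: solves the eigenproblem of pinv(S_w) @ S_b,
    where S_w and S_b are the within- and between-class scatter matrices.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for label in np.unique(y):
        X_c = X[y == label]
        mean_c = X_c.mean(axis=0)
        centered = X_c - mean_c
        S_w += centered.T @ centered              # within-class scatter
        diff = (mean_c - mean_all)[:, None]
        S_b += len(X_c) * (diff @ diff.T)         # between-class scatter

    # pinv stands in for the inverse, which does not exist if S_w is singular.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    order = np.argsort(-eigvals.real)[:n_components]
    return eigvecs[:, order].real

# Third feature is constant, so S_w is singular and a plain inverse would fail.
X = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0],
              [3, 3, 0], [4, 3, 0], [3, 4, 0], [4, 4, 0]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
W = pinv_lda(X, y, n_components=1)
```

For this toy data the leading direction is proportional to (1, 1, 0), up to sign, which separates the two classes along the diagonal.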



2018 ◽  
Vol 26 (4) ◽  
pp. 1455
Author(s):  
Bárbara Helohá Falcão Teixeira ◽  
Maryualê Malvessi Mittmann

Abstract: This work presents the results of an analysis of multiple acoustic parameters for the construction of a model for the automatic segmentation of speech into tone units. Based on a literature review, we defined sets of acoustic parameters related to the signaling of terminal and non-terminal boundaries. For each parameter, we extracted a series of measurements: 6 for speech rate and rhythm; 34 for duration; 65 for fundamental frequency; 4 for intensity; and 2 related to pauses. These parameters were extracted from spontaneous speech fragments that had previously been segmented into tone units manually by 14 human annotators. We used two statistical classification methods, Random Forest (RF) and Linear Discriminant Analysis (LDA), to generate models for the identification of prosodic boundaries. After several phases of training and testing, both methods were relatively successful in identifying terminal and non-terminal boundaries. The LDA method achieved higher accuracy than the RF method in predicting terminal and non-terminal boundaries; therefore, the model obtained with LDA was further refined. As a result, the terminal boundary model is based on 20 acoustic measurements and shows a convergence of 80% with the boundaries identified by the annotators in the speech sample. For non-terminal boundaries, we arrived at three models that, combined, showed a convergence of 98% with the boundaries identified by the annotators in the sample.

Keywords: speech segmentation; prosodic boundaries; spontaneous speech.
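
A comparison of the two classifiers of the kind described above can be sketched as follows, assuming scikit-learn is available. The data are synthetic stand-ins generated with `make_classification`, not the study's acoustic measurements. The 111 columns only mirror the count of measurements listed in the abstract (6 + 34 + 65 + 4 + 2), and the sample size, number of informative features, and 5-fold cross-validation are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (assumption): 300 candidate boundary positions,
# 111 features, binary label "boundary" vs "no boundary".
X, y = make_classification(
    n_samples=300, n_features=111, n_informative=20, random_state=0
)

models = {
    "LDA": LinearDiscriminantAnalysis(),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean 5-fold cross-validated accuracy for each classifier.
mean_accuracy = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}
```

Which method scores higher depends on the data, so the ranking on this synthetic set says nothing about the result reported in the abstract.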

