Classification of Full Text Biomedical Documents: Sections Importance Assessment

2021 ◽  
Vol 11 (6) ◽  
pp. 2674
Author(s):  
Carlos Adriano Oliveira Gonçalves ◽  
Rui Camacho ◽  
Célia Talma Gonçalves ◽  
Adrián Seara Vieira ◽  
Lourdes Borrajo Diz ◽  
...  

The exponential growth of documents on the web makes it very hard for researchers to be aware of the relevant work being done within the scientific community. The task of efficiently retrieving information has therefore become an important research topic. The objective of this study is to test how the efficiency of text classification changes if different weights are assigned beforehand to the sections that compose the documents. The proposal takes into account the place (section) where terms are located in the document, and each section has a weight that can be modified depending on the corpus. To carry out the study, an extended version of the OHSUMED corpus with full documents has been created. Using WEKA, we compared the use of abstracts only with that of full texts, as well as combinations of section weights, to assess their significance in the scientific article classification process using SMO (Sequential Minimal Optimization), the WEKA implementation of the Support Vector Machine (SVM) algorithm. The experimental results show that the proposed combinations of preprocessing techniques and feature selection achieve promising results for the task of full text scientific document classification. We also have evidence to conclude that datasets enriched with text from certain sections achieve better results than using only titles and abstracts.
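The idea of weighting terms by the section they appear in can be illustrated with a minimal Python sketch. It is not the authors' WEKA pipeline: the section names, the weight values, and the helper `weighted_term_frequencies` are all hypothetical, and tokenization is reduced to whitespace splitting.

```python
from collections import Counter

# Hypothetical section weights; the abstract does not report the actual values.
SECTION_WEIGHTS = {"title": 3.0, "abstract": 2.0, "methods": 1.0}


def weighted_term_frequencies(sections, weights, default_weight=1.0):
    """Sum per-section term counts, scaling each count by its section weight."""
    totals = Counter()
    for name, text in sections.items():
        weight = weights.get(name, default_weight)
        # Naive whitespace tokenization, for illustration only.
        for term, count in Counter(text.lower().split()).items():
            totals[term] += weight * count
    return dict(totals)


# Toy document split into sections (made-up text).
doc = {
    "title": "protein kinase inhibitors",
    "abstract": "kinase inhibitors reduce tumor growth",
    "methods": "cells were treated with inhibitors",
}
features = weighted_term_frequencies(doc, SECTION_WEIGHTS)
print(features["kinase"], features["inhibitors"], features["cells"])
```

A term that occurs in a heavily weighted section contributes more to the feature vector than the same term in a lightly weighted one, which is the effect the study varies per corpus.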

2016 ◽  
Vol 14 (06) ◽  
pp. 1650033 ◽  
Author(s):  
Li Gu ◽  
Lichun Xue ◽  
Qi Song ◽  
Fengji Wang ◽  
Huaqin He ◽  
...  

During commercial transactions, the quality of flue-cured tobacco leaves must be characterized efficiently, and the evaluation system should be easily transferable across different traders. However, there are over 3000 chemical compounds in flue-cured tobacco leaves; thus, it is impossible to evaluate the quality of flue-cured tobacco leaves using all the chemical compounds. In this paper, we used the Support Vector Machine (SVM) algorithm together with 22 chemical compounds selected by ReliefF-Particle Swarm Optimization (R-PSO) to classify the fragrant style of flue-cured tobacco leaves, where the Accuracy (ACC) and Matthews Correlation Coefficient (MCC) were 90.95% and 0.80, respectively. The SVM algorithm combined with 19 chemical compounds selected by R-PSO achieved the best assessment performance for the aromatic quality of tobacco leaves, where the PCC and MSE were 0.594 and 0.263, respectively. Finally, we constructed two online tools to classify the fragrant style and evaluate the aromatic quality of flue-cured tobacco leaf samples. These tools can be accessed at http://bioinformatics.fafu.edu.cn/tobacco .
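For readers unfamiliar with the two classification metrics quoted above, the following Python sketch computes ACC and MCC from a binary confusion matrix. The binary setting and the counts are assumptions made for illustration; they are not taken from the paper, whose fragrant-style task may involve more than two classes.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews Correlation Coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when any row or column sum is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Made-up counts, for illustration only.
acc = accuracy(tp=45, tn=40, fp=5, fn=10)
mcc = matthews_corrcoef(tp=45, tn=40, fp=5, fn=10)
print(f"ACC = {acc:.2f}, MCC = {mcc:.2f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the classes are imbalanced.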


2014 ◽  
Vol 548-549 ◽  
pp. 1265-1269
Author(s):  
Yun Sik Hwang ◽  
Byeong Joo Jun ◽  
Tae Seon Yoon

As bioinformatics has advanced, the classification of certain pathogens has improved. The main topic of this research is the genetic singularity of HCV (Hepatitis C Virus), and our objective is to analyze features of HCV's amino acids using the Support Vector Machine (SVM) algorithm. The HCV data used in our experiment comprise 10 kinds of sequences and 257 kinds of data. The data analysis revealed some peculiar genetic patterns in HCV's linearity that disagree with earlier neural network and C5.0 results.


2020 ◽  
pp. 466-470
Author(s):  
Pandiyan P ◽  
Rajasekaran T ◽  
Vishnu Kumar K ◽  
Sivaramakrishnan R ◽  
Thigarajan T

This paper presents the classification of fish species using the support vector machine (SVM) algorithm with four kernel functions: linear, polynomial, sigmoid, and radial basis function. The dataset for this research was obtained from the Fish-Pak website, which provides the required number of images for classifying two fish species, Catla and Rohu, using three fish features: head, body, and scale data. Because the number of images for the Rohu species is not equal to that for the Catla species, an image augmentation technique is used to balance the number of images. The simulation results reveal that the SVM with the radial basis function kernel provides an accuracy of 78%.
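The four kernel functions named above can be written out in a few lines of Python. This is a sketch of the standard textbook definitions, not the authors' code; the parameter names `gamma`, `coef0`, and `degree` follow a common convention, and their default values here are arbitrary.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def polynomial_kernel(x, y, gamma=1.0, coef0=1.0, degree=3):
    return (gamma * dot(x, y) + coef0) ** degree


def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(x, y) + coef0)


def rbf_kernel(x, y, gamma=1.0):
    # Squared Euclidean distance between the two vectors.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Two toy feature vectors (made-up values).
u, v = [1.0, 2.0], [2.0, 0.0]
for name, kernel in [
    ("linear", linear_kernel),
    ("polynomial", polynomial_kernel),
    ("sigmoid", sigmoid_kernel),
    ("rbf", rbf_kernel),
]:
    print(name, kernel(u, v))
```

Each kernel replaces the plain inner product inside the SVM decision function, so swapping kernels changes the shape of the decision boundary without changing the training procedure.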


2020 ◽  
Author(s):  
Chao Yin ◽  
Xiaohua Deng ◽  
Zhiqiang Yu ◽  
Ruting Chen ◽  
Hongxiang Zhong ◽  
...  

Abstract Background: In the biomass-to-bio-oil conversion process, many studies focus on the association between the biomass and the bio-products by using near infrared (NIR) spectra and chemical analysis methods. However, the characterization of biomass pyrolysis behaviors by using thermogravimetric analysis (TGA) with the support vector machine (SVM) algorithm has not been reported. In this study, tobacco was chosen as the biomass, because the cigarette smoke (including water, tar and gases) released by tobacco pyrolysis reactions determines the sensory quality, which is similar to the use of biomass as a renewable resource through the pyrolysis process. Results: For the first time, a support vector machine (SVM) has been employed to automatically classify the planting area and growing position of tobacco leaves using thermogravimetric analysis data as the information source. 88 single-grade tobacco samples belonging to 4 grades and 8 categories were split into training, validation and blind testing sets. Our model showed excellent performance in the training and validation sets as well as in the blind test, with accuracy over 91.67%. Across the whole dataset of 88 samples, our model not only provides precise results on the planting area of tobacco leaves, but also accurately distinguishes the major grades among the upper, lower and middle positions. Errors occur only in the classification of subgrades of the middle position. Conclusions: Our results not only validated the feasibility of using thermogravimetric analysis with the SVM algorithm as an objective and rapid method for automatic classification of tobacco planting area and growing position, but also showed that this new analysis method would be a promising way to explore bio-oil quality prior to biomass pyrolysis production.


2020 ◽  
Vol 498 (2) ◽  
pp. 1750-1764
Author(s):  
B Arsioli ◽  
P Dedin

ABSTRACT The study of machine learning (ML) techniques for the autonomous classification of astrophysical sources is of great interest, and we explore its applications in the context of a multifrequency data-frame. We test the use of supervised ML to classify blazars according to their synchrotron peak frequency, either lower or higher than $10^{15}$ Hz. We select a sample of 4178 blazars, labelled as 1279 high synchrotron peak (HSP: $\rm \nu$-peak > $10^{15}$ Hz) and 2899 low synchrotron peak (LSP: $\rm \nu$-peak < $10^{15}$ Hz). A set of multifrequency features was defined to represent each source, including spectral slopes ($\alpha _{\nu _1, \nu _2}$) between the radio, infra-red, optical, and X-ray bands, also considering IR colours. We describe the optimization of five ML classification algorithms that classify blazars into LSP or HSP: Random forests (RFs), support vector machine (SVM), K-nearest neighbours (KNN), Gaussian Naive Bayes (GNB), and the Ludwig auto-ML framework. In our particular case, the SVM algorithm had the best performance, reaching 93 per cent of balanced accuracy. A joint-feature permutation test revealed that the spectral slopes alpha-radio-infrared (IR) and alpha-radio-optical are the most relevant for the ML modelling, followed by the IR colours. This work shows that ML algorithms can distinguish multifrequency spectral characteristics and handle the classification of blazars into LSPs and HSPs. It is a hint for the potential use of ML for the autonomous determination of broadband spectral parameters (such as the synchrotron ν-peak), or even to search for new blazars in all-sky data bases.
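Because the LSP and HSP classes are imbalanced (2899 versus 1279 sources), the abstract reports balanced accuracy rather than plain accuracy. The Python sketch below shows the usual definition, the mean of per-class recall, on made-up labels; it is an illustration and not the authors' evaluation code.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == cls)
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(correct / total)
    return sum(recalls) / len(recalls)


# Toy imbalanced labels (made-up): 7 LSP sources and 3 HSP sources.
y_true = ["LSP"] * 7 + ["HSP"] * 3
# A classifier biased toward the majority class: it misses two HSP sources.
y_pred = ["LSP"] * 7 + ["LSP"] * 2 + ["HSP"]

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal = balanced_accuracy(y_true, y_pred)
print(f"plain accuracy = {plain:.3f}, balanced accuracy = {bal:.3f}")
```

In this toy case plain accuracy looks high because the majority class dominates, while balanced accuracy exposes the poor recall on the minority class.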


2020 ◽  
Vol 8 (6) ◽  
pp. 3363-3367

This paper presents a centralized alumni network for the betterment of institutions and the upcoming student community. The system collects and stores alumni information for future communication. Former students of an institution can communicate with their immediate friends as well as with forthcoming students and other members of the institution's community. Apart from the alumni, the institution or organization also benefits from sustaining this network. This single system can satisfy almost every requirement of the alumni. Alumni associations are usually organized in colleges, but may also be organized in any place where the alumni can meet each other. Although many colleges have existing systems to maintain alumni information, these are manual, make it time consuming for current students to reach their alumni, and make it difficult to maintain the alumni's privacy. To overcome these issues, we propose a web-based application that allows alumni to update their information and allows students to connect with them and view events posted by alumni and the admin, filtered by a Support Vector Machine (SVM) algorithm. In the proposed method, the SVM algorithm is used to classify the alumni members and their posted messages apart from those of others in this community.


Electrocardiogram (ECG) signals record vital information about the condition of an individual's heart. In this paper, we aim to build a model for classifying different types of heart arrhythmia. The public MIT-BIH database for heart arrhythmia has been used in this case study. There are basically thirteen types of heart arrhythmia. The Principal Component Analysis (PCA) algorithm has been used to extract important features of heart beats from an ECG signal. A Support Vector Machine (SVM) algorithm is then trained and tested on these features to classify the thirteen classes of heart arrhythmia. The paper discusses the proposed algorithm and validates its results. The results show that the accuracy of our classifier is more than 91% in most cases.


2014 ◽  
Vol 926-930 ◽  
pp. 2996-2999
Author(s):  
Zhen Zhen Wang ◽  
Xiao Jun Tong ◽  
Shan Zeng

To address a shortcoming of the locally linear embedding (LLE) algorithm, an LLE variant with an improved distance measure is proposed: in standard LLE, the sample components are distributed differently, so the Euclidean distance does not reflect the actual distance between samples. In the experiment, a sample of 231 neurons is obtained, and the morphological parameters of the neurons are calculated first. Second, the improved locally linear embedding algorithm is used to reduce the data dimensionality. Finally, the support vector machine (SVM) algorithm is used to train and test on the samples. Experimental results show that, under certain conditions, the method achieves good classification performance.


Author(s):  
Zhao Tong ◽  
Zheng Xiao ◽  
Hong Liu ◽  
Ming Chen

Emotional analysis can be considered, in essence, a classification of sentiment polarity. Against the background of mass data processing, and in order to judge the emotion conveyed by a text more accurately, this paper proposes a method for classifying the emotional tendency of a text that combines Latent Semantic Analysis (LSA) and Support Vector Machine (SVM). In this method, a “word-document” semantic distance vector space model is built from the semantic aspect following LSA. The emotion is then classified with an SVM, which offers high classification accuracy and good generalization ability. Finally, this paper proposes a parallel implementation of the LSA-SVM algorithm, developed using the Message Passing Interface (MPI) in a parallel environment. Experiments show that the accuracy of this method is higher than that of the conventional SVM method in blog assessment, where sentences are short and the emotional tendency is evident: the classification accuracy on a test set is approximately 92.2%. Compared with the serial implementation, the parallel LSA-SVM algorithm increases efficiency significantly.

