Application of the Polyhedral Conic Functions Method in the Text Classification and Comparative Analysis

2018, Vol. 2018, pp. 1-11
Author(s): Nur Uylaş Satı, Burak Ordin

As the volume of online information has grown, interest in text categorization (classification) has grown with it. In the text categorization problem, that is, text classification, the goal is to assign documents to predefined classes (categories or labels). Many data mining methods have recently been applied to text classification in the literature, but polyhedral conic function (PCF) methods have not. In this paper, PCFs are used to classify documents. We present PCF-based separation algorithms, which involve linear programming subproblems with inequality constraints. Numerical experiments are carried out on real-world text datasets, and the PCF methods are compared with state-of-the-art methods; tenfold cross-validation results, accuracy values, and running times are reported in tables. The results show that, for text classification, PCF methods are as effective as state-of-the-art methods in terms of accuracy.
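A polyhedral conic function, in the form commonly used in the PCF literature, is g(x) = w·(x − c) + ξ‖x − c‖₁ − γ, with center c, weight vector w, and scalars ξ and γ. The minimal sketch below only evaluates PCFs that are assumed to be already fitted and assigns a point to class A when the pointwise minimum over them is non-positive. The linear programming subproblems that fit w, ξ, and γ are omitted, the parameter values are hypothetical, and this is not the authors' code.

```python
def pcf_value(x, w, xi, gamma, c):
    """Evaluate g(x) = w.(x - c) + xi * ||x - c||_1 - gamma."""
    diff = [xj - cj for xj, cj in zip(x, c)]
    linear = sum(wj * dj for wj, dj in zip(w, diff))
    l1_norm = sum(abs(dj) for dj in diff)
    return linear + xi * l1_norm - gamma


def pcf_classify(x, pcfs):
    """Return "A" if min over the PCFs is <= 0, else "B".

    `pcfs` is a list of (w, xi, gamma, c) tuples, e.g. one per LP iteration
    (hypothetical, assumed already fitted).
    """
    value = min(pcf_value(x, *params) for params in pcfs)
    return "A" if value <= 0 else "B"


# Hypothetical parameters: one L1 "diamond" of radius 1 around the origin.
pcfs = [([0.0, 0.0], 1.0, 1.0, [0.0, 0.0])]
print(pcf_classify([0.2, 0.3], pcfs))  # A  (0.5 - 1 <= 0)
print(pcf_classify([2.0, 0.0], pcfs))  # B  (2 - 1 > 0)
```

Taking the minimum over several PCFs lets the class-A region be a union of polyhedral sublevel sets, which is what allows non-convex class boundaries.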

2019, Vol. 14 (4), pp. 333-343
Author(s): Linai Kuang, Haochen Zhao, Lei Wang, Zhanwei Xuan, Tingrui Pei

Background: In recent years, growing evidence has indicated that long non-coding RNAs (lncRNAs) play vital roles in a wide range of human diseases and can serve as potential biomarkers and drug targets. Compared with the vast number of lncRNAs discovered, the relationships between lncRNAs and diseases remain largely unknown. Objective: Predicting novel and potential associations between lncRNAs and diseases would help dissect the complex mechanisms of disease pathogenesis. Method: In this paper, a new computational method based on a Point Cut Set is proposed to predict LncRNA-Disease Associations (PCSLDA) from known lncRNA-disease associations. Compared with existing state-of-the-art methods, the major novelty of PCSLDA lies in the incorporation of a distance difference matrix and a point cut set to set the distance correlation coefficients of nodes in the lncRNA-disease interaction network. Hence, PCSLDA can forecast potential lncRNA-disease associations while requiring only known lncRNA-disease associations. Results: Simulation results show that PCSLDA significantly outperforms previous state-of-the-art methods, with a reliable AUC of 0.8902 in leave-one-out cross-validation and AUCs of 0.7634 and 0.8317 in 5-fold and 10-fold cross-validation, respectively. In addition, 70% of the top 10 predicted cancer-lncRNA associations could be confirmed. Conclusion: We anticipate that the proposed model will be a valuable addition to biomedical research.
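The AUC values above summarize how well predicted scores rank held-out known associations above unobserved pairs. As a minimal sketch (not PCSLDA itself, whose distance-difference and point-cut-set scoring is not reproduced here), the AUC of one cross-validation round can be computed directly from the two score lists; the score values below are hypothetical.

```python
def auc_from_scores(pos_scores, neg_scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win, which matches the Mann-Whitney U formulation.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical scores: held-out known associations vs. unobserved pairs.
print(auc_from_scores([0.9, 0.8], [0.1, 0.85]))  # 0.75
```

In leave-one-out or k-fold validation, the held-out associations supply the positive scores in each round, and the resulting ranks are pooled or averaged into the reported AUC.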


Author(s): Jiali Yu, Zhiliang Qin, Linghao Lin, Yu Qin, ...

In this paper, we focus on text classification, one of the most important tasks in natural language processing (NLP). We propose an innovative convolutional neural network (CNN) model that performs temporal feature aggregation (TFA) effectively and has a strong representational capacity for extracting sequential features from vectorized numerical embeddings. First, we feed the embedded vectors into a bidirectional LSTM (Bi-LSTM) model to capture the contextual information of each word. Next, we use two state-of-the-art deep learning models, Xception and WaveNet, as key components of the architecture to extract temporal features from deep convolutional layers concurrently. To achieve effective feature fusion, we concatenate the outputs of the two component models and forward them to a dropout layer, which alleviates overfitting, followed by a fully connected dense layer that performs the final classification of the input texts. Experiments demonstrate that the proposed method achieves performance comparable to state-of-the-art models at significantly lower computational complexity. Our approach obtains cross-validation scores of 95.83% on the Quora Insincere Question Classification (QIQC) dataset and 83.10% on the Spooky Author Identification (SAI) dataset, which are among the best published results. The proposed method can be readily generalized to signal processing tasks, e.g., environmental sound classification (ESC) and machine fault analysis (MFA).
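The classification head described above (concatenating the two branch outputs, applying dropout, then a fully connected layer) can be sketched in plain Python. The Bi-LSTM encoder and the Xception and WaveNet branches are not implemented: `xception_feats` and `wavenet_feats` are hypothetical stand-ins for their pooled outputs, and the softmax output and inverted-dropout scaling are assumptions rather than details from the paper.

```python
import math
import random


def fusion_head(xception_feats, wavenet_feats, weights, bias,
                drop_rate=0.5, training=False, rng=None):
    """Concatenate two branch outputs, apply dropout, then a dense softmax layer."""
    fused = list(xception_feats) + list(wavenet_feats)  # feature fusion by concatenation
    if training and drop_rate > 0:
        rng = rng or random.Random()
        keep = 1.0 - drop_rate
        # Inverted dropout: zero units at random and rescale the survivors.
        fused = [v / keep if rng.random() < keep else 0.0 for v in fused]
    logits = [sum(w * v for w, v in zip(row, fused)) + b
              for row, b in zip(weights, bias)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Inference (no dropout) with hypothetical 1-D branch outputs and a 2-class layer.
probs = fusion_head([1.0], [2.0], weights=[[1.0, 0.0], [0.0, 1.0]], bias=[0.0, 0.0])
print(probs)  # approximately [0.269, 0.731]
```

Dropout is active only when `training=True`, so inference is deterministic.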


2018
Author(s): João Marcos Carvalho Lima, José Everardo Bessa Maia

This paper presents an approach that uses topic models based on latent Dirichlet allocation (LDA) to represent documents in text categorization problems. The document representation is built from the cosine similarity between document embeddings and embeddings of topic words, yielding a Bag-of-Topics (BoT) variant. The performance of this approach is compared with that of two other representations, BoW (Bag-of-Words) and Topic Model, both based on standard tf-idf. To reveal the effect of the classifier, we also compare the performance of the nonlinear SVM classifier with that of the linear Naive Bayes classifier, taken as the baseline. The approach is evaluated on two datasets, one multi-label (RCV-1) and one single-label (20 Newsgroups). The model achieves strong results at low dimensionality compared with the state of the art.
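The Bag-of-Topics representation described above can be sketched as follows. This assumes each topic is represented by the mean embedding of its top LDA words, which is an assumption; the paper may combine topic-word embeddings differently. The toy 2-D vectors and topic word lists are hypothetical; in practice they would come from a trained LDA model and a word-embedding model.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mean_vector(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def bag_of_topics(doc_embedding, topics, word_vectors):
    """One feature per topic: similarity of the document to the topic's mean word vector.

    topics: list of top-word lists (hypothetical LDA output).
    word_vectors: dict word -> embedding (hypothetical).
    """
    features = []
    for top_words in topics:
        vecs = [word_vectors[w] for w in top_words if w in word_vectors]
        features.append(cosine(doc_embedding, mean_vector(vecs)) if vecs else 0.0)
    return features


word_vectors = {"goal": [1.0, 0.0], "match": [1.0, 0.0],
                "vote": [0.0, 1.0], "party": [0.0, 1.0]}
topics = [["goal", "match"], ["vote", "party"]]
print(bag_of_topics([1.0, 0.0], topics, word_vectors))  # [1.0, 0.0]
```

The resulting vector has one entry per topic, so its dimensionality is the number of topics rather than the vocabulary size used by BoW.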


Author(s): Pengfei Sun, Yawen Ouyang, Wenming Zhang, Xin-yu Dai

Meta-learning has recently emerged as a promising technique for addressing the challenge of few-shot learning. However, standard meta-learning methods mainly focus on visual tasks, which makes it hard for them to handle diverse text data directly. In this paper, we introduce a novel framework for few-shot text classification named MEta-learning with Data Augmentation (MEDA). MEDA is composed of two modules, a ball generator and a meta-learner, which are learned jointly. The ball generator increases the number of shots per class by generating additional samples, so that the meta-learner can be trained on both original and augmented samples. Notably, the ball generator is agnostic to the choice of meta-learning method. Experimental results show that, on both datasets used, MEDA outperforms existing state-of-the-art methods and significantly improves the performance of meta-learning on few-shot text classification.
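The abstract does not say how the ball generator places new samples. As a purely illustrative stand-in (an assumption, not MEDA's generator, which is learned jointly with the meta-learner), the sketch below draws extra embeddings uniformly from the ball centered at the class mean whose radius is the largest distance from that mean to a support embedding. The function name `ball_augment` and the toy support set are hypothetical.

```python
import math
import random


def ball_augment(support, n_new, rng=None):
    """Sample n_new points uniformly inside a ball around one class's support embeddings.

    Hypothetical stand-in: center = class mean, radius = max distance to a support point.
    """
    rng = rng or random.Random(0)
    dim = len(support[0])
    center = [sum(col) / len(support) for col in zip(*support)]
    radius = max(math.dist(p, center) for p in support)
    samples = []
    for _ in range(n_new):
        # Random direction from a normalized Gaussian vector.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(d * d for d in direction)) or 1.0
        # r = R * U^(1/dim) makes the samples uniform in volume, not clustered at the center.
        r = radius * rng.random() ** (1.0 / dim)
        samples.append([c + r * d / norm for c, d in zip(center, direction)])
    return samples


support = [[0.0, 0.0], [2.0, 0.0]]  # two shots of one class (toy embeddings)
augmented = support + ball_augment(support, 3)
print(len(augmented))  # 5
```

The meta-learner would then see the original shots together with the generated ones for each class in an episode.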


Author(s): Padmavathi S., M. Chidambaram

Owing to the tremendous growth of online information, text classification has become increasingly important for managing and organizing text data. It assigns documents to a fixed number of predefined categories. There are two approaches to text classification: rule-based and machine learning-based. In the rule-based approach, documents are classified according to manually defined rules. In the machine learning-based approach, the classification rules or the classifier are derived automatically from example documents; this approach offers higher recall and faster processing. This paper investigates text classification using different machine learning techniques.
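The contrast between the two approaches can be illustrated with a toy example: a hand-written keyword rule set versus a multinomial Naive Bayes classifier learned from labeled examples. The rules, training documents, and whitespace tokenization are hypothetical, and Naive Bayes is only one of many machine learning techniques that could fill the second role.

```python
import math
from collections import Counter, defaultdict

# Rule-based: manually defined keyword rules (hypothetical).
RULES = {"sports": {"goal", "match"}, "politics": {"vote", "election"}}


def rule_based(text, rules=RULES, default="other"):
    tokens = set(text.lower().split())
    for label, keywords in rules.items():
        if tokens & keywords:
            return label
    return default


# Machine learning-based: multinomial Naive Bayes learned from example documents.
def train_nb(docs):
    """docs: list of (text, label) pairs. Returns label counts, per-label word counts, vocab."""
    label_counts, word_counts, vocab = Counter(), defaultdict(Counter), set()
    for text, label in docs:
        tokens = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def predict_nb(model, text):
    label_counts, word_counts, vocab = model
    n_docs = sum(label_counts.values())
    best, best_score = None, -math.inf
    for label, count in label_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(count / n_docs)  # log prior
        for tok in text.lower().split():
            # Laplace (add-one) smoothing over the training vocabulary.
            score += math.log((word_counts[label][tok] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best


TRAIN_DOCS = [("great goal in the match", "sports"),
              ("the team won the match", "sports"),
              ("vote in the election", "politics"),
              ("the party won the vote", "politics")]
model = train_nb(TRAIN_DOCS)
print(rule_based("who won the match"), predict_nb(model, "election vote"))  # sports politics
```

The rule set must be edited by hand whenever categories change, whereas the learned model is simply retrained on new labeled examples.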


2020
Author(s): Pathikkumar Patel, Bhargav Lad, Jinan Fiaidhi

Over the last few years, recurrent neural network (RNN) models have been used extensively and have proven well suited to sequence and text data. RNNs have achieved state-of-the-art performance in several applications, such as text classification, sequence-to-sequence modelling, and time series forecasting. In this article, we review different machine learning and deep learning approaches for text data and examine the results obtained with these methods. This work also explores the use of transfer learning in NLP and how it affects model performance in a specific application, sentiment analysis.
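To make the recurrence behind RNN models concrete, here is a minimal forward pass of a vanilla (Elman) RNN, h_t = tanh(W_xh x_t + W_hh h_{t−1} + b). The weights are hypothetical and untrained. For text classification, the final hidden state would typically be passed to a classifier layer. This sketch is not taken from the reviewed article.

```python
import math


def rnn_forward(inputs, w_xh, w_hh, b_h):
    """Run a vanilla (Elman) RNN over a sequence and return every hidden state.

    inputs: list of input vectors x_t; w_xh, w_hh: weight matrices as lists of rows.
    """
    hidden_size = len(b_h)
    h = [0.0] * hidden_size  # initial hidden state h_0
    states = []
    for x in inputs:
        # The comprehension reads the previous h before it is reassigned.
        h = [math.tanh(sum(w * xv for w, xv in zip(w_xh[i], x))
                       + sum(w * hv for w, hv in zip(w_hh[i], h))
                       + b_h[i])
             for i in range(hidden_size)]
        states.append(h)
    return states


# Toy 1-D sequence with hypothetical weights.
states = rnn_forward([[1.0], [0.0]], w_xh=[[1.0]], w_hh=[[0.5]], b_h=[0.0])
print(states[-1])  # final hidden state, tanh(0.5 * tanh(1.0))
```

The second input is zero, yet the second state is non-zero: the recurrent weight carries information forward from earlier tokens, which is what suits RNNs to sequential text.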


2018, Vol. 7 (4), pp. 603-622
Author(s): Leonardo Gutiérrez-Gómez, Jean-Charles Delvenne

Several social, medical, engineering and biological challenges rely on discovering the functionality of networks from their structure and, when available, node metadata. For example, in chemoinformatics one might want to detect whether a molecule is toxic based on its structure and atom types, or discover the research field of a scientific collaboration network. Existing techniques rely on counting or measuring structural patterns that are known to vary widely from network to network, such as the number of triangles or the assortativity of node metadata. We introduce the concept of multi-hop assortativity, which captures the similarity of the nodes situated at the extremities of a randomly selected path of a given length. We show that multi-hop assortativity unifies various existing concepts and offers a versatile family of ‘fingerprints’ to characterize networks. These fingerprints in turn make it possible to recover the functionality of a network with the help of the machine learning toolbox. Our method is evaluated empirically on established social and chemoinformatic network benchmarks. The results reveal that our assortativity-based features are competitive, providing highly accurate results and often outperforming state-of-the-art methods on the network classification task.
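One way to make the multi-hop idea concrete, under stated assumptions: start a random walk at a node chosen with probability proportional to its degree, take k steps, and measure how often the start and end nodes share a categorical label, normalized as in Newman's assortativity coefficient. This uses walks rather than simple paths and is a hypothetical formalization, not necessarily the authors' exact definition. For k = 1 it reduces to the standard categorical assortativity of an undirected graph.

```python
from collections import defaultdict


def multihop_assortativity(adj, labels, k):
    """Categorical assortativity between the endpoints of k-step random walks.

    adj: dict node -> list of neighbours (undirected graph, no isolated nodes).
    labels: dict node -> categorical label (node metadata).
    """
    two_m = sum(len(nbrs) for nbrs in adj.values())  # 2 * number of edges
    e = defaultdict(float)  # joint distribution of (start label, end label)
    for u, nbrs_u in adj.items():
        dist = {u: 1.0}  # walker distribution after 0 steps
        for _ in range(k):
            nxt = defaultdict(float)
            for v, p in dist.items():
                for w in adj[v]:
                    nxt[w] += p / len(adj[v])
            dist = nxt
        pi_u = len(nbrs_u) / two_m  # stationary probability of starting at u
        for v, p in dist.items():
            e[(labels[u], labels[v])] += pi_u * p
    classes = set(labels.values())
    a = {c: sum(e[(c, d)] for d in classes) for c in classes}  # start-label marginal
    b = {c: sum(e[(d, c)] for d in classes) for c in classes}  # end-label marginal
    trace = sum(e[(c, c)] for c in classes)
    expected = sum(a[c] * b[c] for c in classes)
    if expected >= 1.0:
        raise ValueError("assortativity is undefined when all nodes share one label")
    return (trace - expected) / (1.0 - expected)


# Two connected nodes with different labels: disassortative at 1 hop, assortative at 2.
adj = {0: [1], 1: [0]}
labels = {0: "A", 1: "B"}
print([multihop_assortativity(adj, labels, k) for k in (1, 2)])  # [-1.0, 1.0]
```

Computing the coefficient for k = 1, 2, ..., K gives a fixed-length fingerprint vector per network that can be fed to any off-the-shelf classifier.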


Electronics, 2021, Vol. 10 (3), pp. 325
Author(s): Zhihao Wu, Baopeng Zhang, Tianchen Zhou, Yan Li, Jianping Fan

In this paper, we develop a practical approach for automatically detecting discrimination actions in social images. First, we build an image dataset in which various discrimination actions and relations are manually labeled. To the best of our knowledge, this is the first work to create a dataset for discrimination action recognition and relationship identification. Second, we develop a practical approach for automatically detecting and identifying discrimination actions and relationships in social images. Third, relationship identification and discrimination action recognition are seamlessly integrated into one single network, the Co-operative Visual Translation Embedding++ network (CVTransE++). We also compare the proposed method with numerous state-of-the-art methods, and the experimental results demonstrate that it significantly outperforms state-of-the-art approaches.
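Translation-embedding models of visual relations, the family that the name "Visual Translation Embedding" points to, represent a relation as a translation in embedding space, so that subject + relation ≈ object. The sketch below scores candidate relations by the distance ‖s + r − o‖ and picks the closest. It illustrates only this generic translation idea; the learned projections and the cooperative design of CVTransE++ are not reproduced, and the vectors and relation names are hypothetical.

```python
import math


def predict_relation(subject_vec, object_vec, relation_vecs):
    """Return the relation r minimising ||subject + r - object|| (translation embedding)."""
    best, best_dist = None, math.inf
    for name, r in relation_vecs.items():
        residual = [s + rv - o for s, rv, o in zip(subject_vec, r, object_vec)]
        dist = math.sqrt(sum(x * x for x in residual))
        if dist < best_dist:
            best, best_dist = name, dist
    return best


# Hypothetical 2-D embeddings for a subject, an object and two candidate relations.
relations = {"push": [1.0, 0.0], "ignore": [0.0, 1.0]}
print(predict_relation([0.0, 0.0], [1.0, 0.0], relations))  # push
```

In a trained model the subject and object vectors would come from detected image regions, and the relation vectors would be learned so that correct triples have small residuals.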


2021, Vol. 11 (12), pp. 5656
Author(s): Yufan Zeng, Jiashan Tang

Graph neural networks (GNNs) have been very successful at solving fraud detection tasks. GNN-based detection algorithms learn node embeddings by aggregating neighboring information. The recently proposed CAmouflage-REsistant GNN (CARE-GNN) achieves state-of-the-art results on fraud detection tasks by dealing with relation camouflage and feature camouflage. However, stacking multiple layers in the traditional hop-based way causes performance to drop rapidly. Because the single-layer CARE-GNN cannot extract additional information to correct potential mistakes, its performance relies heavily on that single layer. To avoid relying on single-layer learning, in this paper we consider a multi-layer architecture that forms a complementary relationship with a residual structure. We propose an improved algorithm named Residual Layered CARE-GNN (RLC-GNN), which learns progressively, layer by layer, and continuously corrects mistakes. We evaluate the proposed algorithm in numerical experiments using three metrics: recall, AUC, and F1-score. On the Yelp dataset, we obtain improvements of up to 5.66%, 7.72%, and 9.09% in recall, AUC, and F1-score, respectively. On the Amazon dataset, we obtain improvements of up to 3.66%, 4.27%, and 3.25% in the same three metrics.
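The residual idea behind stacking layers can be sketched with a generic mean-aggregation GNN layer whose output is added to its input, h_v^(l+1) = h_v^(l) + ReLU(W · mean of h_u^(l) over neighbours u of v). This is a simplified stand-in: CARE-GNN's camouflage-resistant neighbor selection and relation-aware aggregation, and RLC-GNN's exact update rule, are not reproduced, and the graph, features, and weights are hypothetical.

```python
def residual_gnn_layer(h, adj, weight):
    """One mean-aggregation GNN layer with a residual connection.

    h: dict node -> feature list; adj: dict node -> neighbour list;
    weight: square matrix (list of rows) applied to the aggregated neighbour features.
    """
    out = {}
    for v, hv in h.items():
        nbrs = adj.get(v, [])
        if nbrs:
            agg = [sum(h[u][i] for u in nbrs) / len(nbrs) for i in range(len(hv))]
        else:
            agg = [0.0] * len(hv)
        # ReLU(W . agg): the correction proposed by this layer.
        transformed = [max(0.0, sum(w * a for w, a in zip(row, agg))) for row in weight]
        # Residual connection: keep the previous layer's embedding and add the correction.
        out[v] = [x + t for x, t in zip(hv, transformed)]
    return out


# Stack two layers on a toy two-node graph with hypothetical 1-D features.
h0 = {0: [1.0], 1: [2.0]}
adj = {0: [1], 1: [0]}
h1 = residual_gnn_layer(h0, adj, [[1.0]])
h2 = residual_gnn_layer(h1, adj, [[1.0]])
print(h1, h2)  # {0: [3.0], 1: [3.0]} {0: [6.0], 1: [6.0]}
```

Because each layer only adds a correction to the previous embedding, a layer that contributes nothing useful (for example, one whose ReLU output is zero) leaves the embedding unchanged instead of degrading it.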

