Exploiting Syntactic and Semantic Information for Textual Similarity Estimation

2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Jiajia Luo ◽  
Hongtao Shan ◽  
Gaoyu Zhang ◽  
George Yuan ◽  
Shuyi Zhang ◽  
...  

The textual similarity task, which measures the similarity between two text pieces, has recently received much attention in the natural language processing (NLP) domain. However, due to the vagueness and diversity of language expression, considering only semantic or only syntactic features may lose critical textual knowledge. This paper proposes a new type of structure tree for sentence representation, the weight vector dependency tree (WVD-tree), which exploits both syntactic (structural) and semantic information. A WVD-tree comprises structure trees carrying syntactic information along with word vectors representing the semantic information of the sentences. Further, a Gaussian attention weight is proposed to better capture important semantic features of sentences. We also design an enhanced tree kernel to calculate the common parts between two structures for similarity judgment. Finally, the WVD-tree is tested on widely used semantic textual similarity tasks. The experimental results show that the WVD-tree can effectively improve the accuracy of sentence similarity judgments.

2021 ◽  
Vol 11 (4) ◽  
pp. 1377
Author(s):  
Jun Long ◽  
Ye Wang ◽  
Xiangxiang Wei ◽  
Zhen Ding ◽  
Qianqian Qi ◽  
...  

Relation classification is an important task in natural language processing and one of the key steps in constructing a knowledge graph, where it can greatly reduce construction cost. The Graph Convolutional Network (GCN) is an effective model for accurate relation classification: it models the dependency tree of textual instances to extract the semantic features of relation mentions. Previous GCN-based methods treat each node equally. However, different words contribute differently to expressing a given relation, especially the entity mentions in the sentence. In this paper, a novel GCN-based relation classifier is proposed, which treats the entity nodes as two global nodes in the dependency tree. These two global nodes connect directly with the other nodes, so they can aggregate information from the whole tree with only one convolutional layer. In this way, the method not only reduces the complexity of the model but also generates an expressive relation representation. Experimental results on two widely used data sets, SemEval-2010 Task 8 and TACRED, show that our model outperforms all the compared baselines in this paper, which illustrates that the model can effectively utilize the dependencies between nodes and improve the performance of relation classification.
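The global-node idea can be pictured with a small sketch. It assumes an undirected dependency tree given as a hypothetical edge list, and it uses plain mean aggregation in place of a learned graph convolution; the function names `build_adjacency` and `aggregate` are illustrative and are not taken from the paper.

```python
def build_adjacency(num_tokens, dep_edges, entity_indices):
    """Adjacency matrix (with self-loops) in which entity tokens act as global nodes."""
    adj = [[0] * num_tokens for _ in range(num_tokens)]
    for i in range(num_tokens):
        adj[i][i] = 1  # self-loop so each node keeps its own feature
    for head, dep in dep_edges:
        adj[head][dep] = adj[dep][head] = 1  # undirected dependency edge
    for e in entity_indices:
        for j in range(num_tokens):
            adj[e][j] = adj[j][e] = 1  # entity node is linked to every token
    return adj


def aggregate(adj, features):
    """One mean-aggregation step: each node averages the features of its neighbours."""
    out = []
    for row in adj:
        neighbours = [features[j] for j, linked in enumerate(row) if linked]
        dim = len(neighbours[0])
        out.append([sum(v[d] for v in neighbours) / len(neighbours) for d in range(dim)])
    return out


# Hypothetical 5-token sentence; tokens 0 and 4 are assumed to be the entity mentions.
edges = [(1, 0), (1, 2), (2, 3), (3, 4)]
adj = build_adjacency(5, edges, entity_indices=[0, 4])
feats = [[float(i)] for i in range(5)]
pooled = aggregate(adj, feats)
```

Because the entity rows of `adj` are fully connected, a single aggregation step already gives the entity nodes a view of every token, while a non-entity token such as token 1 still sees only its tree neighbours plus the two entities.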


2020 ◽  
Vol 11 (SPL1) ◽  
pp. 907-912
Author(s):  
Deepika Masurkar ◽  
Priyanka Jaiswal

At the end of 2019, a new disease was found in Wuhan, China. The disease was diagnosed as being caused by a new type of coronavirus and has affected almost the whole world. Chinese researchers named this novel virus 2019-nCoV or Wuhan-coronavirus. However, to avoid misunderstanding, the World Health Organization refers to it as the COVID-19 virus when interacting with the media. COVID-19 is new globally as well as in India, and this has disturbed people's minds. There are various rumours about the coronavirus in Indian society, which cause panic among people. Society needs to know the myths and facts about the coronavirus in order to reduce panic and take proper precautionary actions for safety against the coronavirus. Thus, this article aims to bust myths and present the facts to the common people. We need to verify myths spreading through social media and keep ourselves ready with facts so that we can protect ourselves in a better way. People must prevent COVID-19 at a personal level. Appropriate action in individual communities and countries can benefit the entire world.


Author(s):  
Yong Li ◽  
Xiaojun Yang ◽  
Min Zuo ◽  
Qingyu Jin ◽  
Haisheng Li ◽  
...  

The real-time and dissemination characteristics of network information make net-mediated public opinion an increasingly important resource for food safety early warning. However, data growing at petabyte (PB) scale also bring great difficulties to the research and judgment of network public opinion, especially in extracting the event roles of network public opinion from these data and analyzing the sentiment tendency of public opinion comments. First, this article takes food safety network public opinion as its research focus and proposes a BLSTM-CRF model for automatically labeling event roles, which combines BLSTM with a conditional random field. Second, an attention mechanism based on food safety domain vocabulary is introduced, distance-related sequence semantic features are extracted by BLSTM, and emotional classification of the sequence semantic features is performed with a CNN. On this basis, an Att-BLSTM-CNN model is proposed for analyzing public opinion and emotional tendency in the field of food safety. Finally, based on time series, this article combines the role extraction of food safety events with the analysis of emotional tendency and constructs a net-mediated public opinion early warning model for food safety according to the heat of the event and the emotional intensity of the public toward food safety public opinion events.


Author(s):  
Shumin Shi ◽  
Dan Luo ◽  
Xing Wu ◽  
Congjun Long ◽  
Heyan Huang

Dependency parsing is an important task in Natural Language Processing (NLP). However, a mature parser requires a large treebank for training, which is still extremely costly to create. Tibetan is an extremely low-resource language for NLP: there is no available Tibetan dependency treebank, and such a treebank can currently be obtained only by manual annotation. Furthermore, there is little related research on treebank construction. We propose a novel method of multi-level chunk-based syntactic parsing to complete constituent-to-dependency treebank conversion for Tibetan under scarce conditions. Our method mines more dependencies of Tibetan sentences, builds a high-quality Tibetan dependency tree corpus, and makes fuller use of the inherent laws of the language itself. We train the dependency parsing models on the dependency treebank obtained by the preliminary transformation. The model achieves 86.5% accuracy, 96% LAS, and 97.85% UAS, which exceeds the best results of existing conversion methods. The experimental results show that our method has potential in a low-resource setting, which means we not only address the scarcity of Tibetan dependency treebanks but also avoid needless manual annotation. The method embodies the regularity of strong knowledge-guided linguistic analysis methods, which is of great significance for promoting research on Tibetan information processing.
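For readers unfamiliar with the two attachment metrics reported above, the following minimal sketch computes them under their standard definitions: UAS is the fraction of tokens whose predicted head is correct, and LAS additionally requires the correct relation label. The function name `attachment_scores` and the example sentence are hypothetical.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) for parallel lists of (head_index, relation_label) pairs."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and equally long")
    # UAS counts tokens whose head matches, regardless of the label.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    # LAS counts tokens whose head and label both match.
    labeled_hits = sum(g == p for g, p in zip(gold, predicted))
    return head_hits / len(gold), labeled_hits / len(gold)


# Hypothetical 4-token sentence; head index 0 denotes the root.
gold = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (4, "amod"), (3, "obj")]
uas, las = attachment_scores(gold, pred)
```

Since every token counted for LAS is also counted for UAS, LAS can never exceed UAS on the same data.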


2021 ◽  
pp. 1-10
Author(s):  
Hye-Jeong Song ◽  
Tak-Sung Heo ◽  
Jong-Dae Kim ◽  
Chan-Young Park ◽  
Yu-Seop Kim

Sentence similarity evaluation is a significant task used in machine translation, classification, and information extraction in the field of natural language processing. Given two sentences, an accurate judgment should be made as to whether their meanings are equivalent even if their words and contexts differ. To this end, existing studies have measured the similarity of sentences by focusing on the analysis of words, morphemes, and letters. To measure sentence similarity, this study uses Sent2Vec, a sentence embedding, as well as morpheme word embedding. Vectors representing words are input to a one-dimensional convolutional neural network (1D-CNN) with various kernel sizes and to a bidirectional long short-term memory (Bi-LSTM) network. Self-attention is applied to the features transformed through the Bi-LSTM. Subsequently, the vectors produced by the 1D-CNN and by self-attention are converted through global max pooling and global average pooling, respectively, to extract specific values. The vectors generated through the above process are concatenated with the vector generated through Sent2Vec and are represented as a single vector. This vector is input to a softmax layer, and finally, the similarity between the two sentences is determined. The proposed model can improve the accuracy by up to 5.42 percentage points compared with conventional sentence similarity estimation models.
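The pooling-and-concatenation step can be sketched in a few lines. This is only an illustration under one reading of the abstract, namely that global max pooling is applied to the 1D-CNN features and global average pooling to the self-attention features; the function names and the toy feature values are assumptions, and the networks that would produce the features are not modelled.

```python
def global_max_pool(sequence):
    """Per-dimension maximum over the time steps of a feature sequence."""
    return [max(column) for column in zip(*sequence)]


def global_avg_pool(sequence):
    """Per-dimension mean over the time steps of a feature sequence."""
    return [sum(column) / len(column) for column in zip(*sequence)]


def combine(cnn_features, attention_features, sentence_vector):
    """Concatenate the pooled CNN features, pooled attention features, and sentence embedding."""
    return (
        global_max_pool(cnn_features)
        + global_avg_pool(attention_features)
        + list(sentence_vector)
    )


# Toy inputs: two time steps with two feature dimensions each (hypothetical values).
cnn_out = [[1.0, 4.0], [3.0, 2.0]]
attention_out = [[2.0, 0.0], [4.0, 2.0]]
sent2vec_vector = [0.5]  # stands in for a Sent2Vec embedding
combined = combine(cnn_out, attention_out, sent2vec_vector)
```

The resulting single vector is what would then be passed to the final classification layer.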


Author(s):  
Zhang Hong ◽  
Wei-qing Cao ◽  
Ting Li Yang ◽  
Jin Kui Chu

Abstract. This paper is the second in a series of two papers on the design of a new type of load balancing mechanism for planetary gearings with an arbitrary number of planets. In this paper, the general expression of the non-uniform load share factor was deduced, and a function parameter, the force-arm factor, was introduced and its solution given. This makes it possible to determine the dimensions and the load-equilibrium ability of multi-link load balancing mechanisms. Criteria for selecting the optimum load balancing mechanisms were set up with consideration of the effects of turning-pair clearances, and optimum mechanisms were selected among the 15 candidates obtained in Part 1. Finally, it was demonstrated that the optimum multi-link load balancing mechanisms for an arbitrary number of planets had similar topological structures and the same function and load-equilibrium performance.


Author(s):  
Vilson J. Leffa

A typical problem in the resolution of pronominal anaphora is the presence of more than one candidate for the antecedent of the pronoun. Considering two English sentences like (1) "People buy expensive cars because they offer more status" and (2) "People buy expensive cars because they want more status", we can see that the two NPs "people" and "expensive cars", from a purely syntactic perspective, are both legitimate candidates as antecedents for the pronoun "they". This problem has traditionally been solved by using world knowledge (e.g. schema theory), where, through an internal representation of the world, we "know" that cars "offer" status and people "want" status. The assumption in this paper is that the use of world knowledge does not explain how the disambiguation process works and that alternative explanations should be explored. Using a knowledge-poor approach (explicit information from the text rather than implicit world knowledge), the study investigates to what extent syntactic and semantic constraints can be used to resolve anaphora. For this purpose, 1,400 examples of the word "they" were randomly selected from a corpus of 10,000,000 words of expository text in English. Antecedent candidates for each case were then analyzed and classified in terms of their syntactic functions in the sentence (subject, object, etc.) and semantic features (+ human, + animate, etc.). It was found that syntactic constraints resolved 85% of the cases. When combined with semantic constraints, the resolution rate rose to 98%. The implications of the findings for Natural Language Processing are discussed.
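As a rough illustration of combining the two kinds of constraint, the sketch below first filters antecedent candidates by semantic features and then ranks the survivors by syntactic function. The ranking in `FUNCTION_RANK`, the feature requirement attached to "want", and the function `resolve` are all assumptions made for this example; they are not the classification scheme or the procedure used in the study.

```python
# Hypothetical ranking of syntactic functions; a lower value means a preferred antecedent.
FUNCTION_RANK = {"subject": 0, "object": 1, "other": 2}


def resolve(candidates, required_features=frozenset()):
    """Pick an antecedent: filter by semantic features, then rank by syntactic function.

    candidates: list of (noun_phrase, syntactic_function, feature_set) tuples.
    Returns the chosen noun phrase, or None if no candidate is compatible.
    """
    # Semantic constraint: keep candidates that carry every required feature.
    compatible = [c for c in candidates if required_features <= c[2]]
    if not compatible:
        return None
    # Syntactic constraint: prefer the best-ranked syntactic function.
    return min(compatible, key=lambda c: FUNCTION_RANK.get(c[1], 2))[0]


# Candidates for "they" in sentence (2), with illustrative feature sets.
candidates = [
    ("people", "subject", {"+human", "+animate"}),
    ("expensive cars", "object", {"-human", "-animate"}),
]
# Assumption: the verb "want" requires a +human subject.
antecedent = resolve(candidates, required_features={"+human"})
```

Under these assumptions `antecedent` is "people"; with no semantic requirement, the function falls back on the syntactic preference alone.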


2014 ◽  
Vol 496-500 ◽  
pp. 1044-1047 ◽  
Author(s):  
Jia Xiong Zhu ◽  
Qiang Tong

In this paper, based on the principles of induction heating and eddy current fields, an equivalent circuit model of induction cooker heating was proposed and analyzed. Using finite element numerical analysis software, the optimization design and model simulations of the induction cooker were implemented. Finally, a new induction cooker design with a uniform heating coil was proposed, magnetic circuit modeling and simulation of the new type of induction cooker heating were carried out, and the eddy current loss density distributions in the common wire coil and the improved wire coil were separately calculated and compared. The results showed that the new wire coil had higher energy efficiency and more uniform heating characteristics, which provides a reference for the future optimization design and development of induction cookers.


Author(s):  
W. J. Xie ◽  
L. Zhang ◽  
H. P. Chen

Abstract. GlobeLand30 update data is one of the important products of the Construction Maintenance and Update of Geographic Information Resources Project (hereinafter referred to as the Update Project). It provides important basic information for national geographical conditions monitoring, eco-environmental assessment and global geographic information integrated services. Unlike the land cover classification data in the National Geoinformation Surveying and Monitoring Project, GlobeLand30 update data is a completely new type of scientific research outcome with a new resolution setup; its data format and data structure are newly defined according to the requirements of the Update Project. Therefore, current inspection methods have limitations and incompatibilities and are not fully suitable for inspecting the accuracy of GlobeLand30 update data. Drawing on the practice of quality control for the Update Project, this paper proposes a set of methods and processes for the quality inspection of the GlobeLand30 update data. It also summarizes the key points of the inspection and analyses the common errors found in the actual inspection practice of the Update Project from 2017 to 2018. It can provide a technical reference for the quality control and quality improvement of GlobeLand30 update data in the Update Project.

