Sentence Semantic Matching Based on 3D CNN for Human–Robot Language Interaction

Wenpeng Lu; Rui Yu; Shoujin Wang; Can Wang; Ping Jian; Heyan Huang

doi:10.1145/3450520

Sentence Semantic Matching Based on 3D CNN for Human–Robot Language Interaction

ACM Transactions on Internet Technology ◽

10.1145/3450520 ◽

2021 ◽

Vol 21 (4) ◽

pp. 1-24

Author(s):

Wenpeng Lu ◽

Rui Yu ◽

Shoujin Wang ◽

Can Wang ◽

Ping Jian ◽

...

Keyword(s):

Natural Language ◽

Semantic Representation ◽

Semantic Feature ◽

Cognitive Robotics ◽

Sigmoid Function ◽

Semantic Matching ◽

Neural Models ◽

Real World Datasets ◽

Complex Features ◽

3D Cnn

The development of cognitive robotics brings an attractive scenario where humans and robots cooperate to accomplish specific tasks. To facilitate this scenario, cognitive robots are expected to have the ability to interact with humans with natural language, which depends on natural language understanding ( NLU ) technologies. As one core task in NLU, sentence semantic matching ( SSM ) has widely existed in various interaction scenarios. Recently, deep learning–based methods for SSM have become predominant due to their outstanding performance. However, each sentence consists of a sequence of words, and it is usually viewed as one-dimensional ( 1D ) text, leading to the existing available neural models being restricted into 1D sequential networks. A few researches attempt to explore the potential of 2D or 3D neural models in text representation. However, it is hard for their works to capture the complex features in texts, and thus the achieved performance improvement is quite limited. To tackle this challenge, we devise a novel 3D CNN-based SSM ( 3DSSM ) method for human–robot language interaction. Specifically, first, a specific architecture called feature cube network is designed to transform a 1D sentence into a multi-dimensional representation named as semantic feature cube. Then, a 3D CNN module is employed to learn a semantic representation for the semantic feature cube by capturing both the local features embedded in word representations and the sequential information among successive words in a sentence. Given a pair of sentences, their representations are concatenated together to feed into another 3D CNN to capture the interactive features between them to generate the final matching representation. Finally, the semantic matching degree is judged with the sigmoid function by taking the learned matching representation as the input. Extensive experiments on two real-world datasets demonstrate that 3DSSM is able to achieve comparable or even better performance over the state-of-the-art competing methods.

Download Full-text

Single-shot Semantic Matching Network for Moment Localization in Videos

ACM Transactions on Multimedia Computing Communications and Applications ◽

10.1145/3441577 ◽

2021 ◽

Vol 17 (3) ◽

pp. 1-14

Author(s):

Xinfang Liu ◽

Xiushan Nie ◽

Junya Teng ◽

Li Lian ◽

Yilong Yin

Keyword(s):

Natural Language ◽

Fixed Number ◽

Single Shot ◽

Semantic Matching ◽

Matching Network ◽

Attention Model ◽

Semantic Relationships ◽

Natural Language Query ◽

Benchmark Datasets ◽

Short Memory

Moment localization in videos using natural language refers to finding the most relevant segment from videos given a natural language query. Most of the existing methods require video segment candidates for further matching with the query, which leads to extra computational costs, and they may also not locate the relevant moments under any length evaluated. To address these issues, we present a lightweight single-shot semantic matching network (SSMN) to avoid the complex computations required to match the query and the segment candidates, and the proposed SSMN can locate moments of any length theoretically. Using the proposed SSMN, video features are first uniformly sampled to a fixed number, while the query sentence features are generated and enhanced by GloVe, long-term short memory (LSTM), and soft-attention modules. Subsequently, the video features and sentence features are fed to an enhanced cross-modal attention model to mine the semantic relationships between vision and language. Finally, a score predictor and a location predictor are designed to locate the start and stop indexes of the query moment. We evaluate the proposed method on two benchmark datasets and the experimental results demonstrate that SSMN outperforms state-of-the-art methods in both precision and efficiency.

Download Full-text

Unsupervised Anomaly Detection with Distillated Teacher-Student Network Ensemble

Entropy ◽

10.3390/e23020201 ◽

2021 ◽

Vol 23 (2) ◽

pp. 201

Author(s):

Qinfeng Xiao ◽

Jing Wang ◽

Youfang Lin ◽

Wenbo Gongsa ◽

Ganghui Hu ◽

...

Keyword(s):

Anomaly Detection ◽

Multivariate Data ◽

Failure Detection ◽

Superior Performance ◽

Detection Algorithms ◽

Teacher Student ◽

Model Complex ◽

Unsupervised Anomaly Detection ◽

Real World Datasets ◽

Complex Features

We address the problem of unsupervised anomaly detection for multivariate data. Traditional machine learning based anomaly detection algorithms rely on specific assumptions of normal patterns and fail to model complex feature interactions and relations. Recently, existing deep learning based methods are promising for extracting representations from complex features. These methods train an auxiliary task, e.g., reconstruction and prediction, on normal samples. They further assume that anomalies fail to perform well on the auxiliary task since they are never trained during the model optimization. However, the assumption does not always hold in practice. Deep models may also perform the auxiliary task well on anomalous samples, leading to the failure detection of anomalies. To effectively detect anomalies for multivariate data, this paper introduces a teacher-student distillation based framework Distillated Teacher-Student Network Ensemble (DTSNE). The paradigm of the teacher-student distillation is able to deal with high-dimensional complex features. In addition, an ensemble of student networks provides a better capability to avoid generalizing the auxiliary task performance on anomalous samples. To validate the effectiveness of our model, we conduct extensive experiments on real-world datasets. Experimental results show superior performance of DTSNE over competing methods. Analysis and discussion towards the behavior of our model are also provided in the experiment section.

Download Full-text

Natural Language Semantic Representation Method Based on the Scene Framework

2020 3rd International Conference on Algorithms, Computing and Artificial Intelligence ◽

10.1145/3446132.3446394 ◽

2020 ◽

Author(s):

Ping Zhu

Keyword(s):

Natural Language ◽

Semantic Representation ◽

Natural Language Semantic ◽

Representation Method

Download Full-text

DRr-Net: Dynamic Re-Read Network for Sentence Semantic Matching

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33017442 ◽

2019 ◽

Vol 33 ◽

pp. 7442-7449 ◽

Cited By ~ 3

Author(s):

Kun Zhang ◽

Guangyi Lv ◽

Linyuan Wang ◽

Le Wu ◽

Enhong Chen ◽

...

Keyword(s):

Natural Language ◽

Small Region ◽

Psychological Research ◽

Attention Mechanism ◽

Semantic Relations ◽

Semantic Matching ◽

Close Attention ◽

Original Sentence ◽

Benchmark Datasets ◽

Sentence Matching

Sentence semantic matching requires an agent to determine the semantic relation between two sentences, which is widely used in various natural language tasks such as Natural Language Inference (NLI) and Paraphrase Identification (PI). Among all matching methods, attention mechanism plays an important role in capturing the semantic relations and properly aligning the elements of two sentences. Previous methods utilized attention mechanism to select important parts of sentences at one time. However, the important parts of the sentence during semantic matching are dynamically changing with the degree of sentence understanding. Selecting the important parts at one time may be insufficient for semantic understanding. To this end, we propose a Dynamic Re-read Network (DRr-Net) approach for sentence semantic matching, which is able to pay close attention to a small region of sentences at each step and re-read the important words for better sentence semantic understanding. To be specific, we first employ Attention Stack-GRU (ASG) unit to model the original sentence repeatedly and preserve all the information from bottom-most word embedding input to up-most recurrent output. Second, we utilize Dynamic Re-read (DRr) unit to pay close attention to one important word at one time with the consideration of learned information and re-read the important words for better sentence semantic understanding. Extensive experiments on three sentence matching benchmark datasets demonstrate that DRr-Net has the ability to model sentence semantic more precisely and significantly improve the performance of sentence semantic matching. In addition, it is very interesting that some of finding in our experiments are consistent with the findings of psychological research.

Download Full-text

Sentence Generation for Entity Description with Content-Plan Attention

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6439 ◽

2020 ◽

Vol 34 (05) ◽

pp. 9057-9064

Author(s):

Bayu Trisedya ◽

Jianzhong Qi ◽

Rui Zhang

Keyword(s):

State Of The Art ◽

Neural Models ◽

Time Step ◽

Two Stage ◽

Sentence Generation ◽

Neural Data ◽

Attention Model ◽

Linear Sequence ◽

Proper Order ◽

Real World Datasets

We study neural data-to-text generation. Specifically, we consider a target entity that is associated with a set of attributes. We aim to generate a sentence to describe the target entity. Previous studies use encoder-decoder frameworks where the encoder treats the input as a linear sequence and uses LSTM to encode the sequence. However, linearizing a set of attributes may not yield the proper order of the attributes, and hence leads the encoder to produce an improper context to generate a description. To handle disordered input, recent studies propose two-stage neural models that use pointer networks to generate a content-plan (i.e., content-planner) and use the content-plan as input for an encoder-decoder model (i.e., text generator). However, in two-stage models, the content-planner may yield an incomplete content-plan, due to missing one or more salient attributes in the generated content-plan. This will in turn cause the text generator to generate an incomplete description. To address these problems, we propose a novel attention model that exploits content-plan to highlight salient attributes in a proper order. The challenge of integrating a content-plan in the attention model of an encoder-decoder framework is to align the content-plan and the generated description. We handle this problem by devising a coverage mechanism to track the extent to which the content-plan is exposed in the previous decoding time-step, and hence it helps our proposed attention model select the attributes to be mentioned in the description in a proper order. Experimental results show that our model outperforms state-of-the-art baselines by up to 3% and 5% in terms of BLEU score on two real-world datasets, respectively.

Download Full-text

Effective Deep Memory Networks for Distant Supervised Relation Extraction

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/559 ◽

2017 ◽

Cited By ~ 11

Author(s):

Xiaocheng Feng ◽

Jiang Guo ◽

Bing Qin ◽

Ting Liu ◽

Yongjie Liu

Keyword(s):

Relation Extraction ◽

Training Data ◽

Context Word ◽

Neural Models ◽

Attention Model ◽

Memory Experiment ◽

Feature Based ◽

Real World Datasets ◽

Major Attention ◽

Better Than

Distant supervised relation extraction (RE) has been an effective way of finding novel relational facts from text without labeled training data. Typically it can be formalized as a multi-instance multi-label problem.In this paper, we introduce a novel neural approach for distant supervised (RE) with specific focus on attention mechanisms.Unlike the feature-based logistic regression model and compositional neural models such as CNN, our approach includes two major attention-based memory components, which is capable of explicitly capturing the importance of each context word for modeling the representation of the entity pair, as well as the intrinsic dependencies between relations.Such importance degree and dependency relationship are calculated with multiple computational layers, each of which is a neural attention model over an external memory. Experiment on real-world datasets shows that our approach performs significantly and consistently better than various baselines.

Download Full-text

Natural Language Generation: Recently Learned Lessons, Directions for Semantic Representation-based Approaches, and the Case of Brazilian Portuguese Language

10.18653/v1/p19-2011 ◽

2019 ◽

Author(s):

Marco Antonio Sobrevilla Cabezudo ◽

Thiago Pardo

Keyword(s):

Natural Language ◽

Semantic Representation ◽

Natural Language Generation ◽

Brazilian Portuguese ◽

Language Generation

Download Full-text

Welcome to AI Matters 7(2)

AI Matters ◽

10.1145/3478369.3478370 ◽

2021 ◽

Vol 7 (2) ◽

pp. 3-4

Author(s):

Iolanda Leite ◽

Anuj Karpatne

Keyword(s):

Natural Language ◽

Regular Education ◽

Neural Models ◽

Natural Language Interfaces ◽

Algorithmic Bias

Welcome to the second issue of this year's AI Matters Newsletter. We start with a report on upcoming SIGAI Events by Dilini Samarasinghe and Conference reports by Louise Dennis, our conference coordination officer. In our regular Education column, Carolyn Rosé discusses the role of AI in education in a post-pandemic reality. We then bring you our regular Policy column, where Larry Medsker covers interesting and timely discussions on AI policy, for example whether governments should play a role in reducing algorithmic bias. This issue closes with an article contribution from Li Dong, one of the runner-ups in the latest AAIS/SIGAI dissertation award, on the use neural models to build natural language interfaces.

Download Full-text

Computational construction grammar for visual question answering

Linguistics Vanguard ◽

10.1515/lingvan-2018-0070 ◽

2019 ◽

Vol 5 (1) ◽

Author(s):

Jens Nevens ◽

Paul Van Eecke ◽

Katrien Beuls

Keyword(s):

Natural Language ◽

Question Answering ◽

Semantic Representation ◽

Construction Grammar ◽

Training Data ◽

Knowledge Sources ◽

Visual Question Answering ◽

Novel Approach ◽

Natural Language Question ◽

Grammar Model

AbstractIn order to be able to answer a natural language question, a computational system needs three main capabilities. First, the system needs to be able to analyze the question into a structured query, revealing its component parts and how these are combined. Second, it needs to have access to relevant knowledge sources, such as databases, texts or images. Third, it needs to be able to execute the query on these knowledge sources. This paper focuses on the first capability, presenting a novel approach to semantically parsing questions expressed in natural language. The method makes use of a computational construction grammar model for mapping questions onto their executable semantic representations. We demonstrate and evaluate the methodology on the CLEVR visual question answering benchmark task. Our system achieves a 100% accuracy, effectively solving the language understanding part of the benchmark task. Additionally, we demonstrate how this solution can be embedded in a full visual question answering system, in which a question is answered by executing its semantic representation on an image. The main advantages of the approach include (i) its transparent and interpretable properties, (ii) its extensibility, and (iii) the fact that the method does not rely on any annotated training data.

Download Full-text