Vision–Language–Knowledge Co-Embedding for Visual Commonsense Reasoning

Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 2911
Author(s):  
JaeYun Lee ◽  
Incheol Kim

Visual commonsense reasoning is the task of selecting the most appropriate answer to a question, together with the rationale for that answer, given an image, a natural language question, and a set of candidate responses. Effective visual commonsense reasoning requires solving both the knowledge acquisition problem and the multimodal alignment problem. We therefore propose a novel Vision–Language–Knowledge Co-embedding (ViLaKC) model that extracts knowledge graphs relevant to the question from an external knowledge base, ConceptNet, and uses them together with the input image to answer the question. The proposed model uses a pretrained vision–language–knowledge embedding module, which co-embeds multimodal data including images, natural language texts, and knowledge graphs into a single feature vector. To reflect the structural information of the knowledge graph, the model first embeds the knowledge graph with a graph convolutional network (GCN) layer and then uses multi-head self-attention layers to co-embed it with the image and the natural language question. The effectiveness and performance of the proposed model are experimentally validated on the VCR v1.0 benchmark dataset.
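A minimal PyTorch-style sketch of the co-embedding idea described above is given below; the module names, dimensions, and the single GCN layer are illustrative assumptions, not the authors' implementation.

# Illustrative sketch (not the authors' code): one GCN layer embeds knowledge-graph
# nodes, then image, text, and graph tokens are concatenated and co-embedded with
# multi-head self-attention, mirroring the ViLaKC pipeline described above.
import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, node_feats, adj):
        # adj: (N, N) normalized adjacency; node_feats: (N, dim)
        return torch.relu(self.linear(adj @ node_feats))

class CoEmbedder(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.gcn = SimpleGCNLayer(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, img_tokens, txt_tokens, kg_feats, kg_adj):
        kg_tokens = self.gcn(kg_feats, kg_adj).unsqueeze(0)       # (1, N, dim)
        tokens = torch.cat([img_tokens, txt_tokens, kg_tokens], dim=1)
        fused, _ = self.attn(tokens, tokens, tokens)              # joint co-embedding
        return fused.mean(dim=1)                                  # single feature vector

# Toy usage with random features standing in for real image/text/KG encoders.
model = CoEmbedder()
img = torch.randn(1, 36, 256)   # e.g. region features
txt = torch.randn(1, 20, 256)   # question token features
kg = torch.randn(12, 256)       # ConceptNet subgraph node features
adj = torch.eye(12)             # placeholder normalized adjacency
vec = model(img, txt, kg, adj)  # (1, 256) co-embedded feature vector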

2019 ◽  
Vol 8 (10) ◽  
pp. 428 ◽  
Author(s):  
Bingchuan Jiang ◽  
Liheng Tan ◽  
Yan Ren ◽  
Feng Li

The core of intelligent virtual geographical environments (VGEs) is the formal expression of geographic knowledge. Its purpose is to transform the data, information, and scenes of a virtual geographic environment into "knowledge" that a computer can recognize, so that the computer can understand the virtual geographic environment more easily. A geographic knowledge graph (GeoKG) is a large-scale semantic web that stores geographical knowledge in a structured form. Based on a geographic knowledge base and a geospatial database, intelligent interaction with virtual geographical environments can be realized through natural language question answering, entity linking, and so on. In this paper, a knowledge-enhanced VGE service framework is proposed. We construct a multi-level semantic parsing model and an enhanced GeoKG for structured geographic information data, such as digital maps and 3D virtual scenes, as well as for unstructured information data. Based on the GeoKG, we propose a bidirectional LSTM-CRF (long short-term memory–conditional random field) model to achieve natural language question answering for VGEs and conduct experiments on the method. The results show that intelligent interaction based on the knowledge graph can bridge the gap between people and virtual environments.
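As a rough illustration of how a sequence tagger could parse VGE questions before querying the GeoKG, the sketch below shows a bidirectional LSTM producing per-token tag scores; the tag set and sizes are hypothetical, and a CRF decoding layer (for example, from the pytorch-crf package) would normally sit on top of these emission scores.

# Hedged sketch of a BiLSTM tagger for semantic parsing of VGE questions;
# the entity/relation tag set and sizes are illustrative assumptions, and a
# CRF layer would normally replace the greedy decode shown at the end.
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size, num_tags, emb_dim=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.emit = nn.Linear(2 * hidden, num_tags)   # per-token tag scores

    def forward(self, token_ids):
        h, _ = self.lstm(self.emb(token_ids))
        return self.emit(h)                           # (batch, seq_len, num_tags)

# Toy run: tag a tokenized question such as "where is the highest peak near X".
tagger = BiLSTMTagger(vocab_size=5000, num_tags=7)
scores = tagger(torch.randint(0, 5000, (1, 8)))
pred_tags = scores.argmax(dim=-1)                     # greedy decode; a CRF would replace this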


2019 ◽  
Vol 481 ◽  
pp. 141-159 ◽  
Author(s):  
Weiguo Zheng ◽  
Hong Cheng ◽  
Jeffrey Xu Yu ◽  
Lei Zou ◽  
Kangfei Zhao

Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Yu Zhao ◽  
Jiayue Hou ◽  
Zongjian Yu ◽  
Yun Zhang ◽  
Qing Li

Knowledge graph (KG) entity typing aims to predict the potential types of an entity, that is, to complete tuples of the form (entity, entity type = ?). Recently, several embedding models have been proposed for KG entity type prediction based on the existing typing information of the (entity, entity type) tuples in KGs. However, most of them unreasonably assume that all existing entity typing instances in KGs are completely correct, ignoring non-negligible entity type noise that may introduce errors into downstream tasks. To address this problem, we propose ConfE, a novel confidence-aware embedding approach for modeling (entity, entity type) tuples, which takes tuple confidence into consideration to learn better embeddings. Specifically, we learn the embeddings of entities and entity types in separate entity and entity-type spaces, since they are different objects in KGs. We use an asymmetric matrix to specify the interaction of their embeddings and incorporate the tuple confidence as well. To make the tuple confidence more universal, we consider only the internal structural information in existing KGs. We evaluate our model on two tasks, entity type noise detection and entity type prediction. Extensive experimental results on two public benchmark datasets (FB15kET and YAGO43kET) demonstrate that our proposed model outperforms all baselines on all tasks, verifying the effectiveness of ConfE in learning better embeddings on noisy KGs. The source code and data of this work can be obtained from https://github.com/swufenlp/ConfE.
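The following sketch, written as an assumption in the spirit of ConfE rather than as its exact formulation, shows how entities and types can be embedded in separate spaces, coupled through an asymmetric matrix, and trained with a confidence-weighted margin loss.

# Illustrative confidence-aware (entity, type) scoring model: entities and types
# live in separate embedding spaces, an asymmetric matrix M couples them, and the
# margin loss of each training tuple is weighted by its confidence. Dimensions and
# the loss form are assumptions, not the authors' exact formulation.
import torch
import torch.nn as nn

class ConfETupleScorer(nn.Module):
    def __init__(self, n_entities, n_types, ent_dim=100, type_dim=50):
        super().__init__()
        self.ent = nn.Embedding(n_entities, ent_dim)
        self.typ = nn.Embedding(n_types, type_dim)
        self.M = nn.Parameter(torch.randn(ent_dim, type_dim) * 0.01)  # asymmetric interaction

    def score(self, e_idx, t_idx):
        e, t = self.ent(e_idx), self.typ(t_idx)
        return (e @ self.M * t).sum(dim=-1)            # higher = more plausible tuple

def confidence_weighted_loss(model, pos_e, pos_t, neg_t, conf, margin=1.0):
    pos = model.score(pos_e, pos_t)
    neg = model.score(pos_e, neg_t)                    # corrupted type as negative sample
    return (conf * torch.relu(margin - pos + neg)).mean()

# Toy usage with random indices and confidences derived from KG structure.
model = ConfETupleScorer(n_entities=1000, n_types=50)
loss = confidence_weighted_loss(model,
                                torch.tensor([3, 7]), torch.tensor([1, 4]),
                                torch.tensor([9, 2]), torch.tensor([0.9, 0.4]))
loss.backward()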


Author(s):  
Xinmeng Li ◽  
Mamoun Alazab ◽  
Qian Li ◽  
Keping Yu ◽  
Quanjun Yin

Knowledge graph question answering is an important technology in intelligent human–robot interaction; it aims to automatically answer a natural language question posed by a human over a given knowledge graph. For multi-relation questions, which are more varied and complex, the tokens of the question carry different priorities for triple selection at each reasoning step. Most existing models treat the question as a whole and ignore this priority information. To solve this problem, we propose a question-aware memory network for multi-hop question answering, named QA2MN, which updates the attention over the question at each step of the reasoning process. In addition, we incorporate graph context information into a knowledge graph embedding model to strengthen its ability to represent entities and relations; we use it to initialize the QA2MN model and fine-tune it during training. We evaluate QA2MN on PathQuestion and WorldCup2014, two representative datasets for complex multi-hop question answering. The results demonstrate that QA2MN achieves state-of-the-art Hits@1 accuracy on both datasets, which validates the effectiveness of our model.
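Below is a hedged sketch, not the released QA2MN code, of one question-aware memory hop: the model attends over triple memories using the current question state, reads a summary, and then re-weights attention over the question tokens so later hops can focus on different parts of the question.

# Minimal illustration of a question-aware memory hop; tensor shapes and the
# update rule are assumptions in the spirit of the abstract above.
import torch
import torch.nn as nn

class QuestionAwareHop(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.read = nn.Linear(2 * dim, dim)

    def forward(self, q_state, q_tokens, memories):
        # q_state: (B, d)  q_tokens: (B, Lq, d)  memories: (B, Lm, d) triple embeddings
        mem_att = torch.softmax(memories @ q_state.unsqueeze(-1), dim=1)   # (B, Lm, 1)
        read = (mem_att * memories).sum(dim=1)                             # memory readout
        q_att = torch.softmax(q_tokens @ read.unsqueeze(-1), dim=1)        # re-weight question tokens
        q_focus = (q_att * q_tokens).sum(dim=1)
        return self.read(torch.cat([q_focus, read], dim=-1))               # next question state

# Toy two-hop reasoning over random, KG-embedding-initialized memories.
hop = QuestionAwareHop()
q_tokens, mems = torch.randn(1, 12, 128), torch.randn(1, 40, 128)
q = q_tokens.mean(dim=1)
for _ in range(2):
    q = hop(q, q_tokens, mems)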

