Modeling Complementarity in Behavior Data with Multi-Type Itemset Embedding

2021 ◽  
Vol 12 (4) ◽  
pp. 1-25
Author(s):  
Daheng Wang ◽  
Qingkai Zeng ◽  
Nitesh V. Chawla ◽  
Meng Jiang

People are looking for complementary contexts, such as team members of complementary skills for project team building and/or reading materials of complementary knowledge for effective student learning, to make their behaviors more likely to be successful. Complementarity has been revealed by behavioral sciences as one of the most important factors in decision making. Existing computational models that learn low-dimensional context representations from behavior data have poor scalability, and recent network embedding methods only focus on preserving the similarity between the contexts. In this work, we formulate a behavior entry as a set of context items and propose a novel representation learning method, Multi-type Itemset Embedding, to learn the context representations preserving the itemset structures. We propose a measurement of complementarity between context items in the embedding space. Experiments demonstrate both the effectiveness and efficiency of the proposed method over the state-of-the-art methods on behavior prediction and context recommendation. We discover that the complementary contexts and similar contexts are significantly different in human behaviors.

2020 ◽  
Vol 34 (03) ◽  
pp. 2950-2958
Author(s):  
Guanglin Niu ◽  
Yongfei Zhang ◽  
Bo Li ◽  
Peng Cui ◽  
Si Liu ◽  
...  

Representation learning on a knowledge graph (KG) aims to embed entities and relations of a KG into low-dimensional continuous vector spaces. Early KG embedding methods only pay attention to structured information encoded in triples, which limits performance due to the structure sparseness of KGs. Some recent attempts consider path information to expand the structure of KGs but lack explainability in the process of obtaining the path representations. In this paper, we propose a novel Rule and Path-based Joint Embedding (RPJE) scheme, which takes full advantage of the explainability and accuracy of logic rules, the generalization of KG embedding, and the supplementary semantic structure of paths. Specifically, logic rules of different lengths (the number of relations in the rule body) in the form of Horn clauses are first mined from the KG and elaborately encoded for representation learning. Then, the rules of length 2 are applied to compose paths accurately, while the rules of length 1 are explicitly employed to create semantic associations among relations and constrain relation embeddings. Moreover, the confidence level of each rule is also considered in optimization to guarantee the availability of applying the rule to representation learning. Extensive experimental results illustrate that RPJE outperforms other state-of-the-art baselines on the KG completion task, which also demonstrates the superiority of utilizing logic rules as well as paths for improving the accuracy and explainability of representation learning.
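The idea of using length-2 rules to compose paths can be illustrated with a minimal sketch. This is not the RPJE implementation: the rule table, the relation names, the `compose_path` helper, and the choice to multiply rule confidences along the way are all assumptions made for illustration.

```python
def compose_path(path, length2_rules, min_confidence=0.0):
    """Collapse a relation path with rules of the form (r1, r2) -> (r3, confidence).

    Adjacent relation pairs that match a rule body are replaced by the rule head.
    The returned confidence is the product of the confidences of the rules used
    (an assumption made for this sketch).
    """
    relations = list(path)
    confidence = 1.0
    changed = True
    while changed and len(relations) > 1:
        changed = False
        for i in range(len(relations) - 1):
            rule = length2_rules.get((relations[i], relations[i + 1]))
            if rule is not None and rule[1] >= min_confidence:
                head_relation, rule_confidence = rule
                relations[i:i + 2] = [head_relation]  # each match shortens the path by one
                confidence *= rule_confidence
                changed = True
                break
    return relations, confidence


# Hypothetical rule: born_in_city(x, y) AND city_in_country(y, z) => nationality(x, z)
rules = {("born_in_city", "city_in_country"): ("nationality", 0.9)}

composed, conf = compose_path(["born_in_city", "city_in_country"], rules)
unmatched, unmatched_conf = compose_path(["born_in_city", "works_for"], rules)
```

A path with no matching rule is returned unchanged with confidence 1.0, so a caller could fall back to another composition operator in that case.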


2020 ◽  
Vol 11 (1) ◽  
pp. 162
Author(s):  
Masoud Reyhani Hamedani ◽  
Sang-Wook Kim

One of the important tasks in a graph is to compute the similarity between two nodes; link-based similarity measures (in short, similarity measures) are well-known and conventional techniques for this task that exploit the relations between nodes (i.e., links) in the graph. Graph embedding methods (in short, embedding methods) convert nodes in a graph into vectors in a low-dimensional space by preserving social relations among nodes in the original graph. Instead of applying a similarity measure to the graph to compute the similarity between nodes a and b, we can consider the proximity between the corresponding vectors of a and b obtained by an embedding method as the similarity between a and b. Although embedding methods have been analyzed in a wide range of machine learning tasks such as link prediction and node classification, they have not been investigated in terms of similarity computation of nodes. In this paper, we investigate both the effectiveness and efficiency of embedding methods in the task of similarity computation of nodes by comparing them with those of similarity measures. To the best of our knowledge, this is the first work that examines the application of embedding methods in this special task. Based on the results of our extensive experiments with five well-known and publicly available datasets, we made the following observations for embedding methods: (1) they are less effective than similarity measures on all datasets except one, (2) they are less efficient than similarity measures on all datasets except one, (3) they have more parameters than similarity measures, thereby leading to a time-consuming parameter tuning process, and (4) increasing the number of dimensions does not necessarily improve their effectiveness in computing the similarity of nodes.
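Treating the proximity of two embedding vectors as the similarity of the corresponding nodes can be sketched as follows. Cosine similarity is used here as one common proximity measure; the abstract does not say which measure the authors used, and the three-dimensional embeddings are made-up values for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch when a vector is all zeros
    return dot / (norm_u * norm_v)


# Hypothetical embeddings for nodes a, b and c (not taken from any dataset).
embeddings = {
    "a": [1.0, 0.0, 1.0],
    "b": [2.0, 0.0, 2.0],
    "c": [0.0, 1.0, 0.0],
}

sim_ab = cosine_similarity(embeddings["a"], embeddings["b"])  # parallel vectors
sim_ac = cosine_similarity(embeddings["a"], embeddings["c"])  # orthogonal vectors
```

Nodes a and b point in the same direction and therefore score close to 1, while a and c are orthogonal and score 0.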


Author(s):  
Yu Li ◽  
Ying Wang ◽  
Tingting Zhang ◽  
Jiawei Zhang ◽  
Yi Chang

Network embedding is an effective approach to learn the low-dimensional representations of vertices in networks, aiming to capture and preserve the structure and inherent properties of networks. The vast majority of existing network embedding methods exclusively focus on vertex proximity of networks, while ignoring the internal community structure of networks. However, the homophily principle indicates that vertices within the same community are more similar to each other than those from different communities, thus vertices within the same community should have similar vertex representations. Motivated by this, we propose a novel network embedding framework NECS to learn the Network Embedding with Community Structural information, which preserves the high-order proximity and incorporates the community structure in vertex representation learning. We formulate the problem into a principled optimization framework and provide an effective alternating algorithm to solve it. Extensive experimental results on several benchmark network datasets demonstrate the effectiveness of the proposed framework in various network analysis tasks including network reconstruction, link prediction and vertex classification.


2021 ◽  
Vol 4 ◽  
Author(s):  
Linmei Hu ◽  
Mengmei Zhang ◽  
Shaohua Li ◽  
Jinghan Shi ◽  
Chuan Shi ◽  
...  

Knowledge Graphs (KGs) such as Freebase and YAGO have been widely adopted in a variety of NLP tasks. Representation learning of KGs aims to map entities and relationships into a continuous low-dimensional vector space. Conventional KG embedding methods (such as TransE and ConvE) utilize only KG triplets and thus suffer from structure sparsity. Some recent works address this issue by incorporating auxiliary texts of entities, typically entity descriptions. However, these methods usually focus only on local consecutive word sequences, but seldom explicitly use global word co-occurrence information in a corpus. In this paper, we propose to model the whole auxiliary text corpus with a graph and present an end-to-end text-graph enhanced KG embedding model, named Teger. Specifically, we model the auxiliary texts with a heterogeneous entity-word graph (called text-graph), which entails both local and global semantic relationships among entities and words. We then apply graph convolutional networks to learn informative entity embeddings that aggregate high-order neighborhood information. These embeddings are further integrated with the KG triplet embeddings via a gating mechanism, thus enriching the KG representations and alleviating the inherent structure sparsity. Experiments on benchmark datasets show that our method significantly outperforms several state-of-the-art methods.
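A gating mechanism that merges two embeddings of the same entity can be sketched generically as below. The abstract does not give the exact form used in Teger, so the element-wise sigmoid gate, the `gated_fusion` helper, and the toy vectors are assumptions for illustration only.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(text_embedding, triple_embedding, gate_logits):
    """Element-wise convex combination of two embeddings.

    Each gate g = sigmoid(logit) weights the text-graph embedding, and
    (1 - g) weights the triplet-based embedding. In a trained model the
    logits would be learned parameters; here they are passed in directly.
    """
    gates = [sigmoid(z) for z in gate_logits]
    return [
        g * t + (1.0 - g) * k
        for g, t, k in zip(gates, text_embedding, triple_embedding)
    ]


# Hypothetical 2-dimensional embeddings of one entity.
text_embedding = [1.0, 0.0]
triple_embedding = [0.0, 1.0]

# Zero logits give gates of 0.5, i.e. a plain average of the two sources.
fused = gated_fusion(text_embedding, triple_embedding, [0.0, 0.0])
```

Large positive logits push the result toward the text-graph embedding and large negative logits toward the triplet embedding, which is how a gate can adapt the mix per dimension.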


Cancers ◽  
2021 ◽  
Vol 13 (9) ◽  
pp. 2111
Author(s):  
Bo-Wei Zhao ◽  
Zhu-Hong You ◽  
Lun Hu ◽  
Zhen-Hao Guo ◽  
Lei Wang ◽  
...  

Identification of drug-target interactions (DTIs) is a significant step in the drug discovery or repositioning process. Compared with the time-consuming and labor-intensive in vivo experimental methods, computational models can provide high-quality DTI candidates in an instant. In this study, we propose a novel method called LGDTI to predict DTIs based on large-scale graph representation learning. LGDTI can capture the local and global structural information of the graph. Specifically, the first-order neighbor information of nodes can be aggregated by the graph convolutional network (GCN); on the other hand, the high-order neighbor information of nodes can be learned by the graph embedding method called DeepWalk. Finally, the two kinds of features are fed into a random forest classifier to train and predict potential DTIs. The results show that our method obtained an area under the receiver operating characteristic curve (AUROC) of 0.9455 and an area under the precision-recall curve (AUPR) of 0.9491 under 5-fold cross-validation. Moreover, we compare the presented method with some existing state-of-the-art methods. These results imply that LGDTI can efficiently and robustly capture undiscovered DTIs. The proposed model is also expected to bring new inspiration and provide novel perspectives to relevant researchers.
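DeepWalk learns node embeddings from truncated random walks that are later fed to a skip-gram model. The sketch below covers only the walk-generation step on a toy graph; the node names, the `random_walks` helper, and its parameters are hypothetical, and the skip-gram training, the GCN, and the random forest stages of LGDTI are omitted.

```python
import random


def random_walks(adjacency, walks_per_node, walk_length, seed=0):
    """Generate truncated uniform random walks starting from every node.

    adjacency maps each node to a list of its neighbors. A walk stops
    early only if it reaches a node without neighbors.
    """
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start nodes in a fresh random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Hypothetical drug-target graph stored as an undirected adjacency list.
adjacency = {
    "drug1": ["targetA", "targetB"],
    "drug2": ["targetB"],
    "targetA": ["drug1"],
    "targetB": ["drug1", "drug2"],
}

walks = random_walks(adjacency, walks_per_node=2, walk_length=5)
```

Each walk plays the role of a sentence whose words are node identifiers, which is what lets a word-embedding model capture higher-order neighborhood structure.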


Author(s):  
Nicolo Botteghi ◽  
Ruben Obbink ◽  
Daan Geijs ◽  
Mannes Poel ◽  
Beril Sirmacek ◽  
...  

2021 ◽  
pp. 105065192110214
Author(s):  
Michelle McMullin ◽  
Bradley Dilger

Academic work increasingly involves creating digital tools with interdisciplinary teams distributed across institutions and roles. The negative impacts of distributed work are described at length in technical communication scholarship, but such impacts have not yet been realized in collaborative practices. By integrating attention to their core ethical principles, best practices, and work patterns, the authors are developing an ethical, sustainable approach to team building that they call constructive distributed work. This article describes their integrated approach, documents the best practices that guide their research team, and models the three-dimensional thinking that helps them develop sustainable digital tools and ensure the consistent professional development of all team members.


2020 ◽  
Vol 4 (1) ◽  
Author(s):  
Amin Nasir ◽  
Salamatun Nafi’ah

This study is a research and development (R&D) project on Arabic extensive reading materials using storybooks based on bilingual parallel text. Its purposes are to identify the steps in developing such materials for MA NU Miftahul Falah students and to determine the feasibility of the storybooks and the students’ responses. The study uses Richey and Klein’s model with some modifications. The stages are: planning, the activity of drawing up a plan for a product that will be made for a specific purpose, which begins with a needs analysis carried out through research and literature study; production, the activity of making products based on the designs that have been made; and evaluation, the activity of testing and assessing how well the product has met predetermined specifications. The results show that the average validity of the developed storybook across all evaluation aspects is 3.49, so it is classified as valid. The evaluation results also show that the product is suitable for use, and the average value of student responses is 84.2, which is in the very good category. Further research needs to be done to determine the effectiveness and efficiency of the product with a wider sample.


2017 ◽  
Author(s):  
Matthias Morzfeld ◽  
Jesse Adams ◽  
Spencer Lunderman ◽  
Rafael Orozco

Abstract. Many applications in science require that computational models and data be combined. In a Bayesian framework, this is usually done by defining likelihoods based on the mismatch of model outputs and data. However, matching model outputs and data in this way can be unnecessary or impossible. For example, using large amounts of steady state data is unnecessary because these data are redundant, it is numerically difficult to assimilate data in chaotic systems, and it is often impossible to assimilate data of a complex system into a low-dimensional model. These issues can be addressed by selecting features of the data, and defining likelihoods based on the features, rather than by the usual mismatch of model output and data. Our goal is to contribute to a fundamental understanding of such a feature-based approach that allows us to assimilate selected aspects of data into models. Specifically, we explain how the feature-based approach can be interpreted as a method for reducing an effective dimension, and derive new noise models, based on perturbed observations, that lead to computationally efficient solutions. Numerical implementations of our ideas are illustrated in four examples.
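Defining a likelihood on features of the data rather than on the raw mismatch can be sketched as follows. The Gaussian form, the choice of mean and variance as features, and all names are assumptions for illustration; the paper derives its own noise models, which are not reproduced here.

```python
import statistics


def summary_features(series):
    """Hypothetical feature map: mean and population variance of a series."""
    return [statistics.fmean(series), statistics.pvariance(series)]


def feature_log_likelihood(model_output, data, feature_fn, noise_std):
    """Unnormalized Gaussian log-likelihood of the mismatch between features.

    The features of the model output are compared with the features of the
    data, instead of comparing the two series point by point.
    """
    if noise_std <= 0:
        raise ValueError("noise_std must be positive")
    model_features = feature_fn(model_output)
    data_features = feature_fn(data)
    return -0.5 * sum(
        ((m - d) / noise_std) ** 2 for m, d in zip(model_features, data_features)
    )


data = [1.0, 2.0, 3.0, 4.0]
reordered_model = [4.0, 2.0, 1.0, 3.0]  # differs pointwise, same mean and variance
shifted_model = [2.0, 3.0, 4.0, 5.0]    # mean is off by one, variance unchanged

ll_reordered = feature_log_likelihood(reordered_model, data, summary_features, 0.5)
ll_shifted = feature_log_likelihood(shifted_model, data, summary_features, 0.5)
```

The reordered series would be penalized by a pointwise mismatch but matches the data exactly in feature space, which is the kind of situation (for example, chaotic trajectories) where a feature-based likelihood is useful.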


2020 ◽  
Author(s):  
Jing Qian ◽  
Gangmin Li ◽  
Katie Atkinson ◽  
Yong Yue

Knowledge representation learning (KRL) aims at encoding components of a knowledge graph (KG) into a low-dimensional continuous space, which has brought considerable successes in applying deep learning to graph embedding. Most famous KGs contain only positive instances for space efficiency. Typical KRL techniques, especially translational distance-based models, are trained through discriminating positive and negative samples. Thus, negative sampling is unquestionably a non-trivial step in KG embedding. The quality of generated negative samples can directly influence the performance of final knowledge representations in downstream tasks, such as link prediction and triple classification. This review summarizes current negative sampling methods in KRL and categorizes them into three types: fixed distribution-based, generative adversarial net (GAN)-based, and cluster sampling. Based on this categorization, we discuss the most prevalent existing approaches and their characteristics.
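The simplest fixed distribution-based strategy is uniform sampling: a negative triple is produced by replacing the head or the tail of a positive triple with an entity drawn uniformly at random. The sketch below illustrates that idea only; the `corrupt_triple` helper, the filtering of known triples, and the toy KG are assumptions, not code from the review.

```python
import random


def corrupt_triple(triple, entities, known_triples, rng, max_tries=100):
    """Return a negative triple by corrupting the head or the tail.

    The replacement entity is drawn uniformly from entities. Candidates
    that already appear in known_triples are rejected so that a true
    fact is not used as a negative sample.
    """
    head, relation, tail = triple
    for _ in range(max_tries):
        candidate = rng.choice(entities)
        if rng.random() < 0.5:
            negative = (candidate, relation, tail)  # corrupt the head
        else:
            negative = (head, relation, candidate)  # corrupt the tail
        if negative not in known_triples:
            return negative
    raise RuntimeError("could not find a negative triple")


# Toy KG used only for illustration.
entities = ["Paris", "France", "Berlin", "Germany"]
known_triples = {
    ("Paris", "capital_of", "France"),
    ("Berlin", "capital_of", "Germany"),
}

rng = random.Random(0)
negative = corrupt_triple(("Paris", "capital_of", "France"), entities, known_triples, rng)
```

Uniform corruption is cheap but often yields negatives that are trivially false, which is the limitation that GAN-based and cluster-based samplers aim to address.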

