Dual-Channel Heterogeneous Graph Network for Author Name Disambiguation

Information ◽  
2021 ◽  
Vol 12 (9) ◽  
pp. 383
Author(s):  
Xin Zheng ◽  
Pengyu Zhang ◽  
Yanjie Cui ◽  
Rong Du ◽  
Yong Zhang

Name disambiguation has long been a significant issue in many fields, such as literature management and social analysis. In recent years, methods based on graph networks have performed well in name disambiguation, but these works have rarely used heterogeneous graphs to capture relationships between nodes. Heterogeneous graphs can extract more comprehensive relationship information, so more accurate node embeddings can be learned. Therefore, a Dual-Channel Heterogeneous Graph Network is proposed to solve the name disambiguation problem. We use the heterogeneous graph network to capture various node information to ensure that our method can learn more accurate data structure information. In addition, we use fastText to extract the semantic information of the data. Then, a clustering method based on DBSCAN is used to group academic papers by different authors into separate clusters. In many experiments based on real datasets, our method achieved high accuracy, demonstrating its effectiveness.
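The final clustering step described in the abstract can be sketched with scikit-learn's DBSCAN. The toy 2-D vectors below are assumptions standing in for the learned paper embeddings (the paper's actual fastText/graph features are not reproduced here):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy stand-ins for learned paper embeddings: two tight groups,
# one per (hypothetical) author identity.
embeddings = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.2, 0.1],   # papers by "author A"
    [5.0, 5.0], [5.1, 5.0], [5.2, 5.1],   # papers by "author B"
])

# Density-based clustering: papers within eps of enough neighbours
# fall into the same cluster; outliers would receive label -1.
labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(embeddings)
print(labels)  # [0 0 0 1 1 1]
```

A density-based method like DBSCAN fits this task because the number of distinct authors behind a name is unknown in advance, so a fixed cluster count (as in k-means) cannot be assumed.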

Author(s):  
Reinald Kim Amplayo ◽  
Seung-won Hwang ◽  
Min Song

Word sense induction (WSI), or the task of automatically discovering multiple senses or meanings of a word, has three main challenges: domain adaptability, novel sense detection, and sense granularity flexibility. While current latent variable models are known to solve the first two challenges, they are not flexible to different word sense granularities, which differ greatly among words, from aardvark with one sense, to play with over 50 senses. Current models either require hyperparameter tuning or nonparametric induction of the number of senses, both of which we find to be ineffective. Thus, we aim to eliminate these requirements and solve the sense granularity problem by proposing AutoSense, a latent variable model based on two observations: (1) senses are represented as a distribution over topics, and (2) senses generate pairings between the target word and its neighboring word. These observations alleviate the problem by (a) discarding garbage senses and (b) additionally inducing fine-grained word senses. Results show great improvements over the state-of-the-art models on popular WSI datasets. We also show that AutoSense is able to learn the appropriate sense granularity of a word. Finally, we apply AutoSense to the unsupervised author name disambiguation task, where the sense granularity problem is more evident, and show that AutoSense clearly outperforms competing models. We share our data and code here: https://github.com/rktamplayo/AutoSense.


2021 ◽  
pp. 016555152110181
Author(s):  
Jinseok Kim ◽  
Jenna Kim ◽  
Jinmo Kim

Chinese author names are known to be more difficult to disambiguate than other ethnic names because they tend to share surnames and forenames, thus creating many homonyms. In this study, we demonstrate how using Chinese characters can affect machine learning for author name disambiguation. For analysis, 15K author names recorded in Chinese are transliterated into English and simplified by initialising their forenames to create counterfactual scenarios, reflecting real-world indexing practices in which Chinese characters are usually unavailable. The results show that Chinese author names that are highly ambiguous in English or with initialised forenames tend to become less confusing if their Chinese characters are included in the processing. Our findings indicate that recording Chinese author names in native script can help researchers and digital libraries enhance authority control of Chinese author names, whose numbers continue to grow in bibliographic data.
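The forename-initialisation counterfactual can be illustrated with a few lines of plain Python. The helper and names below are hypothetical, purely to show how distinct transliterated names collapse into homonyms once forenames are reduced to initials:

```python
def initialise_forename(name: str) -> str:
    """Reduce a transliterated name like 'Wei Zhang' to 'W. Zhang'."""
    forename, surname = name.split(" ", 1)
    return f"{forename[0]}. {surname}"

# Hypothetical transliterated names: 'Wei' and 'Wen' correspond to
# different Chinese characters, but collide after initialisation.
names = ["Wei Zhang", "Wen Zhang", "Wei Wang"]
keys = [initialise_forename(n) for n in names]
print(keys)  # ['W. Zhang', 'W. Zhang', 'W. Wang'] -- two distinct authors merged
```

This is the indexing practice the study simulates: the native-script record would keep the two Zhangs apart, while the initialised English record cannot.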


Author(s):  
Yang Fang ◽  
Xiang Zhao ◽  
Zhen Tan

Network Embedding (NE) is an important method to learn the representations of a network in a low-dimensional space. Conventional NE models focus on capturing the structure information and semantic information of vertices while neglecting such information for edges. In this work, we propose a novel NE model named BimoNet to capture both the structure and semantic information of edges. BimoNet is composed of two parts, i.e., the bi-mode embedding part and the deep neural network part. For the bi-mode embedding part, the first mode, named add-mode, is used to express the entity-shared features of edges, and the second mode, named subtract-mode, is employed to represent the entity-specific features of edges. These features actually reflect the semantic information. For the deep neural network part, we first regard the edges in a network as nodes, and the vertices as links, which will not change the overall structure of the whole network. Then we take the nodes' adjacency matrix as the input of the deep neural network, as it can obtain similar representations for nodes with similar structure. Afterwards, by jointly optimizing the objective function of these two parts, BimoNet can preserve both the semantic and structure information of edges. In experiments, we evaluate BimoNet on three real-world datasets and the task of relation extraction, and BimoNet is demonstrated to outperform state-of-the-art baseline models consistently and significantly.
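The edges-as-nodes construction described above corresponds to the line graph of the network. A minimal sketch with networkx follows; the toy graph is an assumption for illustration, not a dataset from the paper:

```python
import networkx as nx
import numpy as np

# Toy network: vertices a-d connected by four edges.
G = nx.Graph()
G.add_edges_from([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])

# Line graph: every edge of G becomes a node; two such nodes are
# linked when the original edges share a vertex.
L = nx.line_graph(G)

# Adjacency matrix of the line graph -- the kind of structural input
# the abstract feeds to the deep neural network part.
A = nx.to_numpy_array(L)
print(A.shape)  # (4, 4): one row per original edge
```

Because the line graph preserves which edges are incident to each other, nodes of L with similar neighbourhoods correspond to structurally similar edges of G, which is what lets the network learn similar representations for them.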

