Ethnicity‐based name partitioning for author name disambiguation using supervised machine learning

Effect of Chinese characters on machine learning for Chinese author name disambiguation: A counterfactual evaluation

Journal of Information Science ◽

10.1177/01655515211018171 ◽

2021 ◽

pp. 016555152110181

Author(s):

Jinseok Kim ◽

Jenna Kim ◽

Jinmo Kim

Keyword(s):

Machine Learning ◽

Real World ◽

Digital Libraries ◽

Chinese Characters ◽

Name Disambiguation ◽

Authority Control ◽

Author Name Disambiguation ◽

Bibliographic Data ◽

Chinese Author

Chinese author names are known to be more difficult to disambiguate than other ethnic names because they tend to share surnames and forenames, thus creating many homonyms. In this study, we demonstrate how using Chinese characters can affect machine learning for author name disambiguation. For analysis, 15K author names recorded in Chinese are transliterated into English and simplified by initialising their forenames to create counterfactual scenarios, reflecting real-world indexing practices in which Chinese characters are usually unavailable. The results show that Chinese author names that are highly ambiguous in English or with initialised forenames tend to become less confusing if their Chinese characters are included in the processing. Our findings indicate that recording Chinese author names in native script can help researchers and digital libraries enhance authority control of Chinese author names that continue to increase in size in bibliographic data.
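The counterfactual setup described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the four records below are made-up examples, the transliterations are supplied by hand rather than produced by a transliteration tool, and the helper names `initialise_forename` and `group_by_key` are hypothetical. It only shows how distinct native-script names can collapse into a single key once they are transliterated, and further once forenames are initialised.

```python
from collections import defaultdict


def initialise_forename(full_name: str) -> str:
    """Reduce forenames to initials, e.g. 'Wei Zhang' -> 'W Zhang'.

    Assumes 'Forename(s) Surname' order in the English form.
    """
    *forenames, surname = full_name.split()
    initials = "".join(part[0].upper() for part in forenames)
    return f"{initials} {surname}"


def group_by_key(records, key_fn):
    """Map each name key to the set of distinct native-script names behind it."""
    groups = defaultdict(set)
    for record in records:
        groups[key_fn(record)].add(record["native"])
    return dict(groups)


# Made-up illustrative records (not taken from the study's data).
records = [
    {"native": "张伟", "english": "Wei Zhang"},
    {"native": "张薇", "english": "Wei Zhang"},
    {"native": "章伟", "english": "Wei Zhang"},
    {"native": "张文", "english": "Wen Zhang"},
]

by_native = group_by_key(records, lambda r: r["native"])
by_english = group_by_key(records, lambda r: r["english"])
by_initial = group_by_key(records, lambda r: initialise_forename(r["english"]))

for label, groups in [("native", by_native),
                      ("english", by_english),
                      ("initialised", by_initial)]:
    largest = max(len(names) for names in groups.values())
    print(f"{label}: {len(groups)} keys, largest group holds {largest} names")
```

In this toy data, four distinct native-script names shrink to two English keys and then to one initialised key, which is the kind of added ambiguity the abstract attributes to indexing without Chinese characters.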

AUTHOR NAME DISAMBIGUATION IN ACADEMIC PUBLICATIONS USING METHODS OF MACHINE LEARNING

Vestnik komp'iuternykh i informatsionnykh tekhnologii ◽

10.14489/vkit.2015.09.pp.041-048 ◽

2015 ◽

pp. 41-48

Author(s):

V. A. Zelepukhina

Keyword(s):

Machine Learning ◽

Name Disambiguation ◽

Author Name Disambiguation ◽

Academic Publications

Model Reuse in Machine Learning for Author Name Disambiguation: An Exploration of Transfer Learning

IEEE Access ◽

10.1109/access.2020.3031112 ◽

2020 ◽

Vol 8 ◽

pp. 188378-188389

Author(s):

Jinseok Kim ◽

Jason Owen-Smith

Keyword(s):

Machine Learning ◽

Transfer Learning ◽

Name Disambiguation ◽

Model Reuse ◽

Author Name Disambiguation

The impact of imbalanced training data on machine learning for author name disambiguation

Scientometrics ◽

10.1007/s11192-018-2865-9 ◽

2018 ◽

Vol 117 (1) ◽

pp. 511-526 ◽

Cited By ~ 8

Author(s):

Jinseok Kim ◽

Jenna Kim

Keyword(s):

Machine Learning ◽

Training Data ◽

Name Disambiguation ◽

Author Name Disambiguation ◽

Imbalanced Training Data

Applying Data Augmentation for Disambiguating Author Names

10.5753/sbbd.2021.17870 ◽

2021 ◽

Author(s):

Luciano V. B. Espiridião ◽

Laura L. Dias ◽

Anderson A. Ferreira

Keyword(s):

Machine Learning ◽

Digital Library ◽

Information Quality ◽

Data Augmentation ◽

Experimental Results ◽

Machine Learning Techniques ◽

Name Disambiguation ◽

Author Name Disambiguation ◽

Learning Techniques

Author name ambiguity is one of the most challenging issues that can compromise information quality in a scholarly digital library. For years, researchers have searched for solutions to this problem. Despite the many methods already proposed, the question remains open. In this study, we address the issue of producing a more accurate disambiguation function by applying data augmentation to the training data set. We also propose a SyGAR-based data augmentation approach and evaluate our proposal on three collections commonly used in work on the author name disambiguation task. The experimental results show scenarios where improvements are possible in the author name disambiguation task. The proposed data augmentation outperforms other data augmentation approaches and also improves some machine learning techniques that were not specifically designed for the author name disambiguation task.

Exploring the Use of Machine Learning to Automate the Qualitative Coding of Church-related Tweets

Fieldwork in Religion ◽

10.1558/firn.40610 ◽

2020 ◽

Vol 14 (2) ◽

pp. 140-159

Author(s):

Anthony-Paul Cooper ◽

Emmanuel Awuni Kolog ◽

Erkki Sutinen

Keyword(s):

Machine Learning ◽

Online Community ◽

High Volume ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Social Media Data ◽

Twitter Data ◽

Resource Intensity ◽

Media Data

This article builds on previous research exploring the content of church-related tweets. It does so by exploring whether the qualitative thematic coding of such tweets can, in part, be automated by the use of machine learning. It compares three supervised machine learning algorithms to understand how useful each algorithm is at a classification task, based on a dataset of human-coded church-related tweets. The study finds that one such algorithm, Naïve-Bayes, performs better than the other algorithms considered, returning Precision, Recall and F-measure values that each exceed an acceptable threshold of 70%. This has far-reaching consequences at a time when the high volume of social media data, in this case Twitter data, means that the resource intensity of manual coding approaches can act as a barrier to understanding how the online community interacts with, and talks about, church. The findings presented in this article offer a way forward for scholars of digital theology to better understand the content of online church discourse.
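For readers unfamiliar with the three measures named in this abstract, the sketch below computes Precision, Recall and F-measure for one class from paired human and predicted labels. It uses the standard definitions; the label names and the five example tweets are hypothetical and are not drawn from the article's dataset, and the function name `precision_recall_f1` is an assumption of this sketch.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Return (precision, recall, F1) for the class `positive`.

    y_true holds the human-coded labels, y_pred the classifier's labels.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Hypothetical labels for five tweets (illustration only).
human_codes = ["worship", "worship", "other", "worship", "other"]
model_codes = ["worship", "other", "other", "worship", "worship"]

precision, recall, f1 = precision_recall_f1(human_codes, model_codes, "worship")
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

A threshold check such as the 70% criterion mentioned in the abstract would then amount to testing whether each of the three values is above 0.70.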

Application of Supervised Machine Learning Algorithms for Lithofacies Classification.

10.2523/19349-ms ◽

2019 ◽

Author(s):

Subhadeep Sarkar ◽

Chandan Majumdar

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Lithofacies Classification

Mol2vec: Unsupervised Machine Learning Approach with Chemical Intuition

10.26434/chemrxiv.5513581.v1 ◽

2017 ◽

Author(s):

Sabrina Jaeger ◽

Simone Fulle ◽

Samo Turk

Keyword(s):

Machine Learning ◽

Language Processing ◽

Supervised Machine Learning ◽

Learning Approach ◽

Learning Approaches ◽

Unsupervised Machine Learning ◽

Feature Representations ◽

Machine Learning Approach ◽

Vector Representations

Inspired by natural language processing techniques, we here introduce Mol2vec, an unsupervised machine learning approach to learn vector representations of molecular substructures. Similar to the Word2vec models, where vectors of closely related words are in close proximity in the vector space, Mol2vec learns vector representations of molecular substructures that point in similar directions for chemically related substructures. Compounds can finally be encoded as vectors by summing up the vectors of the individual substructures and, for instance, fed into supervised machine learning approaches to predict compound properties. The underlying substructure vector embeddings are obtained by training an unsupervised machine learning approach on a so-called corpus of compounds that consists of all available chemical matter. The resulting Mol2vec model is pre-trained once, yields dense vector representations and overcomes drawbacks of common compound feature representations such as sparseness and bit collisions. The prediction capabilities are demonstrated on several compound property and bioactivity data sets and compared with results obtained for Morgan fingerprints as the reference compound representation. Mol2vec can be easily combined with ProtVec, which employs the same Word2vec concept on protein sequences, resulting in a proteochemometric approach that is alignment independent and can thus also be easily used for proteins with low sequence similarities.
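The compound-encoding step described in this abstract (summing the vectors of a compound's substructures) can be sketched as follows. This is not the Mol2vec implementation: the substructure identifiers and the three-dimensional embeddings are toy values invented for illustration, the function name `encode_compound` is hypothetical, and silently skipping unknown substructures is a simplifying assumption of this sketch.

```python
def encode_compound(substructure_ids, embeddings, dim):
    """Encode a compound as the element-wise sum of its substructure vectors.

    substructure_ids: identifiers of the substructures found in the compound
                      (repeats are allowed and are counted each time).
    embeddings:       mapping from identifier to a pre-trained vector.
    dim:              dimensionality of the embedding space.
    """
    compound_vector = [0.0] * dim
    for sub_id in substructure_ids:
        vector = embeddings.get(sub_id)
        if vector is None:
            # Assumption of this sketch: unknown substructures are ignored.
            continue
        for i, value in enumerate(vector):
            compound_vector[i] += value
    return compound_vector


# Toy embeddings, invented for illustration only.
toy_embeddings = {
    "frag_a": [1.0, 0.0, 0.5],
    "frag_b": [0.0, 2.0, 0.5],
}

compound = ["frag_a", "frag_b", "frag_a", "frag_unknown"]
print(encode_compound(compound, toy_embeddings, dim=3))
```

The resulting fixed-length vector is what the abstract describes as the input to a downstream supervised model for property prediction.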

A Deep Analysis and Efficient Implementation of Supervised Machine Learning Algorithms for Enhancing The Classification Ability of System

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v7i3.10941101 ◽

2019 ◽

Vol 7 (3) ◽

pp. 1094-1101

Author(s):

Sandeep Kumar Verma ◽

Turendar Sahu ◽

Manjit Jaiswal

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Efficient Implementation ◽

Machine Learning Algorithms ◽

Supervised Machine Learning

A Reckoning Analysis and Assessment of Different Supervised Machine Learning Algorithm for Breast Cancer Prediction

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v7i3.8388 ◽

2019 ◽

Vol 7 (3) ◽

pp. 83-88

Author(s):

Pragati Prakash ◽

Nidhi Ekka ◽

Manjit Jaiswal

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Learning Algorithm ◽

Supervised Machine Learning ◽

Machine Learning Algorithm ◽

Cancer Prediction
