Impact of data sources on citation counts and rankings of LIS faculty: Web of Science versus Scopus and Google Scholar

Lokman I. Meho; Kiduk Yang

doi:10.1002/asi.20677

Ranking by Relevance and Citation Counts, a Comparative Study: Google Scholar, Microsoft Academic, WoS and Scopus

Future Internet ◽

10.3390/fi11090202 ◽

2019 ◽

Vol 11 (9) ◽

pp. 202 ◽

Cited By ~ 5

Author(s):

Rovira ◽

Codina ◽

Guerrero-Solé ◽

Lopezosa

Keyword(s):

Search Engine ◽

Search Engines ◽

Web Of Science ◽

Growth Potential ◽

Google Scholar ◽

Unexpected Finding ◽

Ranking Algorithms ◽

Microsoft Academic ◽

Citation Counts ◽

The Impact

Search engine optimization (SEO) constitutes the set of methods designed to increase the visibility of, and the number of visits to, a web page by means of its ranking on the search engine results pages. Recently, SEO has also been applied to academic databases and search engines, a trend that continues to grow. This new approach, known as academic SEO (ASEO), has generated a field of study with considerable future growth potential due to the impact of open science. The study reported here forms part of this new field of analysis. The ranking of results is a key aspect in any information system since it determines the way in which these results are presented to the user. The aim of this study is to analyze and compare the relevance ranking algorithms employed by various academic platforms to identify the importance of citations received in their algorithms. Specifically, we analyze two search engines and two bibliographic databases: Google Scholar and Microsoft Academic, on the one hand, and Web of Science and Scopus, on the other. A reverse engineering methodology is employed based on the statistical analysis of Spearman’s correlation coefficients. The results indicate that the ranking algorithms used by Google Scholar and Microsoft Academic are the two that are most heavily influenced by citations received. Indeed, citation counts are clearly the main SEO factor in these academic search engines. An unexpected finding is that, at certain points in time, Web of Science (WoS) used citations received as a key ranking factor, despite the fact that WoS support documents claim this factor does not intervene.


The Benefits and Pitfalls of Google Scholar

Political Science and Politics ◽

10.1017/s104909651800094x ◽

2018 ◽

Vol 51 (4) ◽

pp. 820-824 ◽

Cited By ~ 4

Author(s):

Francesca R. Jensenius ◽

Mala Htun ◽

David J. Samuels ◽

David A. Singer ◽

Adria Lawrence ◽

...

Keyword(s):

Political Process ◽

Data Sources ◽

Google Scholar ◽

Qualitative Assessment ◽

Research Communities ◽

Scholarly Impact ◽

Innovative Work ◽

Citation Counts

Google Scholar (GS) is an important tool that faculty, administrators, and external reviewers use to evaluate the scholarly impact of candidates for jobs, tenure, and promotion. This article highlights both the benefits of GS (including the reliability and consistency of its citation counts and its platform for disseminating scholarship and facilitating networking) and its pitfalls. GS has biases because citation is a social and political process that disadvantages certain groups, including women, younger scholars, scholars in smaller research communities, and scholars opting for risky and innovative work. GS counts also reflect practices of strategic citation that exacerbate existing hierarchies and inequalities. As a result, it is imperative that political scientists incorporate other data sources, especially independent scholarly judgment, when making decisions that are crucial for careers. External reviewers have a unique obligation to offer a reasoned, rigorous, and qualitative assessment of a scholar’s contributions and therefore should not use GS.


Coverage of highly-cited documents in Google Scholar, Web of Science, and Scopus: a multidisciplinary comparison

10.31235/osf.io/hcx27 ◽

2018 ◽

Cited By ~ 1

Author(s):

Alberto Martín-Martín ◽

Enrique Orduna-Malea ◽

Emilio Delgado López-Cózar

Keyword(s):

Web Of Science ◽

Large Fraction ◽

Correlation Coefficients ◽

Google Scholar ◽

Bibliometric Indicators ◽

Social Sciences And Humanities ◽

Initial Hypothesis ◽

Citation Counts ◽

Broad Subject ◽

Highly Cited

This study explores the extent to which bibliometric indicators based on counts of highly-cited documents could be affected by the choice of data source. The initial hypothesis is that databases that rely on journal selection criteria for their document coverage may not necessarily provide an accurate representation of highly-cited documents across all subject areas, while inclusive databases, which give each document the chance to stand on its own merits, might be better suited to identify highly-cited documents. To test this hypothesis, an analysis of 2,515 highly-cited documents published in 2006 that Google Scholar displays in its Classic Papers product is carried out at the level of broad subject categories, checking whether these documents are also covered in Web of Science and Scopus, and whether the citation counts offered by the different sources are similar. The results show that a large fraction of highly-cited documents in the Social Sciences and Humanities (8.6%-28.2%) are invisible to Web of Science and Scopus. In the Natural, Life, and Health Sciences the proportion of missing highly-cited documents in Web of Science and Scopus is much lower. Furthermore, in all areas, Spearman correlation coefficients of citation counts in Google Scholar, as compared to Web of Science and Scopus citation counts, are remarkably strong (.83-.99). The main conclusion is that the data about highly-cited documents available in the inclusive database Google Scholar does indeed reveal significant coverage deficiencies in Web of Science and Scopus in some areas of research. Therefore, using these selective databases to compute bibliometric indicators based on counts of highly-cited documents might produce biased assessments in poorly covered areas.


A citation analysis of Serbian Dental Journal using Web of Science, Scopus and Google Scholar

Stomatoloski glasnik Srbije ◽

10.2298/sgs1004201j ◽

2010 ◽

Vol 57 (4) ◽

pp. 201-211 ◽

Cited By ~ 3

Author(s):

Jelena Jacimovic ◽

Ruzica Petrovic ◽

Slavoljub Zivkovic

Keyword(s):

Web Of Science ◽

Scientific Information ◽

Google Scholar ◽

Bibliographic Databases ◽

Thomson Scientific ◽

Significant Difference ◽

Comprehensive Picture ◽

Citation Counts ◽

The Impact ◽

Citation Databases

Introduction. For a long time, the Institute for Scientific Information (ISI, now Thomson Scientific, Philadelphia, US) citation databases, available online through the Web of Science (WoS), had a unique position among bibliographic databases. The emergence of new citation databases, such as Scopus and Google Scholar (GS), calls into question the dominance of WoS and the accuracy of bibliometric and citation studies based exclusively on WoS data. The aim of this study was to determine whether there were significant differences in the received citation counts for Serbian Dental Journal (SDJ) found in the WoS and Scopus databases, whether GS results differed significantly from those obtained by WoS and Scopus, and whether GS could be an adequate qualitative alternative to commercial databases in the impact assessment of this journal. Material and Methods. The data regarding SDJ citations were collected in September 2010 by searching the WoS, Scopus and GS databases. For further analysis, all relevant data on both cited and citing articles were imported into a Microsoft Access database. Results. One hundred and fifty-eight cited papers from SDJ and 249 received citations were found in the three analyzed databases. 74% of cited articles were found in GS, 46% in Scopus and 44% in WoS. The greatest number of citations (189) was derived from GS, while only 15% of the citations were found in all three databases. There was a significant difference in the percentage of unique citations found in the databases: 58% originated from GS, while Scopus and WoS gave 6% and 4% unique citations, respectively. The highest percentage of database overlap was found between WoS and Scopus (70%), while the overlap between Scopus and GS was only 18%. In the case of WoS and GS the overlap was 17%. Most of the SDJ citations came from original scientific articles. Conclusion. WoS, Scopus and GS produce quantitatively and qualitatively different citation counts for SDJ articles. None of the examined databases can provide a comprehensive picture and it is necessary to take into account all three available sources.
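
The unique-citation and overlap percentages reported in this abstract can be illustrated with plain set arithmetic. The following is a minimal sketch, not the authors' procedure: the abstract does not say how overlap was computed, so the sketch assumes overlap means the shared citations divided by the combined citations of the two databases, and the citation identifiers are made-up toy values.

```python
# Toy citation sets per database; the identifiers are hypothetical.
sources = {
    "WoS": {"c1", "c2", "c3"},
    "Scopus": {"c2", "c3", "c4"},
    "GS": {"c3", "c5", "c6", "c7"},
}


def unique_share(sources):
    """Share of all distinct citations that appear in exactly one database."""
    all_citations = set().union(*sources.values())
    shares = {}
    for name, items in sources.items():
        # Citations found in any of the other databases.
        others = set().union(*(v for k, v in sources.items() if k != name))
        shares[name] = len(items - others) / len(all_citations)
    return shares


def pairwise_overlap(a, b):
    """Assumed overlap measure: shared citations over combined citations."""
    combined = a | b
    return len(a & b) / len(combined) if combined else 0.0


def found_in_all(sources):
    """Share of all distinct citations that every database reports."""
    all_citations = set().union(*sources.values())
    common = set.intersection(*sources.values())
    return len(common) / len(all_citations)


if __name__ == "__main__":
    print(unique_share(sources))
    print(pairwise_overlap(sources["WoS"], sources["Scopus"]))
    print(found_in_all(sources))
```

A different overlap definition (for example, shared citations divided by the size of the smaller database) would give different percentages from the same data, which is one reason such figures are hard to compare across studies.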


Use Google Scholar, Scopus and Web of Science for Comprehensive Citation Tracking

Evidence Based Library and Information Practice ◽

10.18438/b8cs37 ◽

2007 ◽

Vol 2 (3) ◽

pp. 87 ◽

Cited By ~ 7

Author(s):

Lorie Andrea Kloda

Keyword(s):

Web Of Science ◽

Condensed Matter ◽

Small Sample ◽

Skewed Distribution ◽

Google Scholar ◽

Impact Factors ◽

Systematic Sampling ◽

Condensed Matter Physics ◽

Significant Difference ◽

Citation Counts

Objective – To determine whether three competing citation tracking services result in differing citation counts for a known set of articles, and to assess the extent of any differences. Design – Citation analysis, observational study. Setting – Three citation tracking databases: Google Scholar, Scopus and Web of Science. Subjects – Citations from eleven journals each from the disciplines of oncology and condensed matter physics for the years 1993 and 2003. Methods – The researchers selected eleven journals each from the list of journals from Journal Citation Reports 2004 for the categories “Oncology” and “Condensed Matter Physics” using a systematic sampling technique to ensure journals with varying impact factors were included. All references from these 22 journals were retrieved for the years 1993 and 2003 by searching three databases: Web of Science, INSPEC, and PubMed. Only research articles were included for the purpose of the study. From these, a stratified random sample was created to proportionally represent the content of each journal (oncology 1993: 234 references, 2003: 259 references; condensed matter physics 1993: 358 references, 2003: 364 references). In November of 2005, citation counts were obtained for all articles from Web of Science, Scopus and Google Scholar. Due to the small sample size and skewed distribution of data, non-parametric tests were conducted to determine whether significant differences existed between sets. Main results – For 1993, mean citation counts were highest in Web of Science for both oncology (mean = 45.3, SD = 77.4) and condensed matter physics (mean = 22.5, SD = 32.5). For 2003, mean citation counts were higher in Scopus for oncology (mean = 8.9, SD = 12.0), and in Web of Science for condensed matter physics (mean = 3.0, SD = 4.0). There was not enough data for the set of citations from Scopus for condensed matter physics for 1993 and it was therefore excluded from analysis. A Friedman test for differences between all remaining groups suggested a significant difference existed, and so pairwise post-hoc comparisons were performed. The Wilcoxon signed-rank tests demonstrated significant differences “in citation counts between all pairs (p < 0.001) except between Google Scholar and Scopus for CM physics 2003 (p = 0.119).” The study also looked at the number of unique references from each database, as well as the proportion of overlap for the 2003 citations. In the area of oncology, there was 31% overlap between databases, with Google Scholar including the most unique references (13%), followed by Scopus (12%) and Web of Science (7%). For condensed matter physics, the overlap was lower at 21% and the largest number of unique references was found in Web of Science (21%), with Google Scholar next largest (17%) and Scopus the least (9%). Citing references from Google Scholar were found to originate not only from journals, but also from online archives, academic repositories, government and non-government white papers and reports, commercial organizations, and other sources. Conclusion – The study does not confirm the authors’ hypothesis that differing scholarly coverage would result in different citation counts from the three databases. While there were significant differences in mean citation rates between all pairs of databases except for Google Scholar and Scopus in condensed matter physics for 2003, no one database performed better overall. Different databases performed better for different subjects, as well as for different years, especially Scopus, which only includes references starting in 1996. The results of this study suggest that the best citation database will depend on the years being searched as well as the subject area. For a complete picture of citation behaviour, the authors suggest all three be used.


La cobertura de los índices de citas abiertos se acerca a la de Web of Science y Scopus

Anuario ThinkEPI ◽

10.3145/thinkepi.2021.e15e04 ◽

2021 ◽

Author(s):

Alberto Martín-Martín

Keyword(s):

Information Sources ◽

Web Of Science ◽

Open Data ◽

Digital Transformation ◽

Data Sources ◽

Google Scholar ◽

Easy Access ◽

Citation Data ◽

Microsoft Academic ◽

New Infrastructure

The information sources that are often used to monitor and to obtain a better understanding of the system of scholarly communication (such as Web of Science, Scopus, and Google Scholar) have historically been distributed under restrictive use licenses. However, in a scenario where science and scientific communication are undergoing a process of digital transformation, these models do not facilitate the development of new infrastructure that is better adapted to current and future needs. At the same time, these models hamper reproducibility. In recent years, a variety of open data sources, such as Microsoft Academic, Crossref, and others, have become available, providing easy access to large collections of metadata that were previously only available from closed sources. Citation data are one type of metadata provided by these open data sources. This study documents the significant growth in coverage of open citation data that has taken place between 2019 and 2021, and the events that have led to this point. These collections of open scholarly metadata have kick-started the development of a new ecosystem of scholarly information services. However, their fragility still poses a risk for downstream applications. Academic libraries could become important allies of open scholarly metadata initiatives.


The rivalry between Bernini and Borromini from a scientometric perspective

Scientometrics ◽

10.1007/s11192-020-03514-5 ◽

2020 ◽

Vol 125 (2) ◽

pp. 1643-1663

Author(s):

Martin Wieland ◽

Juan Gorraiz

Keyword(s):

Impact Assessment ◽

Core Collection ◽

Web Of Science ◽

Point Of View ◽

Data Sources ◽

Google Scholar ◽

Twitter Data ◽

Baroque Art ◽

Minimum Number ◽

The University

From a historical point of view, Rome, and especially the University of La Sapienza, is closely linked to two geniuses of Baroque art: Bernini and Borromini. In this study, we analyze the rivalry between them from a scientometric perspective. This study also serves as a basis for exploring which data sources may be appropriate for broad impact assessment of individuals and/or celebrities. We pay special attention to encyclopaedias, library catalogues and other databases or types of publications that are not normally used for this purpose. The results show that some sources such as Wikipedia are not exploited according to the possibilities they offer, especially those related to different languages and cultures. Moreover, analyses are often reduced to a minimum number of data sources, which can distort the relevance of the outcome. Our results show that other sources normally not considered for this purpose, like JSTOR, PQDT, Google Scholar, Catalogue Holdings, etc. can provide more relevant or abundant information than the typically used Web of Science Core Collection and Scopus. Finally, we also contrast the opportunities and limitations of old and new (YouTube, Twitter) data sources, particularly the quality and accuracy of the search methods. Much room for improvement has been identified in order to use data sources more efficiently and with higher accuracy.


Google Scholar, Web of Science, and Scopus: a systematic comparison of citations in 252 subject categories

10.31235/osf.io/42nkm ◽

2018 ◽

Cited By ~ 3

Author(s):

Alberto Martín-Martín ◽

Enrique Orduna-Malea ◽

Mike Thelwall ◽

Emilio Delgado López-Cózar

Keyword(s):

Core Collection ◽

English Language ◽

Web Of Science ◽

Google Scholar ◽

Citation Data ◽

Subject Categories ◽

Citation Counts ◽

The Many ◽

Conference Papers ◽

Highly Cited

Despite citation counts from Google Scholar (GS), Web of Science (WoS), and Scopus being widely consulted by researchers and sometimes used in research evaluations, there is no recent or systematic evidence about the differences between them. In response, this paper investigates 2,448,055 citations to 2,299 English-language highly-cited documents from 252 GS subject categories published in 2006, comparing GS, the WoS Core Collection, and Scopus. GS consistently found the largest percentage of citations across all areas (93%-96%), far ahead of Scopus (35%-77%) and WoS (27%-73%). GS found nearly all the WoS (95%) and Scopus (92%) citations. Most citations found only by GS were from non-journal sources (48%-65%), including theses, books, conference papers, and unpublished materials. Many were non-English (19%-38%), and they tended to be much less cited than citing sources that were also in Scopus or WoS. Despite the many unique GS citing sources, Spearman correlations between citation counts in GS and WoS or Scopus are high (0.78-0.99). They are lower in the Humanities, and lower between GS and WoS than between GS and Scopus. The results suggest that in all areas GS citation data is essentially a superset of WoS and Scopus, with substantial extra coverage.
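
The Spearman correlations mentioned in this abstract compare how two sources rank the same documents by citation count, not the raw counts themselves. The following is a minimal sketch of that calculation in plain Python, assuming tied counts receive their average rank; the function names and the toy counts are illustrative assumptions and are not data from the study.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest) upward; ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical citation counts for the same five documents in two sources.
    gs_counts = [12, 40, 7, 95, 23]
    wos_counts = [5, 22, 3, 60, 10]
    print(spearman(gs_counts, wos_counts))
```

Because only the ranks matter, one source can report far more citations than another and still correlate almost perfectly with it, which is consistent with the abstract's combination of high correlations and large coverage differences.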


Citation indices

Journal of Skin and Sexually Transmitted Diseases ◽

10.25259/jsstd_7_2020 ◽

2020 ◽

Vol 2 ◽

pp. 2-4

Author(s):

Feroze Kaliyadan ◽

Karalikkattil T. Ashique

Keyword(s):

Web Of Science ◽

Ethical Issues ◽

Google Scholar ◽

H Index ◽

Citation Counts ◽

Index Calculation

Impact of research is generally measured through citation counts. For author impact, the most common indices considered are the h-index, the i10-index, and the g-index, of which the h-index is the most commonly used. There are various resources available for retrieving researcher h-indices. The databases most commonly used for this purpose are Scopus, Google Scholar, and Web of Science. Ethical issues related to the use of these resources for h-index calculation include gaming/manipulation and fake citations. An issue that we have noticed cropping up of late is researchers claiming erroneous h-indices.
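
The three author-level indices named in this abstract can each be computed from a list of per-paper citation counts. The following is a minimal sketch using the standard definitions; the function names and example counts are illustrative, and the g-index here is capped at the number of papers, which is one common convention and an assumption of this sketch.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def i10_index(citations):
    """Number of papers with at least 10 citations."""
    return sum(1 for count in citations if count >= 10)


def g_index(citations):
    """Largest g such that the top g papers together have at least g*g citations."""
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for rank, count in enumerate(ranked, start=1):
        total += count
        if total >= rank * rank:
            g = rank
    return g


if __name__ == "__main__":
    # Hypothetical per-paper citation counts for one author.
    counts = [10, 8, 5, 4, 3]
    print(h_index(counts), i10_index(counts), g_index(counts))
```

Since each database reports different citation counts for the same papers, the same author can have a different h-index in Scopus, Google Scholar, and Web of Science, so a reported value is only meaningful together with its source.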


What Are the Drivers of Citations?: Application in Tourism and Hospitality Journals

Applied Sciences ◽

10.3390/app11199288 ◽

2021 ◽

Vol 11 (19) ◽

pp. 9288

Author(s):

Eunhye Park ◽

Woohyuk Kim

Keyword(s):

Topic Modeling ◽

Web Of Science ◽

Sharing Economy ◽

Online Media ◽

Research Topics ◽

Qualitative And Quantitative ◽

Hospitality And Tourism ◽

Citation Counts ◽

Tourism And Hospitality ◽

Topic Structure

In line with the qualitative and quantitative growth of academic papers, it is critical to understand the factors driving citations in scholarly articles. This study identified the current academic structure of the tourism and hospitality literature and tested a comprehensive set of factors driving citation counts using articles published in first-tier hospitality and tourism journals found on the Web of Science. To further test the effects of research topic structure on citation counts, unsupervised topic modeling was conducted with 9910 tourism and hospitality papers published in 12 journals over 10 years. The results show that articles specific to online media and the sharing economy have received numerous citations and that recently published papers on particular research topics (e.g., rural tourism and eco-tourism) were frequently cited. This study makes a major contribution to the hospitality and tourism literature by testing the effects of topic structure and topic originality, discovered by text mining, on citation counts.
