Identification of duplicate and near‐duplicate full‐text records in database search‐outputs using hierarchic cluster analysis

John W. Kirriemuir; Peter Willett

doi:10.1108/eb047198
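
The title describes grouping duplicate and near-duplicate records by hierarchic clustering. The sketch below illustrates the general idea and is not Kirriemuir and Willett's procedure. It rests on our own assumptions: each record is reduced to a set of lower-case words, pairs are compared with the Dice coefficient, and single linkage is cut at a hypothetical threshold of 0.8. The function names and sample records are also hypothetical. The sketch relies on one property of single linkage: cutting its hierarchy at a similarity threshold gives the connected components of the graph of pairs at or above that threshold, so a union-find over qualifying pairs is sufficient.

```python
import re
from itertools import combinations
from typing import Dict, List, Set


def tokens(text: str) -> Set[str]:
    """Lower-case alphanumeric word set: a crude, assumed record representation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def dice(a: Set[str], b: Set[str]) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|), between 0 and 1."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def near_duplicate_groups(records: Dict[str, str], threshold: float = 0.8) -> List[List[str]]:
    """Groups of record IDs joined by single linkage at the given Dice threshold."""
    parent = {rid: rid for rid in records}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    word_sets = {rid: tokens(text) for rid, text in records.items()}
    for a, b in combinations(records, 2):
        if dice(word_sets[a], word_sets[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    groups: Dict[str, List[str]] = {}
    for rid in records:
        groups.setdefault(find(rid), []).append(rid)
    # Singletons are records with no near-duplicate at this threshold.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


records = {
    "r1": "Hierarchic cluster analysis of search outputs",
    "r2": "Hierarchic cluster analysis of search-outputs",
    "r3": "Full-text database search in the United States",
    "r4": "Hierarchic cluster analysis of database search outputs",
}
print(near_duplicate_groups(records))  # [['r1', 'r2', 'r4']]
```

Records r1 and r2 differ only in hyphenation, so they have identical word sets. Record r4 adds one word and still scores 12/13 against r1. Raising the threshold makes the grouping stricter.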

A conversational program for hierarchic and non-hierarchic cluster analysis

Medical Informatics, 1980, Vol 5 (2), pp. 141-154

doi:10.3109/14639238009014008

Author(s): R. Jovine; F. Ghezzo; G. Spagnoli

Keyword(s): Cluster Analysis; Hierarchic Cluster Analysis

Debates upon full-text database search in the United States.

Journal of Information Processing and Management, 1998, Vol 41 (2), pp. 93-105

doi:10.1241/johokanri.41.93

Cited by ~2

Author(s): Keisuke ARAKI

Keyword(s): United States; Full Text; The United States; Database Search; Text Database

Statistical Measures Alone Cannot Determine Which Database (BNI, CINAHL, MEDLINE, or EMBASE) Is the Most Useful for Searching Undergraduate Nursing Topics

Evidence Based Library and Information Practice, 2011, Vol 6 (1), p. 71

doi:10.18438/b8xs6b

Cited by ~1

Author(s): Giovanna Badia

Keyword(s): Full Text; Database Search; User Preferences; Average Score; Search Results; Undergraduate Nursing; Statistical Measures; Database Evaluation; Friedman's Test; Health Studies

A Review of: Stokes, P., Foster, A., & Urquhart, C. (2009). Beyond relevance and recall: Testing new user-centred measures of database performance. Health Information and Libraries Journal, 26(3), 220-231.

Objective – The research project sought to determine which of four databases was the most useful for searching undergraduate nursing topics.

Design – Comparative database evaluation.

Setting – Nursing and midwifery students at Homerton School of Health Studies (now part of Anglia Ruskin University), Cambridge, United Kingdom, in 2005-2006.

Subjects – The subjects were four databases: British Nursing Index (BNI), CINAHL, MEDLINE, and EMBASE.

Methods – This was a comparative study using title searches to compare BNI, CINAHL, MEDLINE, and EMBASE. According to the authors, this is the first study to compare BNI with other databases. BNI is a database produced by British libraries that indexes the nursing and midwifery literature. It covers over 240 British journals and includes references to articles from health sciences journals that are relevant to nurses and midwives (British Nursing Index, n.d.). The researchers performed keyword searches in the title field of the four databases for the dissertation topics of nine nursing and midwifery students enrolled in undergraduate dissertation modules. The students were given lists of the journal article titles retrieved for their topics and asked to judge the relevance of the citations.

The title searches were evaluated in each of the databases using the following criteria:
• precision (the number of relevant results obtained in the database for a search topic, divided by the total number of results obtained in the database search);
• recall (the number of relevant results obtained in the database for a search topic, divided by the total number of relevant results obtained on that topic from all four database searches);
• novelty (the number of relevant results that were unique to the database search, calculated as a percentage of the total number of relevant results found in the database);
• originality (the number of unique relevant results obtained in the database for a search topic, calculated as a percentage of the total number of unique results found in all four database searches);
• availability (the number of relevant full-text articles obtained from the database search results, calculated as a percentage of the total number of relevant results found in the database);
• retrievability (the number of relevant full-text articles obtained from the database search results, calculated as a percentage of the total number of relevant full-text articles found in all four database searches);
• effectiveness (the odds that a database will obtain relevant search results);
• efficiency (the odds that a database will obtain search results that are both unique and relevant); and
• accessibility (the odds that the full text of the relevant references obtained from the database search is available electronically or in print via the user’s library).

Students judged whether each search result was relevant to their topic on a “yes/no” scale, using only the record titles.

Main Results – Friedman’s Test and odds ratios were used to compare the performance of BNI, CINAHL, MEDLINE, and EMBASE when searching for information about nursing topics. These two statistical measures demonstrated the following:
• BNI had the best average score for the precision, availability, effectiveness, and accessibility of search results;
• CINAHL scored the highest for the novelty, retrievability, and efficiency of results, and ranked second for all the other criteria;
• MEDLINE excelled in recall and originality, and ranked second for novelty and retrievability; and
• EMBASE did not obtain the highest or second-highest score for any of the criteria.

Conclusion – According to the authors, these results suggest that none of the databases studied can be considered the most useful for searching undergraduate nursing topics. CINAHL and MEDLINE emerge as consistently good performers, but both databases are needed to find relevant material on a topic. Friedman’s Test clearly differentiated between the databases for the accessibility of search results. Odds ratio testing may help librarians make decisions about database purchases. BNI scored the highest for availability of results, and CINAHL ranked the highest for retrievability. Statistical measures need to be supplemented with qualitative data about user preferences in order to determine which database is the most useful to our users.
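
The percentage-based criteria for precision, recall, novelty, originality, availability and retrievability can be computed directly from sets of record identifiers. The sketch below is our own illustration and not the authors' code. It makes the following assumptions:

• Records have already been matched across databases to shared IDs.
• Relevance judgements and full-text availability are supplied as sets of those IDs.
• Every measure is reported as a percentage. Precision and recall are defined above as plain ratios, so divide by 100 if needed.
• The originality denominator is the number of unique relevant records across all searches. This is our interpretation of the wording.

The function name `database_metrics` is hypothetical. The odds-based measures (effectiveness, efficiency, accessibility) and Friedman's Test are not reproduced here.

```python
from typing import Dict, Set


def database_metrics(
    results: Dict[str, Set[str]],
    relevant: Set[str],
    full_text: Set[str],
) -> Dict[str, Dict[str, float]]:
    """Per-database criteria for one search topic, all expressed as percentages.

    results   -- database name -> IDs of the records its title search returned
    relevant  -- IDs the student judged relevant ("yes")
    full_text -- IDs whose full text the library can supply
    """
    all_relevant = set().union(*results.values()) & relevant
    all_relevant_ft = all_relevant & full_text

    # A record is unique to a database if no other database returned it.
    unique: Dict[str, Set[str]] = {}
    for name, ids in results.items():
        others = set().union(*(v for k, v in results.items() if k != name))
        unique[name] = ids - others
    # Assumed reading: originality is measured against unique *relevant* records.
    all_unique_relevant = set().union(*unique.values()) & relevant

    def pct(num: int, den: int) -> float:
        # An empty denominator yields 0.0 rather than raising ZeroDivisionError.
        return 100.0 * num / den if den else 0.0

    metrics: Dict[str, Dict[str, float]] = {}
    for name, ids in results.items():
        rel = ids & relevant
        unique_rel = unique[name] & relevant
        rel_ft = rel & full_text
        metrics[name] = {
            "precision": pct(len(rel), len(ids)),
            "recall": pct(len(rel), len(all_relevant)),
            "novelty": pct(len(unique_rel), len(rel)),
            "originality": pct(len(unique_rel), len(all_unique_relevant)),
            "availability": pct(len(rel_ft), len(rel)),
            "retrievability": pct(len(rel_ft), len(all_relevant_ft)),
        }
    return metrics
```

Consider a made-up example:

• BNI returns records a, b and c.
• CINAHL returns b, c, d and e.
• The relevant records are a, b and d.
• Full text is available for a and d.

In this case BNI's precision is 2/3 (about 66.7%). Its novelty is 50%, because a is its only unique relevant record.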

Coword and Cluster Analysis for the Romance of the Three Kingdoms

Wireless Communications and Mobile Computing, 2021, Vol 2021, pp. 1-8

doi:10.1155/2021/5553635

Author(s): Chao Fan; Yu Li

Keyword(s): Cluster Analysis; Social Network; Full Text; Historical Novel; The Novel; Clustering Method; Research Framework; Historical Figures; Edge Betweenness; And Cluster Analysis

The Romance of the Three Kingdoms (RTK) is a classical Chinese historical novel by Luo Guanzhong. This paper establishes a research framework for analyzing the novel using coword and cluster analysis. First, we segment the full text of the novel and extract the names of the historical figures in RTK. Based on coword analysis, we then construct a social network of these historical figures. We calculate several network features and perform cluster analysis. In addition, we propose a modified clustering method based on edge betweenness to improve the clustering results. Finally, both quantitative and visual results are presented to validate the approach.
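
The abstract does not describe the authors' modified clustering method. The sketch below therefore shows only the standard building blocks it names, under our own assumptions:

• Coword network: two names are linked when they co-occur in the same text segment, and the edge weight is the number of shared segments.
• Edge betweenness: computed with Brandes' algorithm on the unweighted graph.
• One Girvan–Newman step: the highest-betweenness edge is removed repeatedly until the network splits.

Segmentation and name extraction are assumed to be done already. All function names are hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set

Edge = FrozenSet[str]


def coword_network(segments: Iterable[Set[str]]) -> Dict[Edge, int]:
    """Link every pair of names that co-occur in a segment; weight = co-occurrence count."""
    weights: Dict[Edge, int] = defaultdict(int)
    for names in segments:
        for a, b in combinations(sorted(names), 2):
            weights[frozenset((a, b))] += 1
    return dict(weights)


def edge_betweenness(adj: Dict[str, Set[str]]) -> Dict[Edge, float]:
    """Brandes' algorithm for an unweighted, undirected graph given as adjacency sets."""
    eb: Dict[Edge, float] = defaultdict(float)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma) and predecessors.
        order: List[str] = []
        preds: Dict[str, List[str]] = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in reverse BFS order and credit them to edges.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                eb[frozenset((v, w))] += share
                delta[v] += share
    # Every pair of endpoints was counted once from each side.
    return {e: val / 2 for e, val in eb.items()}


def components(adj: Dict[str, Set[str]]) -> List[Set[str]]:
    """Connected components via breadth-first search."""
    seen: Set[str] = set()
    comps: List[Set[str]] = []
    for start in adj:
        if start in seen:
            continue
        comp: Set[str] = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            v = queue.popleft()
            comp.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps


def girvan_newman_split(edges: Iterable[Edge]) -> List[Set[str]]:
    """Remove highest-betweenness edges until the number of components grows."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for e in edges:
        a, b = tuple(e)
        adj[a].add(b)
        adj[b].add(a)
    adj = dict(adj)
    initial = len(components(adj))
    while True:
        eb = edge_betweenness(adj)
        if not eb:  # no edges left to remove
            return components(adj)
        a, b = tuple(max(eb, key=eb.get))
        adj[a].discard(b)
        adj[b].discard(a)
        comps = components(adj)
        if len(comps) > initial:
            return comps
```

The betweenness step ignores the co-occurrence counts. They remain available in the dictionary returned by `coword_network` for other network features.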

A Comparative Studies of Automatic Query Formulation in Full-Text Database Search of Chinese Digital Humanities

Diversity, Divergence, Dialogue (Lecture Notes in Computer Science), 2021, pp. 457-468

doi:10.1007/978-3-030-71292-1_35

Author(s): Chengxi Yan; Tzu-Yi Ho; Jun Wang

Keyword(s): Full Text; Comparative Studies; Digital Humanities; Database Search; Query Formulation; Text Database

Applications of High-Intensity Focused Ultrasound in the Treatment of Different Pathologies

Journal of Diagnostic Medical Sonography, 2020, pp. 875647932097208

doi:10.1177/8756479320972086

Author(s): Iqra Manzoor; Raham Bacha; Syed Amir Gilani

Keyword(s): Full Text; High Intensity; Focused Ultrasound; Technological Development; Meta Analysis; Uterine Fibroids; Database Search; High Intensity Focused Ultrasound; Large Sample Size; Quantitative Synthesis

Objective: The purpose of this literature search was to review the benefits of high-intensity focused ultrasound (HIFU) and its application to different pathologies. Methods: This review summarizes the use of HIFU for different pathologic conditions. A database search (1992–2016) of the National Center for Biotechnology Information, PubMed, MEDLINE, Medscape, and Google Scholar was done with the following keywords: high-intensity focused ultrasound; uses of HIFU; and applications of HIFU in the liver, bones, uterine fibroids, prostate, breast, thyroid, pancreas, kidneys, brain, urinary bladder, and so on. Tables and graphs were created for all the variables included in the study, and descriptive statistics were applied. Results: In total, 110 records were identified through the database search, and 20 further articles were identified through other sources. Screening of the articles was performed, and 20 were removed as duplicates; further screening was performed for 110 articles, and 30 records were excluded. Full-text articles were assessed for eligibility, and 30 were retained. Full-text articles were excluded (N = 36) on the basis that the research was performed on animals, as this review was restricted to human applications. The researchers added 42 qualitative syntheses to the review. In addition, 42 quantitative syntheses (meta-analyses) were added to the review. Conclusion: This narrative review indicates that HIFU is noninvasive, non-harmful, and effective in treating diseases and tumors of the brain, breast, bone, liver, kidney, pancreas, and prostate; uterine fibroids; and many other solid tumors. Recent technological developments suggest that HIFU is likely to play a significant role in future surgical practice. Further research with larger sample sizes is needed to obtain more accurate results on the application of HIFU.

Cluster analysis for large data sets: applications to individual aerosol particles from the mid-pacific

Proceedings, annual meeting, Electron Microscopy Society of America, 1992, Vol 50 (2), pp. 1488-1489

doi:10.1017/s0424820100132078

Author(s): Thomas W. Shattuck; James R. Anderson; Neil W. Tindale; Peter R. Buseck

Keyword(s): Cluster Analysis; Chemical Reactivity; Large Data; Large Data Sets; Particle Analysis; Data Sets; Halogen Chemistry; Complete Study; Components Analysis; Automated Scanning

Individual particle analysis involves the study of tens of thousands of particles using automated scanning electron microscopy and elemental analysis by energy-dispersive X-ray emission spectroscopy (EDS). EDS produces large data sets that must be analyzed using multivariate statistical techniques. A complete study uses cluster analysis, discriminant analysis, and factor or principal components analysis (PCA). The three techniques are used in the study of particles sampled during the FeLine cruise to the mid-Pacific Ocean in the summer of 1990. The mid-Pacific aerosol provides information on long-range particle transport, iron deposition, sea salt ageing, and halogen chemistry.

Aerosol particle data sets pose a number of difficulties for pattern recognition using cluster analysis. There is a great disparity in the number of observations per cluster and in the range of the variables in each cluster. The variables are not normally distributed, they are subject to considerable experimental error, and many values are zero because of finite detection limits. Many of the clusters overlap considerably because of natural variability, agglomeration, and chemical reactivity.

Measuring “Language Access Profiles” in Deaf and Hard-of-Hearing Children With the DHH Language Exposure Assessment Tool

Journal of Speech Language and Hearing Research, 2020, pp. 1-25

doi:10.1044/2020_jslhr-20-00439

Author(s): Matthew L. Hall; Stephanie De Anda

Keyword(s): Cluster Analysis; Exposure Assessment; Hard Of Hearing; Assessment Tool; Communication Mode; Developmental Potential; Language Access; Retest Reliability; Language Exposure; Test Retest Reliability

Purpose: The purposes of this study were (a) to introduce “language access profiles” as a viable alternative construct to “communication mode” for describing experience with language input during early childhood for deaf and hard-of-hearing (DHH) children; (b) to describe the development of a new tool for measuring DHH children's language access profiles during infancy and toddlerhood; and (c) to evaluate the novelty, reliability, and validity of this tool.

Method: We adapted an existing retrospective parent report measure of early language experience (the Language Exposure Assessment Tool) to make it suitable for use with DHH populations. We administered the adapted instrument (DHH Language Exposure Assessment Tool [D-LEAT]) to the caregivers of 105 DHH children aged 12 years and younger. To measure convergent validity, we also administered another novel instrument: the Language Access Profile Tool. To measure test–retest reliability, half of the participants were interviewed again after 1 month. We identified groups of children with similar language access profiles by using hierarchical cluster analysis.

Results: The D-LEAT revealed DHH children's diverse experiences with access to language during infancy and toddlerhood. Cluster analysis groupings were markedly different from those derived from more traditional grouping rules (e.g., communication modes). Test–retest reliability was good, especially for the same-interviewer condition. Content, convergent, and face validity were strong.

Conclusions: To optimize DHH children's developmental potential, stakeholders who work at the individual and population levels would benefit from replacing communication mode with language access profiles. The D-LEAT is the first tool that aims to measure this novel construct. Despite limitations that future work aims to address, the present results demonstrate that the D-LEAT represents progress over the status quo.
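
The abstract does not say which distance measure or linkage rule the hierarchical cluster analysis used. As a hedged illustration only, the sketch below applies naive average-linkage agglomerative clustering with Euclidean distance to hypothetical numeric profile vectors, for example one value per age interval. It stops when k groups remain. The function name, the feature vectors and the choice of k are our own assumptions. The running time is roughly cubic in the number of cases, which is acceptable for a sample of about a hundred children but not for large data sets.

```python
from itertools import combinations
from math import dist
from typing import List, Sequence


def average_linkage(points: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """Naive agglomerative clustering (average linkage, Euclidean distance).

    Starts with every point in its own cluster and repeatedly merges the pair of
    clusters with the smallest mean pairwise distance until k clusters remain.
    Returns the clusters as sorted lists of point indices.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    clusters = [[i] for i in range(len(points))]

    def linkage(a: List[int], b: List[int]) -> float:
        # Mean distance over all cross-cluster pairs of points.
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return sorted(clusters)


# Hypothetical two-feature profiles: three similar cases and two similar cases.
print(average_linkage([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)], 2))  # [[0, 1, 2], [3, 4]]
```

In practice the number of groups would be chosen by inspecting the dendrogram or a validity index rather than fixed in advance.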
