Identification of duplicate and near‐duplicate full‐text records in database search‐outputs using hierarchic cluster analysis

1995 ◽  
Vol 29 (3) ◽  
pp. 241-256 ◽  
Author(s):  
John W. Kirriemuir ◽  
Peter Willett
1980 ◽  
Vol 5 (2) ◽  
pp. 141-154
Author(s):  
R. Jovine ◽  
F. Ghezzo ◽  
G. Spagnoli

2011 ◽  
Vol 6 (1) ◽  
pp. 71 ◽  
Author(s):  
Giovanna Badia

A Review of: Stokes, P., Foster, A., & Urquhart, C. (2009). Beyond relevance and recall: Testing new user-centred measures of database performance. Health Information and Libraries Journal, 26(3), 220-231. Objective – The research project sought to determine which of four databases was the most useful for searching undergraduate nursing topics. Design – Comparative database evaluation. Setting – Nursing and midwifery students at Homerton School of Health Studies (now part of Anglia Ruskin University), Cambridge, United Kingdom, in 2005-2006. Subjects – The subjects were four databases: British Nursing Index (BNI), CINAHL, MEDLINE, and EMBASE. Methods – This was a comparative study using title searches to compare BNI, CINAHL, MEDLINE, and EMBASE. According to the authors, this is the first study to compare BNI with other databases. BNI is a database produced by British libraries that indexes the nursing and midwifery literature. It covers over 240 British journals, and includes references to articles from health sciences journals that are relevant to nurses and midwives (British Nursing Index, n.d.). The researchers performed keyword searches in the title field of the four databases for the dissertation topics of nine nursing and midwifery students enrolled in undergraduate dissertation modules. Lists of the titles of journal articles on their topics were given to the students, who were asked to judge the relevance of the citations.
The title searches were evaluated in each of the databases using the following criteria:
• precision (the number of relevant results obtained in the database for a search topic, divided by the total number of results obtained in the database search);
• recall (the number of relevant results obtained in the database for a search topic, divided by the total number of relevant results obtained on that topic from all four database searches);
• novelty (the number of relevant results that were unique in the database search, which was calculated as a percentage of the total number of relevant results found in the database);
• originality (the number of unique relevant results obtained in the database for a search topic, which was calculated as a percentage of the total number of unique results found in all four database searches);
• availability (the number of relevant full text articles obtained from the database search results, which was calculated as a percentage of the total number of relevant results found in the database);
• retrievability (the number of relevant full text articles obtained from the database search results, which was calculated as a percentage of the total number of relevant full text articles found from all four database searches);
• effectiveness (the probable odds that a database will obtain relevant search results);
• efficiency (the probable odds that a database will obtain both unique and relevant search results); and
• accessibility (the probable odds that the full text of the relevant references obtained from the database search are available electronically or in print via the user’s library).
Students decided whether the search results were relevant to their topic by using a “yes/no” scale. Only record titles were used to make relevancy judgments. Main Results – Friedman’s Test and odds ratios were used to compare the performance of BNI, CINAHL, MEDLINE, and EMBASE when searching for information about nursing topics.
These two statistical measures demonstrated the following:
• BNI had the best average score for the precision, availability, effectiveness, and accessibility of search results;
• CINAHL scored the highest for the novelty, retrievability, and efficiency of results, and ranked second place for all the other criteria;
• MEDLINE excelled in the areas of recall and originality, and ranked second place for novelty and retrievability; and
• EMBASE did not obtain the highest, or second highest score, for any of the criteria.
Conclusion – According to the authors, these results suggest that none of the databases studied can be considered the most useful for searching undergraduate nursing topics. CINAHL and MEDLINE emerge as consistently good performers, but both databases are needed to find relevant material on a topic. Friedman’s Test clearly differentiated between the databases for the accessibility of search results. Odds ratio testing may assist librarians to make decisions about database purchases. BNI scored the highest for availability of results and CINAHL ranked the highest for retrievability. Statistical measures need to be supplemented with qualitative data about user preferences in order to determine which database is the most useful to our users.
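The first three criteria in the review above (precision, recall, and novelty) can be illustrated with a small sketch. This is not the code used in the study. The function name, the database labels, and the toy data are hypothetical, and the sketch assumes that the "total number of relevant results from all four database searches" means the de-duplicated union of relevant records. Values are returned as fractions rather than percentages.

```python
from typing import Dict, Set


def database_metrics(
    relevant: Dict[str, Set[str]],
    retrieved_counts: Dict[str, int],
) -> Dict[str, Dict[str, float]]:
    """Compute precision, recall, and novelty per database for one topic.

    relevant maps a database name to the set of record IDs judged relevant.
    retrieved_counts maps a database name to its total number of results.
    """
    # Assumption: the recall denominator is the union of relevant records.
    all_relevant = set().union(*relevant.values())
    metrics: Dict[str, Dict[str, float]] = {}
    for db, rel in relevant.items():
        # Relevant records found by any other database.
        others = set().union(
            *(r for name, r in relevant.items() if name != db)
        )
        unique = rel - others  # relevant records found only in this database
        total = retrieved_counts[db]
        metrics[db] = {
            "precision": len(rel) / total if total else 0.0,
            "recall": len(rel) / len(all_relevant) if all_relevant else 0.0,
            "novelty": len(unique) / len(rel) if rel else 0.0,
        }
    return metrics


# Toy data with two hypothetical databases (not figures from the study).
example = database_metrics(
    relevant={"DB_A": {"a1", "a2", "a3"}, "DB_B": {"a2", "a4"}},
    retrieved_counts={"DB_A": 6, "DB_B": 4},
)
```

In this toy case both databases have the same precision, but DB_A has higher recall and novelty. The review describes a comparable pattern, in which no single database scores highest on every criterion.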


2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Chao Fan ◽  
Yu Li

The Romance of the Three Kingdoms (RTK) is a classical Chinese historical novel by Luo Guanzhong. This paper establishes a research framework for analyzing the novel using coword and cluster analysis. First, we segment the full text of the novel and extract the names of the historical figures in it. Based on the coword analysis, a social network of historical figures is constructed. We calculate several network features and perform cluster analysis. In addition, a modified clustering method using edge betweenness is proposed to improve the clustering results. Finally, both quantified and visualized results are displayed to confirm our approach.
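The coword step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the text has already been split into segments, uses plain substring matching on English toy sentences instead of Chinese word segmentation, and uses hypothetical function names. It builds weighted co-occurrence edges and one simple network feature (weighted degree). The edge-betweenness clustering is not reproduced here.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Tuple


def coword_edges(
    segments: Iterable[str], names: List[str]
) -> Counter:
    """Count how often each pair of names appears in the same segment."""
    edges: Counter = Counter()
    for segment in segments:
        # Names present in this segment, sorted so each pair has one key.
        present = sorted({n for n in names if n in segment})
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges


def weighted_degree(edges: Dict[Tuple[str, str], int]) -> Dict[str, int]:
    """Sum the co-occurrence weights of all edges touching each name."""
    degree: Counter = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return dict(degree)


# Toy segments (illustrative only, not quotations from the novel).
toy_segments = [
    "Liu Bei met Guan Yu and Zhang Fei",
    "Cao Cao pursued Liu Bei",
    "Guan Yu left Cao Cao",
    "Liu Bei and Guan Yu rode on",
]
toy_names = ["Liu Bei", "Guan Yu", "Zhang Fei", "Cao Cao"]
toy_edges = coword_edges(toy_segments, toy_names)
toy_degree = weighted_degree(toy_edges)
```

The resulting edge weights could then be passed to a graph library for community detection. Which library and algorithm settings the paper used is not stated in the abstract.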


2020 ◽  
pp. 875647932097208
Author(s):  
Iqra Manzoor ◽  
Raham Bacha ◽  
Syed Amir Gilani

Objective: The purpose of this literature search was to review the benefits of high-intensity focused ultrasound (HIFU) and its application for different pathologies. Methods: This review summarizes the implementation of HIFU for different pathologic conditions. A National Center for Biotechnology Information, PubMed, MEDLINE, Medscape, and Google Scholar database search (1992–2016) was done with the following keywords: high-intensity focused ultrasound; uses of HIFU; and applications of HIFU in the liver, bones, uterine fibroids, prostate, breast, thyroid, pancreas, kidneys, brain, urinary bladder, and so on. Tables and graphs were created for all the variables included in the study, and descriptive statistics were applied. Results: In total, 110 records were identified through database search. In addition, 20 articles were identified through other sources. Screening of the articles was performed, and 20 were removed due to duplication; further screening was performed for 110 articles, and 30 records were further excluded. Full-text articles were assessed for eligibility and 30 were retained. Full-text articles were excluded (N = 36) on the basis that research was performed on animals, and this review article was performed solely for human application. The researchers added 42 qualitative syntheses to the review. In addition, 42 quantitative syntheses (meta-analyses) were added to the review. Conclusion: This narrative review indicates that HIFU is noninvasive, nonharmful, and effective in treating diseases and tumors of the brain, breast, bone, liver, kidney, pancreas, and prostate; uterine fibroids; and many other solid tumors. Recent technological development suggests that HIFU is likely to play a significant role in future surgical practices. Further research should be conducted with larger sample sizes to obtain more accurate results in the application of HIFU.


Author(s):  
Thomas W. Shattuck ◽  
James R. Anderson ◽  
Neil W. Tindale ◽  
Peter R. Buseck

Individual particle analysis involves the study of tens of thousands of particles using automated scanning electron microscopy and elemental analysis by energy-dispersive x-ray emission spectroscopy (EDS). EDS produces large data sets that must be analyzed using multivariate statistical techniques. A complete study uses cluster analysis, discriminant analysis, and factor or principal components analysis (PCA). The three techniques are used in the study of particles sampled during the FeLine cruise to the mid-Pacific ocean in the summer of 1990. The mid-Pacific aerosol provides information on long range particle transport, iron deposition, sea salt ageing, and halogen chemistry. Aerosol particle data sets suffer from a number of difficulties for pattern recognition using cluster analysis. There is a great disparity in the number of observations per cluster and the range of the variables in each cluster. The variables are not normally distributed, they are subject to considerable experimental error, and many values are zero because of finite detection limits. Many of the clusters show considerable overlap because of natural variability, agglomeration, and chemical reactivity.


Author(s):  
Matthew L. Hall ◽  
Stephanie De Anda

Purpose: The purposes of this study were (a) to introduce “language access profiles” as a viable alternative construct to “communication mode” for describing experience with language input during early childhood for deaf and hard-of-hearing (DHH) children; (b) to describe the development of a new tool for measuring DHH children's language access profiles during infancy and toddlerhood; and (c) to evaluate the novelty, reliability, and validity of this tool. Method: We adapted an existing retrospective parent report measure of early language experience (the Language Exposure Assessment Tool) to make it suitable for use with DHH populations. We administered the adapted instrument (DHH Language Exposure Assessment Tool [D-LEAT]) to the caregivers of 105 DHH children aged 12 years and younger. To measure convergent validity, we also administered another novel instrument: the Language Access Profile Tool. To measure test–retest reliability, half of the participants were interviewed again after 1 month. We identified groups of children with similar language access profiles by using hierarchical cluster analysis. Results: The D-LEAT revealed DHH children's diverse experiences with access to language during infancy and toddlerhood. Cluster analysis groupings were markedly different from those derived from more traditional grouping rules (e.g., communication modes). Test–retest reliability was good, especially for the same-interviewer condition. Content, convergent, and face validity were strong. Conclusions: To optimize DHH children's developmental potential, stakeholders who work at the individual and population levels would benefit from replacing communication mode with language access profiles. The D-LEAT is the first tool that aims to measure this novel construct. Despite limitations that future work aims to address, the present results demonstrate that the D-LEAT represents progress over the status quo.
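The grouping step mentioned in the Method section, hierarchical cluster analysis, can be sketched in general terms. This is a generic agglomerative procedure, not the authors' analysis. The abstract does not state the variables, distance measure, or linkage rule, so the sketch assumes Euclidean distance, average linkage, and hypothetical two-value profiles (for example, proportions of two kinds of language input).

```python
from itertools import combinations
from math import dist
from typing import List, Sequence


def average_linkage(
    a: List[int], b: List[int], points: Sequence[Sequence[float]]
) -> float:
    """Mean Euclidean distance between all cross-cluster pairs of points."""
    total = sum(dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerate(
    points: Sequence[Sequence[float]], n_clusters: int
) -> List[List[int]]:
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]  # start with singletons
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest average linkage.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(
                clusters[p[0]], clusters[p[1]], points
            ),
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # safe because x < y
    return [sorted(c) for c in clusters]


# Hypothetical profiles: (proportion of input type 1, proportion of type 2).
profiles = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]
groups = agglomerate(profiles, n_clusters=2)
```

Here the first two profiles and the last two profiles end up in separate groups. A real analysis would also need to justify the number of clusters, which this sketch takes as given.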

