The capability of search tools to retrieve words with specific properties from large text collections

2020
Author(s):
Liezl Ball
Theo Bothma

Introduction. With the increase in the availability of digital text collections for humanities researchers, tools to enable enhanced retrieval are required. If words with very specific properties could be retrieved from a text collection, more accurate linguistic and other analyses could be made. There is a range of properties and metadata that could be specified for retrieval, from morphological data up to bibliographic data. Furthermore, the bibliographic data should not only be at the item level but should also be extended to the text level. For example, in an anthology each section could be encoded with the author of that section. Such extended metadata will enable fine-grained retrieval. Method. In this study, current tools were evaluated to determine to what extent they allow users to retrieve words with specific properties from a text collection. Analysis. The analysis is limited to the following criteria: interface design, metadata, search options, filtering and search results. Results. Currently, it is not possible for a user to retrieve words with specific properties from a text collection. Conclusion. An extended set of metadata should be used to encode text to enable retrieval of words on a fine-grained level.
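
The idea of combining morphological data with text-level bibliographic data can be illustrated with a minimal Python sketch. It is not the encoding scheme of the study; the `Token` fields, the `retrieve` helper and the sample anthology are all hypothetical and chosen only to show how a word could be retrieved by its lemma, part of speech and the author of the section it occurs in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One word occurrence with illustrative (assumed) metadata fields."""
    form: str            # surface form as it appears in the text
    lemma: str           # dictionary form (morphological level)
    pos: str             # part-of-speech tag (morphological level)
    section_author: str  # text-level bibliographic metadata
    item_title: str      # item-level bibliographic metadata


def retrieve(tokens, **criteria):
    """Return the tokens whose fields equal every given criterion."""
    return [
        t for t in tokens
        if all(getattr(t, field) == value for field, value in criteria.items())
    ]


# Hypothetical anthology: two sections by different authors in one item.
ANTHOLOGY = [
    Token("ran", "run", "VERB", "Author A", "Sample Anthology"),
    Token("runs", "run", "NOUN", "Author A", "Sample Anthology"),
    Token("running", "run", "VERB", "Author B", "Sample Anthology"),
]

hits = retrieve(ANTHOLOGY, lemma="run", pos="VERB", section_author="Author B")
print([t.form for t in hits])
```

Item-level metadata alone could not separate the two authors here, because all three tokens share the same `item_title`; the section-level field is what makes the query fine-grained.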

2019
Author(s):  
Colin Vize ◽  
Katherine Collison ◽  
Donald Lynam ◽  
Josh Miller

Objective: Partialing procedures are frequently used in psychological research. The present study sought to further explore the consequences of partialing, focusing on the replicability of partialing-based results. Method: We used popular measures of the Dark Triad (DT; Machiavellianism, narcissism, and psychopathy) to explore the replicability of partialing procedures. We examined whether the residual content of popular DT scales is similar to the residual content of DT scales derived from separate samples based on relations with individual items from the IPIP-NEO-120, allowing for a fine-grained analysis of residual variable content. Results: Profiles were compared using three sample sizes (Small N = 156-157, Moderate N = 313-314, Large N = 627-628) randomly drawn from a large MTurk sample (N = 1,255). There was low convergence among original/residual DT scales within samples. Additionally, results showed the content of residual Dirty Dozen scales was not similar across samples. Similar results were found for Short Dark Triad-Machiavellianism, but only in the moderate and small samples. Conclusion: The results indicate that there are important issues that arise when using partialing procedures, including replicability issues surrounding residual variables. Reasons for the observed results are discussed and further research examining the replicability of residual-based results is recommended.
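
For readers unfamiliar with partialing, the following Python sketch shows the basic operation on simulated data: one scale is regressed on another and only the residual is kept. The data, the variable names and the `residualize` helper are assumptions made for illustration; they are not the study's measures or analysis code.

```python
import numpy as np


def residualize(target, covariates):
    """Return the part of `target` that is not linearly predictable from `covariates`."""
    # Design matrix with an intercept column followed by the covariate(s).
    X = np.column_stack([np.ones(len(target)), covariates])
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return target - X @ beta


rng = np.random.default_rng(0)
shared = rng.normal(size=500)            # variance common to both simulated scales
scale_a = shared + rng.normal(size=500)  # hypothetical scale A
scale_b = shared + rng.normal(size=500)  # hypothetical scale B

resid_a = residualize(scale_a, scale_b)

print(np.corrcoef(scale_a, scale_b)[0, 1])  # the raw scales overlap
print(np.corrcoef(resid_a, scale_b)[0, 1])  # the residual is uncorrelated with scale B, up to rounding
```

The residual keeps only what scale A does not share with scale B, which is why its content can differ from the content of the original scale.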


2021
Vol 2 (2)
pp. 54-63
Author(s):  
Endang Rusyaman ◽  
Devi Munandar ◽  
Diah Chaerani ◽  
Dwindi Agryanti Johar ◽  
Rizky Ashgi

In solving mathematical problems so far, Riemann's integral theory has been quite adequate for problems in pure and applied mathematics. But not all problems can be solved using this integration; for example, some discontinuous functions are not Riemann integrable. The Lebesgue integral is an integration concept based on measure that can handle bounded and unbounded functions and can be defined over more general sets. One of the bases of this integration is the Lebesgue measure on the set of real numbers, in which the measure of an interval is its length, given by the difference of its endpoints. This integral is widely used in various fields such as partial differential equations, quantum mechanics, and probabilistic analysis, which require the integration of arbitrary set functions. This paper presents a comprehensive bibliometric survey of peer-reviewed articles referring to Lebesgue measure in integration. The search returned 832 papers in the Google Scholar database and 997 papers using Lebesgue measure integration in optimization. The research forms 4 clusters and 3 clusters, respectively, with scattered keywords for each cluster. Finally, the bibliographic data indicate that Lebesgue measure in integration and optimization supports much of the research and provides productive citations for the citing studies.
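
A standard textbook example, not taken from the paper itself, makes the difference between the two integrals concrete. The Dirichlet function on $[0,1]$ is discontinuous everywhere and not Riemann integrable, yet its Lebesgue integral exists because the rationals have Lebesgue measure zero. Here $\lambda$ denotes Lebesgue measure, with $\lambda([a,b]) = b - a$.

```latex
\[
  f(x) =
  \begin{cases}
    1, & x \in \mathbb{Q} \cap [0,1],\\
    0, & x \in [0,1] \setminus \mathbb{Q}.
  \end{cases}
\]
% Riemann: every subinterval of every partition P of [0,1] contains both
% rational and irrational points, so the Darboux sums never meet.
\[
  U(f,P) = 1, \qquad L(f,P) = 0 \quad \text{for every partition } P .
\]
% Lebesgue: f is a simple function, and the countable set
% \mathbb{Q} \cap [0,1] has measure zero.
\[
  \int_{[0,1]} f \, d\lambda
  = 1 \cdot \lambda\bigl(\mathbb{Q} \cap [0,1]\bigr)
  + 0 \cdot \lambda\bigl([0,1] \setminus \mathbb{Q}\bigr)
  = 1 \cdot 0 + 0 \cdot 1 = 0 .
\]
```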


Symmetry
2020
Vol 12 (7)
pp. 1103
Author(s):  
Yue Song ◽  
Minjuan Wang ◽  
Wanlin Gao

In order to improve the retrieval results of digital agricultural text information and the efficiency of retrieval, a method for searching digital agricultural text information based on local matching is proposed. The agricultural text tree and the query tree are constructed to generate the ancestor–descendant relationships in the query and map them to the agricultural text. Following the local matching retrieval method, the vector retrieval method is used to calculate the similarity between the digital agricultural text and the submitted query. The similarities are sorted from largest to smallest so that the agricultural text tree can output digital agricultural text information in order. With interference information added, the recall rate and precision rate of the proposed method are above 99.5%; the average retrieval time is between 4 s and 6 s, and the average retrieval efficiency is above 99%. The proposed method is more efficient in information retrieval and can obtain comprehensive and accurate search results, which can be used for the rapid retrieval of digital agricultural text information.
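
The vector retrieval step (compute a similarity between each text and the query, then sort from largest to smallest) can be sketched in Python as follows. This is a plain bag-of-words cosine similarity and deliberately leaves out the tree construction and local matching described in the abstract; the function names and the sample documents are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank(query, docs):
    """Return (similarity, document) pairs sorted from most to least similar."""
    q = Counter(query.lower().split())
    scored = [(cosine(q, Counter(d.lower().split())), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical agricultural texts.
DOCS = [
    "rice irrigation schedule",
    "wheat irrigation and rust prevention notes",
    "wheat rust disease control",
]

for score, doc in rank("wheat rust", DOCS):
    print(f"{score:.3f}  {doc}")
```

The shorter document that contains both query terms ranks first because cosine similarity normalises by vector length; the document with no shared terms scores zero.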


2015
Vol 14 (01)
pp. 1550007
Author(s):  
Paul Hugh Cleverley ◽  
Simon Burnett

Categories or tags that appear in faceted search interfaces, and which are representative of an information item, rarely convey unexpected or non-obvious associated concepts buried within search results. No prior research has been identified which assesses the usefulness of discriminative search term word co-occurrence to generate facets to act as catalysts to facilitate insightful and serendipitous encounters during exploratory search. In this study, 53 scientists from two organisations interacted with semi-interactive stimuli, with 74% expressing a large/moderate desire to use such techniques within their workplace. Preferences were shown for certain algorithms and colour coding. Insightful and serendipitous encounters were identified. These techniques appear to offer a significant improvement over existing approaches used within the study organisations, providing further evidence that insightful and serendipitous encounters can be facilitated in the search user interface. This research has implications for organisational learning, knowledge discovery and exploratory search interface design.


2021
Vol 2021
pp. 1-16
Author(s):  
Hongwei Zhao ◽  
Danyang Zhang ◽  
Jiaxin Wu ◽  
Pingping Liu

Fine-grained retrieval is one of the complex problems in computer vision. Compared with general content-based image retrieval, fine-grained image retrieval faces more difficult challenges. In fine-grained image retrieval tasks, all classes are subclasses of a single meta-class, so there will be small interclass variance and large intraclass variance. In order to solve this problem, in this paper, we propose a fine-grained retrieval method that improves the loss and the feature aggregation, which can achieve better retrieval results under a unified framework. Firstly, we propose a novel multiproxies adaptive distribution loss which can better characterize the intraclass variations and the degree of dispersion of each cluster center. Secondly, we propose a weakly supervised feature aggregation method based on channel weighting, which distinguishes the importance of different feature channels to obtain more representative image feature descriptors. We verify the performance of our proposed method on widely used benchmark datasets such as CUB-200-2011 and Stanford Dogs. Higher Recall@K demonstrates the advantage of our proposed method over the state of the art.
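
Recall@K, the metric reported above, is commonly computed as the fraction of queries whose K nearest neighbours contain at least one item of the same class. The Python sketch below follows that common definition on made-up embeddings; the function name, the cosine similarity choice and the toy data are assumptions, not the paper's evaluation code.

```python
import numpy as np


def recall_at_k(embeddings, labels, k):
    """Fraction of queries with at least one same-class item among their k nearest neighbours."""
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = x @ x.T                      # cosine similarity between all pairs
    np.fill_diagonal(sim, -np.inf)     # a query must not retrieve itself
    topk = np.argsort(-sim, axis=1)[:, :k]
    hits = (labels[topk] == labels[:, None]).any(axis=1)
    return hits.mean()


# Toy embeddings: two classes, one class-1 item lying close to class 0.
E = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.8, 0.6]])
y = np.array([0, 0, 1, 1])

for k in (1, 2, 3):
    print(k, recall_at_k(E, y, k))
```

In the toy data the last item sits nearer to the other class than to its own class mate, so it is a miss until K is large enough to reach that class mate. This mirrors the small interclass and large intraclass variance that makes fine-grained retrieval hard.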


Author(s):  
Edoardo Nicolò Aiello ◽  
Chiara Gramegna ◽  
Antonella Esposito ◽  
Valentina Gazzaniga ◽  
Stefano Zago ◽  
...  

Background: The availability of fine-grained, culture-specific psychometric outcomes can favor the interpretation of scores of the Montreal Cognitive Assessment (MoCA), the most frequently used instrument to screen for mild cognitive dysfunctions in both instrumental and non-instrumental domains. This study thus aimed at providing: (i) updated, region-specific norms for the Italian MoCA, by also (ii) comparing them to pre-existing ones with higher geographical coverage; (iii) information on sensitivity and discriminative capability at the item level. Methods: Five hundred and seventy-nine healthy individuals from Northern Italy (208 males, 371 females; age: 63.4 ± 15, 21–96; education: 11.3 ± 4.6, 1–25) were administered the MoCA. Item Response Theory (IRT) was adopted to assess item difficulty and discrimination. Normative values were derived by means of the Equivalent Scores (ESs) method, applied to the MoCA and its sub-scales. Average ESs were also computed. Agreement with previous ESs classification was assessed via Cohen’s κ. Results: Age and education significantly predicted all MoCA measures except for Orientation, which was related to age only. No sex differences were detected when tested along with age and education. Substantial disagreements with previous ESs classifications were detected. Several items proved to be scarcely sensitive, especially the place item from Orientation and the letter detection task. Memory items showed high discriminative capability, along with certain items assessing executive functions and orientation. Discussion: The item-level information provided here for the Italian MoCA can help Italian practitioners interpret its scores. Italian practitioners should consider an adaptive use of region-specific norms for the MoCA.
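
Agreement between two classifications, as assessed above with Cohen's kappa, corrects the observed agreement for the agreement expected by chance. The following Python sketch computes the unweighted statistic on invented Equivalent Score labels; the two lists are hypothetical and are not data from the study.

```python
from collections import Counter


def cohens_kappa(a, b):
    """Unweighted Cohen's kappa for two equally long lists of category labels."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: product of the two marginal proportions, summed over categories.
    expected = sum(count_a[c] * count_b[c] for c in count_a.keys() | count_b.keys()) / n**2
    return (observed - expected) / (1 - expected)


# Hypothetical Equivalent Scores (0-4) assigned to eight people by two sets of norms.
new_norms = [0, 1, 2, 3, 4, 4, 2, 1]
old_norms = [0, 1, 2, 4, 4, 4, 3, 1]

print(round(cohens_kappa(new_norms, old_norms), 2))  # 0.68
```

Here six of the eight classifications match (observed agreement 0.75), while the marginals alone would produce an agreement of 14/64, which gives a kappa of 0.68.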


Author(s):  
Huizhao Wang ◽  
Guanfeng Liu ◽  
An Liu ◽  
Zhixu Li ◽  
Kai Zheng

The conventional methods for next-item recommendation are generally based on RNNs or one-dimensional attention with time encoding. They either struggle to preserve the long-term dependencies between different interactions or struggle to capture fine-grained user preferences. In this paper, we propose a Double Most Relevant Attention Network (DMRAN) that contains two layers, i.e., Item Level Attention and Feature Level Self-attention, which pick out the most relevant items from the sequence of the user’s historical behaviors and extract the most relevant aspects of the relevant items, respectively. Then, we can capture the fine-grained user preferences to better support the next-item recommendation. Extensive experiments on two real-world datasets illustrate that DMRAN can improve the efficiency and effectiveness of the recommendation compared with the state-of-the-art methods.
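
How attention can weight the items of a user's history by relevance may be easier to see in a small Python sketch. It shows generic scaled dot-product attention, not the DMRAN architecture, whose layers are not specified in the abstract; the `attend` function, the use of a single query vector and the toy item embeddings are all assumptions.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()


def attend(history, query):
    """Weight each history item by its scaled dot-product relevance to the query.

    history: array of shape (n_items, d); query: array of shape (d,).
    Returns the attention weights and the weighted summary of the history.
    """
    scores = history @ query / np.sqrt(history.shape[1])
    weights = softmax(scores)
    return weights, weights @ history


# Hypothetical 2-dimensional item embeddings for three past interactions.
history = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
query = np.array([1.0, 0.0])  # assumed representation of the current interest

weights, summary = attend(history, query)
print(weights)   # the first and third items receive most of the weight
print(summary)
```

The weights sum to one, so the summary is a convex combination of the history items dominated by those most similar to the query.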


Author(s):  
Truus Kruyt

This paper discusses the advantages of encoded digital text over printed text, from a researcher's perspective. The traditional notion of text corpus as a well-considered collection of texts is related to the huge amounts of digital texts that are currently available on the web. After examples of useful digitalization initiatives and available digital resources, information is given about the users and uses of the text corpora stored at the Institute for Dutch Lexicology. Attention is paid to some obstacles in building or using text collections. The conclusion is that up till now the digital medium primarily facilitates research rather than evokes new linguistic research questions.


Author(s):  
Christian Mair

The first part of the chapter surveys the traditional corpus-linguistic working environment for the study of varieties of English around the world, focussing on the International Corpus of English (ICE) as a major cooperative venture. The second part assesses the potential of larger digital text archives and the World-Wide Web as additional sources of data for the study of World Englishes. Corpora and digital text databases not only serve as rich and convenient sources of data, but also encourage a specific corpus-linguistic “take” on World Englishes and are thus also important for advancing the theoretical debate in the field. Analysis of traditional corpora has deepened our understanding of the nature of morphosyntactic variation in World Englishes as a whole and of fine-grained variety-internal variability determined by medium (spoken vs. written), genre and style, whereas the use of Web data frequently highlights the role of standard and nonstandard Englishes in transnational and global domains. The chapter ends with a plea to develop corpora documenting World Englishes in their multilingual settings and thus bring together research on World Englishes and the related field of the sociolinguistics of globalisation.

