The Arabic ontology – an Arabic wordnet with ontologically clean content

2020 ◽  
pp. 1-26 ◽  
Author(s):  
Mustafa Jarrar

We present a formal Arabic wordnet built on a carefully designed ontology, hereafter referred to as the Arabic Ontology. The ontology provides a formal representation of the concepts that Arabic terms convey. Its content was built with ontological analysis in mind and benchmarked, as far as possible, against scientific advances and rigorous knowledge sources, rather than only against speakers’ beliefs, as lexicons typically are. A comprehensive evaluation demonstrated that the current version of the top levels of the ontology can top the majority of Arabic meanings. The ontology currently consists of about 1,800 well-investigated concepts in addition to 16,000 concepts that are partially validated. The ontology is accessible and searchable through a lexicographic search engine (http://ontology.birzeit.edu) that also includes about 150 Arabic-multilingual lexicons, which are being mapped and enriched using the ontology. The ontology is fully mapped with Princeton WordNet, Wikidata, and other resources.

Author(s):  
N. N. Leontyeva ◽  
M. V. Ermakov ◽  
S. A. Krylov ◽  
S. Yu. Semenova ◽  
...  

The paper deals with upgrading the RUSLAN electronic semantic dictionary for automatic processing of Russian texts. The previous versions of the dictionary were created in the 1990s and early 2000s, mainly for automatic processing of the Russian Federation’s state papers. The authors inherit the basic formalism of the dictionary, including the metalanguage and the structure of the dictionary entry. The current version is revised and enlarged in a number of ways. While the initial versions mostly predate the advent of corpus linguistics, the current version is based on corpus data. The Russian National Corpus was used as a source of sample sentences, as well as for determining statistically and empirically which linguistic information is pragmatically relevant. A structural representation for the sample sentences was designed, as was a procedure for selecting lexical units from the corpus for use in a pragmatic description of polysemy. A formal representation of situations, previously outlined in the works of Nina N. Leontyeva, has also been detailed and largely realized. Within the lexicon, verbs in particular have received a more flexible description than in the previous versions, and aspectual meanings are reflected with more nuance.


2014 ◽  
Vol 05 (03) ◽  
pp. 731-745 ◽  
Author(s):  
J. Belden ◽  
J. Williams ◽  
B. Richardson ◽  
K. Schuster ◽  
D. Saparova

Summary
Background: Federated medical search engines are health information systems that provide a single access point to different types of information. Their efficiency as clinical decision support tools has been demonstrated through numerous evaluations. Despite their rigor, very few of these studies report holistic evaluations of medical search engines and even fewer base their evaluations on existing evaluation frameworks.
Objectives: To evaluate a federated medical search engine, MedSocket, for its potential net benefits in an established clinical setting.
Methods: This study applied the Human, Organization, and Technology (HOT-fit) evaluation framework in order to evaluate MedSocket. The hierarchical structure of the HOT-factors allowed for identification of a combination of efficiency metrics. Human fit was evaluated through user satisfaction and patterns of system use; technology fit was evaluated through the measurements of time-on-task and the accuracy of the found answers; and organization fit was evaluated from the perspective of system fit to the existing organizational structure.
Results: Evaluations produced mixed results and suggested several opportunities for system improvement. On average, participants were satisfied with MedSocket searches and confident in the accuracy of retrieved answers. However, MedSocket did not meet participants’ expectations in terms of download speed, access to information, and relevance of the search results. These mixed results made it necessary to conclude that in the case of MedSocket, technology fit had a significant influence on the human and organization fit. Hence, improving technological capabilities of the system is critical before its net benefits can become noticeable.
Conclusions: The HOT-fit evaluation framework was instrumental in tailoring the methodology for conducting a comprehensive evaluation of the search engine. Such multidimensional evaluation of the search engine resulted in recommendations for system improvement.
Citation: Saparova D, Belden J, Williams J, Richardson B, Schuster K. Evaluating a federated medical search engine: Tailoring the methodology and reporting the evaluation outcomes. Appl Clin Inf 2014; 5: 731–745. http://dx.doi.org/10.4338/ACI-2014-03-RA-0021


2020 ◽  
Vol 8 ◽  
pp. 572-588
Author(s):  
Kyle Richardson ◽  
Ashish Sabharwal

Open-domain question answering (QA) involves many knowledge and reasoning challenges, but are successful QA models actually learning such knowledge when trained on benchmark QA tasks? We investigate this via several new diagnostic tasks probing whether multiple-choice QA models know definitions and taxonomic reasoning—two skills widespread in existing benchmarks and fundamental to more complex reasoning. We introduce a methodology for automatically building probe datasets from expert knowledge sources, allowing for systematic control and a comprehensive evaluation. We include ways to carefully control for artifacts that may arise during this process. Our evaluation confirms that transformer-based multiple-choice QA models are already predisposed to recognize certain types of structural linguistic knowledge. However, it also reveals a more nuanced picture: their performance notably degrades even with a slight increase in the number of “hops” in the underlying taxonomic hierarchy, and with more challenging distractor candidates. Further, existing models are far from perfect when assessed at the level of clusters of semantically connected probes, such as all hypernym questions about a single concept.


2003 ◽  
Vol 62 (2) ◽  
pp. 121-129 ◽  
Author(s):  
Astrid Schütz ◽  
Franz Machilek

Research on personal home pages is still rare. Many studies to date are exploratory, and the problem of drawing a sample that reflects the variety of existing home pages has not yet been solved. The present paper discusses sampling strategies and suggests a strategy based on the results retrieved by a search engine. This approach is used to draw a sample of 229 personal home pages that portray private identities. Findings on age and sex of the owners and elements characterizing the sites are reported.


2005 ◽  
Author(s):  
Frank M. Gresham ◽  
Daniel J. Reschly ◽  
Jack Fletcher ◽  
Matthew Burns ◽  
Theodore Christ ◽  
...  
