Processing Tools for Greek and Other Languages of the Christian Middle East

2018
Vol Special Issue on... (Project presentations)
Author(s):  
Bastien Kindt

This paper presents computer tools and linguistic resources of the GREgORI project. These developments allow automated processing of texts written in the main languages of the Christian Middle East, such as Greek, Arabic, Syriac, Armenian and Georgian. The main goal is to provide scholars with tools (lemmatized indexes and concordances) that make corpus-based linguistic information available. The paper focuses on questions of text processing, lemmatization, information retrieval, and bitext alignment.
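As a rough illustration of what a lemmatized index and concordance involve, the sketch below maps surface forms to lemmas through a toy lexicon and emits keyword-in-context lines. The transliterated Greek forms and the lexicon are invented for the example, not GREgORI's actual resources.

```python
from collections import defaultdict

# Toy lexicon mapping surface forms to lemmas (illustrative only).
LEXICON = {"logoi": "logos", "logon": "logos", "anthropou": "anthropos"}

def lemmatize(token):
    return LEXICON.get(token, token)

def build_index(tokens):
    """Map each lemma to the positions of its surface forms."""
    index = defaultdict(list)
    for i, tok in enumerate(tokens):
        index[lemmatize(tok)].append(i)
    return index

def concordance(tokens, lemma, window=2):
    """Return keyword-in-context lines for every occurrence of a lemma."""
    index = build_index(tokens)
    lines = []
    for i in index.get(lemma, []):
        left = " ".join(tokens[max(0, i - window):i])
        right = " ".join(tokens[i + 1:i + 1 + window])
        lines.append(f"{left} [{tokens[i]}] {right}".strip())
    return lines

tokens = ["hoi", "logoi", "tou", "anthropou", "kai", "ton", "logon"]
print(concordance(tokens, "logos"))
```

The index groups all inflected forms under one lemma, which is what makes a lemmatized concordance more useful than a plain word index for highly inflected languages.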

Author(s):  
Juncal Gutiérrez-Artacho
María-Dolores Olvera-Lobo

Within the sphere of the Web, the overload of information is more notable than in other contexts. Question answering systems (QAS) are presented as an alternative to traditional information retrieval (IR) systems, seeking to offer precise and understandable answers to factual questions instead of showing the user a list of documents related to a given search. Given that QAS represent a substantial advance in the improvement of IR, it becomes necessary to determine their effectiveness for the final user. With this aim, seven studies were undertaken to evaluate: a) in the first two, the linguistic resources and tools used in these systems for multilingual retrieval (Research 1, Research 2); and b) the performance and quality of the answers of the main monolingual and multilingual QAS of general and specialized domains on the Web in response to different types of questions and subjects, so that different evaluation means could be applied (Research 3, Research 4, Research 5, Research 6, Research 7).
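The user-oriented evaluations described above ultimately compare system answers against expected ones. A minimal sketch of such scoring, with invented questions and a simple exact-match criterion (not the studies' actual protocol), might look like:

```python
def score_answers(system_answers, gold_answers):
    """Fraction of questions whose answer matches the gold standard
    (case-insensitive exact match -- an assumed, simplistic criterion)."""
    correct = sum(1 for q, a in system_answers.items()
                  if a.strip().lower() == gold_answers[q].strip().lower())
    return correct / len(gold_answers)

# Invented evaluation data for illustration.
gold = {"q1": "Paris", "q2": "1492", "q3": "Cervantes"}
system = {"q1": "paris", "q2": "1493", "q3": "Cervantes"}
print(score_answers(system, gold))  # 2 of 3 answers match
```

Real QAS evaluations typically use more forgiving matching (answer patterns, human judgment) and metrics such as mean reciprocal rank rather than plain accuracy.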


Author(s):  
Bilel Elayeb
Ibrahim Bounhas
Oussama Ben Khiroun
Fabrice Evrard
Narjès Bellamine-BenSaoud

This paper presents a new possibilistic information retrieval system using semantic query expansion. The work investigates query expansion strategies based on external linguistic resources; in this case, the authors exploit the French dictionary “Le Grand Robert”. First, they model the dictionary as a graph and compute similarities between query terms by exploiting the circuits in the graph. Second, possibility theory is applied, taking advantage of a double relevance measure (possibility and necessity) between the articles of the dictionary and query terms. Third, these two approaches are combined using two different aggregation methods. The authors also benefit from an existing approach for reweighting query terms in the possibilistic matching model to improve the expansion process. To assess and compare the approaches, the authors performed experiments on the standard ‘LeMonde94’ test collection.
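The combination of possibility and necessity scores can be pictured with a small sketch. The linear aggregation, the threshold, and the candidate scores below are illustrative assumptions, not the paper's actual measures derived from “Le Grand Robert”.

```python
def aggregate(pi, n, alpha=0.5):
    """One common way to fuse a possibility score (pi) and a necessity
    score (n) into a single relevance value: a weighted average."""
    return alpha * pi + (1 - alpha) * n

def expand(query_terms, candidates, threshold=0.6):
    """Keep candidate expansion terms whose aggregated relevance to the
    query passes a threshold (threshold value is an assumption)."""
    expanded = list(query_terms)
    for term, (pi, n) in candidates.items():
        if aggregate(pi, n) >= threshold:
            expanded.append(term)
    return expanded

# Invented (possibility, necessity) scores for French candidate terms.
candidates = {"automobile": (0.9, 0.7), "vehicule": (0.8, 0.5), "roue": (0.4, 0.1)}
print(expand(["voiture"], candidates))
```

Necessity is the more demanding of the two measures, so requiring both to contribute (rather than possibility alone) filters out weakly related terms like "roue" above.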



Author(s):  
Francisco M. Couto
Mário J. Silva
Vivian Lee
Emily Dimmer
Evelyn Camon
...  

Molecular biology research projects have produced vast amounts of data, part of which has been preserved in a variety of public databases. However, a large portion of the data contains a significant number of errors and therefore requires careful verification by curators, a painful and costly task, before valid conclusions can be derived from it. On the other hand, research in biomedical information retrieval and information extraction is now delivering text mining solutions that can help curators work more efficiently and deliver better data resources. Over the past decades, automatic text processing systems have successfully exploited the biomedical scientific literature to reduce researchers’ effort to keep up to date, but many of these systems still rely on domain knowledge that is integrated manually, leading to unnecessary overheads and restrictions on their use. A more efficient approach would acquire the domain knowledge automatically from publicly available biological sources, such as BioOntologies, rather than using manually inserted domain knowledge. An example of this approach is GOAnnotator, a tool that assists the verification of uncurated protein annotations. It provided curators with correct evidence text at 93% precision, a promising result. GOAnnotator was implemented as a web tool that is freely available at http://xldb.di.fc.ul.pt/rebil/tools/goa/.
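The evidence-verification step can be sketched as a simple co-mention search: for each uncurated (protein, GO term) pair, collect the literature sentences that mention both, for a curator to confirm. The protein names, term strings, and substring-matching rule below are illustrative assumptions; GOAnnotator's real matching relies on similarity measures over BioOntologies rather than literal string lookup.

```python
def find_evidence(annotations, sentences):
    """Map each (protein, GO term) pair to sentences mentioning both.
    Naive case-insensitive substring matching, for illustration only."""
    evidence = {}
    for protein, go_term in annotations:
        hits = [s for s in sentences
                if protein.lower() in s.lower() and go_term.lower() in s.lower()]
        evidence[(protein, go_term)] = hits
    return evidence

# Invented sentences and annotations.
sentences = [
    "P53 is involved in apoptosis regulation.",
    "BRCA1 localizes to the nucleus.",
]
ev = find_evidence([("P53", "apoptosis"), ("BRCA1", "DNA repair")], sentences)
print(ev)
```

Pairs with no supporting sentence (like the second one here) are exactly the cases a curator would flag as lacking evidence.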


1992
Vol 36 (4)
pp. 356-360
Author(s):  
Cortney G. Vargo
Clifford E. Brown
Sarah J. Swierenga

This study was designed to investigate whether computer-supported backtracking tools reduced navigation time over manual backtracking and to compare navigation times among a subset of four backtracking tools. Each tool was evaluated in the context of an experimental, hierarchical, direct-manipulation database. Trials consisted of an information retrieval task requiring subjects to answer multiple-choice questions about the contents of the database. The independent variables included the backtracking tool and the backtrack navigation task length. The dependent measures included navigation time, the frequency with which the computer tool was selected and used over manual backtracking (a table of contents), and questionnaire responses. Backtracking with any of the four computer-supported tools resulted in significantly reduced navigation time over manual backtracking using the table of contents. When provided with a history list, subjects had significantly shorter navigation times when backtracking at the higher of two levels in the database hierarchy. There were no differences between computer tools in rated efficiency, ease of use, or objective or subjective preference measures.
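A history list of the kind compared in the study can be modeled as a simple visited-node structure. This sketch (class name and truncation behavior are assumptions, not taken from the paper) shows the direct-jump backtracking that manual table-of-contents navigation lacks.

```python
class HistoryList:
    """Minimal history-list backtracking tool: records visited nodes and
    lets the user jump straight back to any earlier one."""

    def __init__(self):
        self._visited = []

    def visit(self, node):
        self._visited.append(node)

    def history(self):
        """Most recent first, as a history list typically displays."""
        return list(reversed(self._visited))

    def backtrack_to(self, node):
        """Jump to the most recent occurrence of an earlier node,
        discarding the entries visited after it."""
        i = len(self._visited) - 1 - self._visited[::-1].index(node)
        self._visited = self._visited[:i + 1]
        return node

h = HistoryList()
for n in ["Home", "Ch1", "Sec1.2"]:
    h.visit(n)
h.backtrack_to("Ch1")
print(h.history())
```

The single-operation jump is the plausible source of the time savings the study measured: manual backtracking requires re-navigating the hierarchy one level at a time.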


2020
Author(s):  
Rianto Rianto
Achmad Benny Mutiara
Eri Prasetyo Wibowo
Paulus Insap Santosa

Stemming has long been used in data pre-processing for information retrieval, aiming to reduce affixed words to their root forms. However, few stemming methods exist for non-formal Indonesian text processing. The existing stemming methods have high accuracy for formal Indonesian but low accuracy for non-formal Indonesian; thus, a stemming method with high accuracy for non-formal Indonesian classifier models remains an open challenge. This study introduces a new stemming method to solve problems in non-formal Indonesian text data pre-processing. Furthermore, it aims to provide comprehensive research on improving the accuracy of text classifier models by strengthening the stemming method. Using the Support Vector Machine algorithm, a text classifier model is developed and its accuracy checked. The experimental evaluation tested 550 Indonesian datasets using two different stemming methods. The results show that with the proposed stemming method the text classifier model achieves higher accuracy than with the existing method, with scores of 0.85 and 0.73, respectively. In the future, the proposed stemming method can be used to develop Indonesian text classifier models for various purposes, including text clustering, summarization, hate speech detection, and other text processing applications.
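As a rough sketch of the kind of pre-processing step involved, the toy stemmer below first normalizes a few non-formal (slang) Indonesian variants to formal forms and then strips common affixes. The slang map and affix lists are tiny illustrations, not the paper's proposed method.

```python
# Illustrative slang-to-formal normalization (a handful of examples).
SLANG = {"gak": "tidak", "udah": "sudah", "bgt": "banget"}
# Common Indonesian affixes, longest-first so "meng" wins over "me".
PREFIXES = ("meng", "mem", "men", "me", "ber", "di", "ter", "ke", "pe")
SUFFIXES = ("kan", "an", "i", "nya")

def stem(word):
    """Normalize a non-formal variant, then strip at most one prefix
    and one suffix, keeping at least a 3-letter stem."""
    word = SLANG.get(word, word)
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= 3:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= 3:
            word = word[:-len(s)]
            break
    return word

print([stem(w) for w in ["membaca", "gak", "makanan"]])
```

Handling slang before affix stripping is the essential difference between formal-only stemmers and one suited to non-formal text: "gak" carries no affixes at all, so a purely morphological stemmer would leave it unnormalized.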


Music is the combination of melody, linguistic information and the singer’s mental realm. As the popularity of music increases, the choice of songs also varies with listeners’ mental conditions, which range from supreme bliss to melancholy strain depending on the musical notes. Most people prefer songs that satisfy their current state of mind. Pragmatic analysis of music by computer is a difficult task, as emotion is very complex and camouflages the real situation. Hence, this paper attempts to classify songs based on musical features, which helps to classify emotion more easily. Music feature extraction is done using the Music Information Retrieval (MIR) toolbox. The dataset consists of 100 Hindi songs of 30-second clips; the emotions are then classified with the Naïve Bayes classification method using the Weka API.
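The classification step can be illustrated with a from-scratch Gaussian Naive Bayes over two invented audio features (tempo and brightness). The feature values and labels below are assumptions for the sketch; the actual pipeline extracts real features with the MIR toolbox and classifies via the Weka API.

```python
import math
from collections import defaultdict

def fit(samples):
    """samples: list of (feature_tuple, label).
    Returns per-class feature means, variances, and priors."""
    by_label = defaultdict(list)
    for x, y in samples:
        by_label[y].append(x)
    model = {}
    for y, xs in by_label.items():
        n = len(xs)
        means = [sum(col) / n for col in zip(*xs)]
        varis = [sum((v - m) ** 2 for v in col) / n + 1e-6  # smoothed
                 for col, m in zip(zip(*xs), means)]
        model[y] = (means, varis, n / len(samples))
    return model

def predict(model, x):
    """Pick the class maximizing the Gaussian log-likelihood plus log-prior."""
    def log_like(means, varis, prior):
        ll = math.log(prior)
        for v, m, s2 in zip(x, means, varis):
            ll += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        return ll
    return max(model, key=lambda y: log_like(*model[y]))

# Invented training clips: (tempo in BPM, brightness in [0, 1]) -> emotion.
train = [((120, 0.8), "happy"), ((130, 0.9), "happy"),
         ((60, 0.2), "sad"), ((70, 0.3), "sad")]
model = fit(train)
print(predict(model, (125, 0.85)))
```

Naive Bayes assumes the features are conditionally independent given the emotion label, which is why it needs only per-class means and variances rather than a full covariance model.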

