Sentence Classification and Information Retrieval for Petroleum Engineering

Mapping Intimacies ◽

10.5753/eniac.2019.9331 ◽

2019 ◽

Author(s):

Thiago Ferraz ◽

Gabriel Ferreira ◽

Fábio Cozman ◽

Ismael Santos

Keyword(s):

Logistic Regression ◽

Information Retrieval ◽

Text Mining ◽

Search Engine ◽

Classification System ◽

Retrieval System ◽

Information Retrieval System ◽

Petroleum Engineering ◽

Testing Data ◽

Sentence Classification

Classifying sentences in industrial, technical, or scientific reports can enrich text mining and information retrieval tasks with useful machine-readable metadata. This paper describes a search engine that uses sentence classification to search abstracts of scholarly papers in Petroleum Engineering. Sentences were classified into four classes based on the popular IMRAD categories. We produced a dataset of more than 2,200 manually labeled sentences from 278 scholarly articles in Petroleum Engineering for use as training and testing data. The best-performing classifier was logistic regression, with an accuracy of 86.4%. The information retrieval system built on top of the classification system yielded a mean average precision (mAP) of 0.80.
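For readers unfamiliar with the mAP figure reported above, the following is a minimal sketch of one common way to compute mean average precision. It is not taken from the paper: the function names, the assumption that each ranking contains no duplicate document IDs, and the choice to normalize by the total number of relevant documents are all illustrative.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision for one query.

    Averages precision@k over every rank k that holds a relevant document,
    normalized by the total number of relevant documents (an assumption;
    other normalizations exist). Assumes ranked_ids has no duplicates.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of average precision over queries.

    runs is a list of (ranked_ids, relevant_ids) pairs, one per query.
    """
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

A run would be evaluated by passing one (ranking, relevant set) pair per query to mean_average_precision.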


How well does Google work with Persian documents?

Journal of Information Science ◽

10.1177/0165551516640437 ◽

2016 ◽

Vol 43 (3) ◽

pp. 316-327 ◽

Cited By ~ 1

Author(s):

Mohammad Sadeghi ◽

Jesús Vegas

Keyword(s):

Information Retrieval ◽

Search Engine ◽

Morphological Analysis ◽

Retrieval System ◽

Relevant Information ◽

Information Retrieval System ◽

Search Results ◽

Persian Language ◽

Google Search ◽

The Web

Evaluating the performance of an information retrieval system is decisive for measuring improvements in search technology. The Google search engine, as a tool for retrieving information on the Web, is used by almost 92% of Iranian users. The purpose of this paper is to study Google’s performance in retrieving relevant information from Persian documents. Retrieval effectiveness is measured by the precision of the search results for a website that we built from the documents of a TREC standard corpus. We queried Google with 100 topics available in the corpus and compared the retrieved webpages with the relevant documents. The results indicate that the Google search engine does not fully take the morphological analysis of the Persian language into account. Incorrect text tokenisation, treating stop words as content keywords of a document, and the wrong ‘variants encountered’ of words found by Google are the main factors that reduce the relevance of Persian information retrieval on the Web for this search engine.


Development and Integration of an Odia Stemmer in Dspace for Odia Search Engine

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.d1108.029420 ◽

2020 ◽

Vol 9 (4) ◽

pp. 836-840

Keyword(s):

Information Retrieval ◽

Search Engine ◽

Morphological Analysis ◽

Retrieval System ◽

Text Retrieval ◽

Information Retrieval System ◽

Brute Force ◽

Retrieval Process ◽

New Born ◽

Domain Independent

A stemmer reduces an inflected or derived word to its stem by removing the suffix or prefix attached to the word. It can be used in an information retrieval system to improve the overall performance of the retrieval process. Stemming is not equivalent to morphological analysis: it only finds the stem of a word. It also decreases the number of terms in an information retrieval system. Various stemming techniques exist. Here, a new hybrid stemmer named “Mula” has been developed for the Odia language. It combines brute-force and enhanced suffix-stripping approaches. The new stemmer is both computationally inexpensive and domain independent. We have integrated this stemmer into an existing DSpace installation for an Odia text retrieval system. The results are commendable and suggest that the new stemmer can be used effectively in an Odia search engine. The proposed stemmer also handles over-stemming and under-stemming effectively.
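The general idea of combining a brute-force lookup with suffix stripping can be sketched as follows. This is not the Mula stemmer and uses no Odia rules: the lookup table, the English suffix list, and the minimum stem length are all hypothetical values chosen only to illustrate the technique.

```python
# Hypothetical brute-force table: irregular forms mapped directly to stems.
LOOKUP = {"children": "child", "went": "go"}

# Hypothetical suffix list; a real stemmer would use language-specific rules.
SUFFIXES = ("ings", "ing", "ed", "es", "s")


def stem(word, lookup=LOOKUP, suffixes=SUFFIXES, min_stem_len=3):
    """Return a stem using table lookup first, then longest-suffix stripping."""
    word = word.lower()
    # Step 1 (brute force): exact match in the lookup table.
    if word in lookup:
        return lookup[word]
    # Step 2 (suffix stripping): try longer suffixes before shorter ones.
    for suffix in sorted(suffixes, key=len, reverse=True):
        # The length guard is an illustrative way to limit over-stemming.
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    # No rule applied: leave the word unchanged.
    return word
```

Checking the table first keeps irregular forms out of the stripping rules, and the length guard stops very short words from being reduced to fragments.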


Neuro-Fuzzy Based Integrated and Optimized Search Engine for Effective and Reliable Information Retrieval System

Proceedings of the 2014 International Conference on Information and Communication Technology for Competitive Strategies - ICTCS '14 ◽

10.1145/2677855.2677920 ◽

2014 ◽

Author(s):

S. R. Tandan ◽

Priyanka Tripathi ◽

Rohit Miri

Keyword(s):

Information Retrieval ◽

Search Engine ◽

Retrieval System ◽

Information Retrieval System ◽

Reliable Information ◽

Neuro Fuzzy


Text Mining Research Based on Intelligent Computing in Information Retrieval System

TELKOMNIKA (Telecommunication Computing Electronics and Control) ◽

10.12928/telkomnika.v13i4.1788 ◽

2015 ◽

Vol 13 (4) ◽

pp. 1384 ◽

Cited By ~ 1

Author(s):

Yong Li

Keyword(s):

Information Retrieval ◽

Text Mining ◽

Retrieval System ◽

Information Retrieval System ◽

Intelligent Computing


Improving information retrieval system performance by combining different text-mining techniques

Intelligent Data Analysis ◽

10.3233/ida-2000-4604 ◽

2000 ◽

Vol 4 (6) ◽

pp. 489-511 ◽

Cited By ~ 2

Author(s):

Rila Mandala ◽

Takenobu Tokunaga ◽

Hozumi Tanaka

Keyword(s):

Information Retrieval ◽

Text Mining ◽

System Performance ◽

Retrieval System ◽

Information Retrieval System


Information Retrieval (IR) and Extracting Associative Rules

Business Intelligence ◽

10.4018/978-1-4666-9562-7.ch037 ◽

2016 ◽

pp. 713-732

Author(s):

Asmae Dami ◽

Mohamed Fakir ◽

Belaid Bouikhalene

Keyword(s):

Information Retrieval ◽

Text Mining ◽

Knowledge Discovery ◽

Association Rules ◽

Retrieval System ◽

Information Retrieval System ◽

User Needs ◽

Retrieval Process ◽

Research Themes ◽

Document Database

This chapter lies at the intersection of two research themes: Information Retrieval and Knowledge Discovery from texts (text mining). Its purpose is two-fold. First, it focuses on Information Retrieval (IR), which aims to implement models and systems for selecting a set of documents that satisfy user needs expressed as a query. An information retrieval system is composed mainly of two processes: representation and retrieval. The representation process, called indexing, represents documents and queries by descriptors, or indexes, that reflect the contents of the documents. The retrieval process consists of comparing the document representations with the query representation. The second aim is to discover the relationships between the terms (keywords) that describe documents in a document database. These correlations between terms are extracted using a text mining technique, mainly association rules.
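The two processes described in this abstract, indexing and retrieval, can be illustrated with a small inverted index. This is only a sketch under simple assumptions, not the authors' system: whitespace tokenization, term-overlap scoring, and all function names are hypothetical.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(docs):
    """Indexing: map each descriptor (term) to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def retrieve(index, query):
    """Retrieval: compare the query terms with the index and rank documents.

    The score is the number of distinct query terms a document contains;
    ties are broken by document ID so the order is deterministic.
    """
    scores = defaultdict(int)
    for term in set(tokenize(query)):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))
```

For example, after indexing a dictionary of document IDs to texts, retrieve(index, "oil report") returns the IDs of documents sharing at least one term with the query, best match first.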


Information Retrieval (IR) and Extracting Associative Rules

Journal of Information Technology Research ◽

10.4018/jitr.2014100104 ◽

2014 ◽

Vol 7 (4) ◽

pp. 42-62

Author(s):

Asmae Dami ◽

Mohamed Fakir ◽

Belaid Bouikhalene

Keyword(s):

Information Retrieval ◽

Text Mining ◽

Knowledge Discovery ◽

Association Rules ◽

Retrieval System ◽

Information Retrieval System ◽

User Needs ◽

Retrieval Process ◽

Research Themes ◽

Document Database

This paper lies at the intersection of two research themes: Information Retrieval and Knowledge Discovery from texts (text mining). Its purpose is two-fold. First, it focuses on Information Retrieval (IR), which aims to implement models and systems for selecting a set of documents that satisfy user needs expressed as a query. An information retrieval system is composed mainly of two processes: representation and retrieval. The representation process, called indexing, represents documents and queries by descriptors, or indexes, that reflect the contents of the documents. The retrieval process consists of comparing the document representations with the query representation. The second aim is to discover the relationships between the terms (keywords) that describe documents in a document database. These correlations between terms are extracted using a text mining technique, mainly association rules.
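As a rough illustration of how association rules between document terms can be scored, the sketch below computes support and confidence for every ordered pair of terms. It is an assumption-laden toy, not the authors' method: it handles only single-term rules, enumerates pairs exhaustively instead of using an algorithm such as Apriori, and the thresholds and names are hypothetical.

```python
from itertools import permutations


def term_rules(doc_terms, min_support=0.5, min_confidence=0.7):
    """Return rules (a, b, support, confidence) meaning 'a implies b'.

    doc_terms is a list of term collections, one per document.
    support    = fraction of documents containing both a and b
    confidence = fraction of documents containing a that also contain b
    """
    n = len(doc_terms)
    if n == 0:
        return []
    term_sets = [set(terms) for terms in doc_terms]
    vocabulary = sorted(set().union(*term_sets))
    rules = []
    for a, b in permutations(vocabulary, 2):
        with_a = sum(1 for s in term_sets if a in s)
        with_both = sum(1 for s in term_sets if a in s and b in s)
        support = with_both / n
        # with_a is at least 1 because a comes from the vocabulary.
        confidence = with_both / with_a
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules
```

Exhaustive pair enumeration is quadratic in vocabulary size, so this form is suitable only for small illustrative collections.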


Evaluasi Daftar Stopword Bahasa Indonesia

Jurnal Teknologi Informasi dan Ilmu Komputer ◽

10.25126/jtiik.2019611226 ◽

2019 ◽

Vol 6 (1) ◽

pp. 41

Author(s):

Faisal Rahutomo ◽

Ariadi Retno Tri Hayati Ririd

Keyword(s):

Information Retrieval ◽

Text Mining ◽

Frequency Analysis ◽

Retrieval System ◽

Word Type ◽

Information Retrieval System ◽

Preparation Technique ◽

Formal Grammar ◽

Text Corpus ◽

Bahasa Indonesia

Text-based information retrieval and text mining systems both include an indexing process, in which text is processed to extract the essence of the information it contains. One step in this process is stopword filtering, in which words not worth indexing are ignored based on a stopword list. Several Indonesian stopword lists are freely available. This paper evaluates the available lists with respect to Indonesian formal grammar, the technique used to prepare them, and the habits of internet users. The review shows that all of the available lists were built by term frequency analysis of a text corpus, without regard to word type or internet user habits. The paper also provides several recommendations for researchers in text mining and text-based information retrieval who need an Indonesian stopword list, namely developing a list that takes word type and internet user habits into account.
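The frequency-based construction that the abstract attributes to the existing lists can be sketched in a few lines. This is an illustration only: the whitespace tokenizer, the top_n cutoff, and the function names are assumptions, and no real stopword list is reproduced here.

```python
from collections import Counter


def build_stoplist(corpus, top_n=2):
    """Build a stoplist from the top_n most frequent tokens in a corpus.

    corpus is a list of document strings. Word type is ignored entirely,
    which is the limitation the abstract points out.
    """
    counts = Counter(tok for doc in corpus for tok in doc.lower().split())
    return {term for term, _ in counts.most_common(top_n)}


def filter_stopwords(text, stoplist):
    """Drop tokens that appear in the stoplist before indexing."""
    return [tok for tok in text.lower().split() if tok not in stoplist]
```

Because the cutoff depends only on counts, a frequent content word in a narrow corpus would be treated as a stopword, which is one reason a purely frequency-based list may need review.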


Information Retrieval System in the Form of a Powerful Search Engine

Journal of Craniofacial Surgery ◽

10.1097/00001665-200305000-00026 ◽

2003 ◽

Vol 14 (3) ◽

pp. 426

Author(s):

Mutaz B. Habal

Keyword(s):

Information Retrieval ◽

Search Engine ◽

Retrieval System ◽

Information Retrieval System


Mental Models and Information Retrieval: What Can Search Queries Tell Us?

Proceedings of the Annual Conference of CAIS / Actes du congrès annuel de l'ACSI ◽

10.29173/cais495 ◽

2013 ◽

Author(s):

Haidar Moukdad ◽

Andrew Large

Keyword(s):

Information Retrieval ◽

Mental Models ◽

Search Engine ◽

Mental Model ◽

Web Search ◽

Retrieval System ◽

Information Retrieval System ◽

Search Queries ◽

Web Search Engine ◽

Advanced Search

When information seekers use an information retrieval system, their strategy is based, at least in part, on the mental model they have constructed of this environment. A random sample of more than 2000 actual search queries submitted by users to one web search engine, WebCrawler, was gathered in two separate capture sessions. The results suggest that a high proportion of users do not employ advanced search features...
