Development of parser agents for bibliographic data retrieval

Murari Kumar; Samir Farooqi; K. K. Chaturvedi; Chandan Kumar Deb; Pankaj Das

doi:10.18805/bkap141

Development of parser agents for bibliographic data retrieval

Bhartiya Krishi Anusandhan Patrika ◽

10.18805/bkap141 ◽

2018 ◽

Vol 33 (4) ◽

Author(s):

Murari Kumar ◽

Samir Farooqi ◽

K. K. Chaturvedi ◽

Chandan Kumar Deb ◽

Pankaj Das

Keyword(s):

Retrieval System ◽

Data Retrieval ◽

Web Pages ◽

Bibliographic Database ◽

Bibliographic Information ◽

Open Access Journals ◽

Bibliographic Records ◽

Bibliographic Data ◽

Structured Information ◽

The Web

Bibliographic data contain the information about a publication that users need to identify and retrieve it. Bibliometricians use these data quantitatively for analysis and dissemination, but with the growing volume of literature published in open access journals such as Nucleic Acids Research (NAR) and those of Springer and Oxford Journals, it has become difficult to retrieve structured bibliographic information in a desired format. A digital bibliographic database holds this information about published literature in structured form, whereas the bibliographic records of individual articles are scattered across different web pages. This thesis presents a retrieval system that gathers the bibliographic data of NAR in a single place. For this purpose, parser agents were developed that access the NAR web pages, parse the scattered bibliographic data and store it in a local bibliographic database. On top of this database, a three-tier architecture displays the bibliographic information in a systematized format. With this system it would be possible to build networks between authors and affiliations and to generate other analytical reports.
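The parse-and-store step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that article pages expose `citation_*` meta tags (a common convention on journal sites), and the table layout, the function names and the sample page are all hypothetical.

```python
import sqlite3
from html.parser import HTMLParser


class CitationMetaParser(HTMLParser):
    """Collect <meta name="citation_*" content="..."> tags from one page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name") or ""
        content = attr.get("content")
        if name.startswith("citation_") and content:
            # Repeated tags (e.g. several authors) accumulate in a list.
            self.fields.setdefault(name, []).append(content.strip())


def parse_record(html):
    """Turn the scattered meta tags of one page into a single record."""
    parser = CitationMetaParser()
    parser.feed(html)
    fields = parser.fields

    def first(key):
        values = fields.get(key)
        return values[0] if values else None

    return {
        "doi": first("citation_doi"),
        "title": first("citation_title"),
        "journal": first("citation_journal_title"),
        "published": first("citation_publication_date"),
        "authors": fields.get("citation_author", []),
    }


def store_record(conn, record):
    """Store one record in a local database (hypothetical two-table schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS article "
        "(doi TEXT PRIMARY KEY, title TEXT, journal TEXT, published TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS author "
        "(doi TEXT, position INTEGER, name TEXT, PRIMARY KEY (doi, position))"
    )
    conn.execute(
        "INSERT OR REPLACE INTO article VALUES (?, ?, ?, ?)",
        (record["doi"], record["title"], record["journal"], record["published"]),
    )
    conn.executemany(
        "INSERT OR REPLACE INTO author VALUES (?, ?, ?)",
        [(record["doi"], i, name) for i, name in enumerate(record["authors"], 1)],
    )
    conn.commit()


# Hypothetical page content; a real agent would download it first.
SAMPLE_PAGE = """<html><head>
<meta name="citation_title" content="An example article">
<meta name="citation_journal_title" content="Example Journal">
<meta name="citation_doi" content="10.0000/example.1">
<meta name="citation_publication_date" content="2018/01/01">
<meta name="citation_author" content="A. Author">
<meta name="citation_author" content="B. Author">
</head><body></body></html>"""

conn = sqlite3.connect(":memory:")
store_record(conn, parse_record(SAMPLE_PAGE))
print(conn.execute("SELECT name FROM author ORDER BY position").fetchall())
```

Keeping authors in their own table, keyed by DOI and position, is one way to support the author and affiliation networks the abstract mentions.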


A FRAME WORK FOR WEB INFORMATION EXTRACTION AND ANALYSIS

INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY ◽

10.24297/ijct.v7i2.3459 ◽

2013 ◽

Vol 7 (2) ◽

pp. 574-579 ◽

Cited By ~ 3

Author(s):

Dr Sunitha Abburu ◽

G. Suresh Babu

Keyword(s):

Information Extraction ◽

Data Extraction ◽

Research Work ◽

Web Pages ◽

Web Documents ◽

E Learning ◽

Structured Information ◽

Frame Work ◽

Effective Decision ◽

The Web

The volume of information available on the web grows significantly day by day. Information on the web comes in several forms: structured, semi-structured and unstructured. Most of it is presented in web pages, where it is semi-structured, and the information required for a given context is scattered across different web documents. It is difficult to analyze large volumes of semi-structured information presented in web pages and to make decisions based on that analysis. This research work proposes a framework for a system that extracts information from various sources and prepares reports based on the knowledge built from the analysis. This simplifies data extraction, data consolidation, data analysis and decision making based on the information presented in web pages. The proposed framework integrates web crawling, information extraction and data mining technologies for better information analysis that supports effective decision making. It enables people and organizations to extract information from various web sources and to analyze the extracted data. The proposed framework is applicable to any application domain; manufacturing, sales, tourism and e-learning are a few examples. The framework has been implemented and tested for effectiveness, and the results are promising.


Slow Metadata

PMLA ◽

10.1632/pmla.2020.135.1.188 ◽

2020 ◽

Vol 135 (1) ◽

pp. 188-194

Author(s):

Rachel Sagner Buurma ◽

Jon Shaw

Keyword(s):

Book History ◽

Library Science ◽

Public Access ◽

Library Catalog ◽

Bibliographic Information ◽

Bibliographic Records ◽

Literary Character ◽

Bibliographic Data ◽

History Of ◽

Machine Readable

The bibliographic records in libraries' searchable online public access catalogs (OPACs) have recently taken on a new role as a source of bibliographic data that can be aggregated, shared, circulated, manipulated, transformed, studied, and interpreted. Scholars' new awareness of library catalogs not just as aids to locating books and other materials but as sources of bibliographic information that researchers can manipulate and transform has inspired new scholarship on the history of the catalog and a new focus on how the catalog, in both its analog and digital forms, shapes bibliographic knowledge. Our Early Novels Dataset (END) project, for example, uses methods from book history, library science, and literary studies to think about the shape and history of the bibliographic metadata in the library catalog. Our research group's collective experiments with bibliographic metadata ask what happens when we look at the library catalog record not just as a utilitarian aid for searching or as an object of critique, but also as a work in progress with a literary character of its own. We ask what we can learn from the shape given to bibliographic information by the earlier catalogers whose records our project inherited and on whose expertise we draw. We also ask how the familiar languages of the library catalog record and the controlled bibliographic description might help make new forms of knowledge about books. And we press on the inevitable and generative tension between the particular perspective of the library catalogers who transform specific copies of physical books into bibliographic data and the informational fields dictated by machine-readable cataloging (MARC) descriptive standards.


Bibliograficzna baza danych światowego językoznawstwa slawistycznego iSybislaw jako źródło badań nad ekwiwalencją polskich i ukraińskich terminów gramatycznych

Slavica Wratislaviensia ◽

10.19195/0137-1150.170.7 ◽

2019 ◽

Vol 170 ◽

pp. 85-96

Author(s):

Iwona Łuczków

Keyword(s):

Retrieval System ◽

Bibliographic Database ◽

Slavic Linguistics ◽

Bibliographic Data ◽

The Subject

Bibliographic Database of World Slavic Linguistics Publications iSybislaw as a source of research on the equivalence of Polish and Ukrainian grammatical terms

The Bibliographic Database of World Slavic Linguistics Publications iSybislaw is a modern information retrieval system that registers bibliographic data on Slavic linguistics. The object of analysis in this paper is a set of keywords identical in form to traditionally understood terms. The article offers general comments on the problem of determining the equivalence of keywords, using Polish and Ukrainian grammatical terms as an example.

The bibliographic database of world Slavic linguistics iSybislaw as a source for research on the equivalence of Polish and Ukrainian grammatical terms

The iSybislaw bibliographic database is a modern information retrieval system containing information about works on world Slavic linguistics. The object of analysis in this article is one component of the system: the keywords that coincide with traditionally understood grammatical terms. The article presents the problems involved in determining cross-language equivalents of keywords, using Polish and Ukrainian grammatical terms as an example.


Governance Disclosure on the Internet by Leading Indian Public Sector Companies

Think India ◽

10.26643/think-india.v22i2.8716 ◽

2019 ◽

Vol 22 (2) ◽

pp. 174-187

Author(s):

Harmandeep Singh ◽

Arwinder Singh

Keyword(s):

Public Sector ◽

Financial Information ◽

Three Dimensions ◽

The Internet ◽

Web Pages ◽

Non Profit ◽

Index Approach ◽

Significant Difference ◽

Governance Disclosure ◽

The Web

Nowadays the internet serves people with services in many different fields. Profit and non-profit organizations alike use the internet for various business purposes, one of the major ones being the communication of financial and non-financial information on their websites. This study examines the top 30 BSE-listed public sector companies to measure the extent of governance disclosure (non-financial information) on their web pages, using the disclosure index approach. The governance index was constructed and broadly categorized into three dimensions: organization and structure; strategy and planning; and accountability, compliance, philosophy and risk management. The empirical evidence reveals that all of the Indian public sector companies studied have a website and that, on average, 67% of companies disclosed some kind of governance information directly on their websites. Further, we found extreme variations in web disclosure between the three categories of company, i.e., the Maharatnas, the Navratnas and the Miniratnas. However, the Kruskal-Wallis test indicates that the difference between the three categories is not statistically significant. The study provides valuable insights into the Indian economy. It finds that Indian public sector companies use the internet for governance disclosure to some extent, but that the disclosure is not uniform, because web disclosure is not regulated. The study therefore recommends a regulated framework for web disclosure so that stakeholders can be assured of the transparency and reliability of the information.
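As an illustration of the two quantitative steps named in the abstract, the sketch below computes an unweighted disclosure index (the share of checklist items a company discloses) and the Kruskal-Wallis H statistic across groups. It is a simplified sketch, not the study's procedure: the checklist scoring is assumed to be binary, the statistic omits the correction for ties, and all numbers are hypothetical.

```python
from itertools import chain


def disclosure_index(items):
    """Share of checklist items disclosed; each item is scored 1 or 0."""
    return sum(items) / len(items)


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H = 12 / (N (N + 1)) * sum(R_g^2 / n_g) - 3 (N + 1), no tie correction."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


# Hypothetical checklist for one company: 3 of 4 items disclosed.
print(disclosure_index([1, 1, 0, 1]))

# Hypothetical index scores for three categories of company.
scores = [[0.80, 0.72, 0.65], [0.70, 0.61, 0.58], [0.66, 0.52, 0.49]]
print(kruskal_wallis_h(scores))
```

H is compared against a chi-square distribution with (number of groups - 1) degrees of freedom; a small H, as the study reports for its three categories, means the rank distributions cannot be told apart.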


Improving Document Retrieval with Spelling Correction for Weak and Fabricated Indonesian-Translated Hadith

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v4i3.1913 ◽

2020 ◽

Vol 4 (3) ◽

pp. 551-557

Author(s):

Muhammad Zaky Ramadhan ◽

Kemas Muslim Lhaksmana

Keyword(s):

Retrieval System ◽

Islamic Law ◽

Vector Space Model ◽

Document Retrieval ◽

The Internet ◽

Average Precision ◽

Spelling Correction ◽

Space Model ◽

The Mean ◽

The Web

Hadith have several levels of authenticity, among them weak (dhaif) and fabricated (maudhu) hadith, which may not originate from the prophet Muhammad PBUH and thus should not be considered when deriving Islamic law (sharia). However, ordinary Muslims commonly mistake many such hadith for authentic ones. To make them easier to distinguish, this paper proposes a method that checks the authenticity of a hadith by comparing it with a collection of fabricated hadith in Indonesian. The proposed method applies the vector space model and performs spelling correction with SymSpell to test whether spell checking improves the accuracy of hadith retrieval; this has not been done in previous work, and typos are common in Indonesian-translated hadith in raw text on the Web and social media. The experimental results show that spell checking improves mean average precision from 73% to 81% and recall from 80% to 89%. Because it corrects the typos that are common in raw text on the Internet, spelling correction makes the hadith retrieval system more feasible, and its implementation in future work is encouraged.
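The combination of a vector space model with query spelling correction can be sketched as follows. This is an illustration only: it uses TF-IDF weights with cosine similarity, and it substitutes `difflib.get_close_matches` from the Python standard library for SymSpell, so it does not reproduce the paper's corrector. The class name, the similarity cutoff and the sample documents are hypothetical.

```python
import math
from collections import Counter
from difflib import get_close_matches


def tokenize(text):
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorSpaceIndex:
    def __init__(self, documents):
        doc_tokens = [tokenize(d) for d in documents]
        n = len(documents)
        # Document frequency: in how many documents each term occurs.
        df = Counter(t for tokens in doc_tokens for t in set(tokens))
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
        self.vocabulary = sorted(self.idf)
        self.doc_vectors = [self._vector(tokens) for tokens in doc_tokens]

    def _vector(self, tokens):
        tf = Counter(t for t in tokens if t in self.idf)
        return {t: count * self.idf[t] for t, count in tf.items()}

    def correct(self, token):
        """Replace an unknown token with its closest vocabulary term, if any."""
        if token in self.idf:
            return token
        match = get_close_matches(token, self.vocabulary, n=1, cutoff=0.8)
        return match[0] if match else token

    def search(self, query, spell_correct=True):
        """Return (score, document index) pairs, best match first."""
        tokens = tokenize(query)
        if spell_correct:
            tokens = [self.correct(t) for t in tokens]
        q = self._vector(tokens)
        scored = [(cosine(q, d), i) for i, d in enumerate(self.doc_vectors)]
        return sorted((pair for pair in scored if pair[0] > 0), reverse=True)


# Hypothetical collection; the paper indexes Indonesian-translated hadith.
DOCUMENTS = [
    "the market opens early in the morning",
    "rain is expected in the evening",
    "the library closes at noon",
]
index = VectorSpaceIndex(DOCUMENTS)
print(index.search("libary", spell_correct=False))  # typo matches nothing
print(index.search("libary"))  # corrected to "library" before scoring
```

Correcting only tokens that are missing from the vocabulary leaves correctly spelled queries untouched, so the correction step can only add matches, not remove them.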


A Method of Opinion Tendency Analyzing Based on Regular Expression

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.347-350.2758 ◽

2013 ◽

Vol 347-350 ◽

pp. 2758-2762

Author(s):

Zhi Juan Wang

Keyword(s):

Regular Expression ◽

Negative Information ◽

Web Pages ◽

Social Stability ◽

National Unity ◽

Internet Information ◽

Information Bank ◽

The Web

Negative Internet information is harmful to social stability and national unity. Opinion tendency analysis can detect negative Internet information. Here, a method based on regular expressions is introduced that does not need complex semantic technologies. The method comprises building a negative information bank, designing the regular expressions and implementing the program. The results obtained verify that the method works perfectly in judging the opinion of web pages.
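The three steps named in the abstract (a negative information bank, a regular expression built from it, and a program that applies it) might look like the sketch below. The term bank, the threshold and the function names are hypothetical, and word-boundary matching assumes space-delimited text; the paper's own expressions are not given in the abstract.

```python
import re

# Step 1: a (hypothetical) bank of negative terms and phrases.
NEGATIVE_BANK = ["fraud", "scam", "rip off", "terrible"]


def build_pattern(terms):
    """Step 2: compile one case-insensitive alternation from the bank."""
    # Longest terms first, so a phrase is preferred over a shorter prefix.
    ordered = sorted(terms, key=len, reverse=True)
    alternatives = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def strip_tags(html):
    """Crude tag removal, enough for a sketch; not a full HTML parser."""
    return re.sub(r"<[^>]+>", " ", html)


def judge(html, pattern, threshold=2):
    """Step 3: label a page negative when enough bank terms occur in it."""
    hits = pattern.findall(strip_tags(html))
    label = "negative" if len(hits) >= threshold else "not negative"
    return label, hits


pattern = build_pattern(NEGATIVE_BANK)
print(judge("<p>This offer is a SCAM and a fraud.</p>", pattern))
print(judge("<p>The weather was pleasant today.</p>", pattern))
```

A count threshold is the simplest possible decision rule; weighting terms or normalizing by page length would be natural refinements.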
