Development of parser agents for bibliographic data retrieval

2018 ◽  
Vol 33 (4) ◽  
Author(s):  
Murari Kumar ◽  
Samir Farooqi ◽  
K. K. Chaturvedi ◽  
Chandan Kumar Deb ◽  
Pankaj Das

Bibliographic data contain the information about a publication that users need to identify and retrieve it. Bibliometricians use these data quantitatively for analysis and dissemination, but with the growing volume of literature published in open access outlets such as Nucleic Acids Research (NAR), Springer, and Oxford Journals, it has become difficult to retrieve structured bibliographic information in a desired format. A digital bibliographic database holds such information about published literature in structured form. The bibliographic records of different articles, however, are scattered across different web pages. This thesis presents a retrieval system that gathers the bibliographic data of NAR in a single place. For this purpose, parser agents were developed that access the NAR web pages, parse the scattered bibliographic data, and store it in a local bibliographic database. On top of this database, a three-tier architecture is used to display the bibliographic information in a systematic format. With this system it would be possible to build networks between authors and affiliations and to generate other analytical reports.
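The abstract does not say how the parser agents are built. As a minimal sketch of the parse-and-store step it describes, the Python below assumes that an article page exposes its metadata in `<meta name="citation_*">` tags (a convention many publisher sites follow for indexing) and that the local database is SQLite. The table layout, the function names, and `SAMPLE_PAGE` are hypothetical and used only for illustration.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical page fragment standing in for a fetched article page.
SAMPLE_PAGE = """
<html><head>
<meta name="citation_title" content="Example article title">
<meta name="citation_journal_title" content="Example Journal">
<meta name="citation_publication_date" content="2018/01/04">
<meta name="citation_author" content="A. Example">
<meta name="citation_author" content="B. Sample">
</head><body></body></html>
"""


class CitationMetaParser(HTMLParser):
    """Collect the content of every <meta name="citation_*"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}  # tag name -> list of values (authors repeat)

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name") or ""
        content = attr.get("content")
        if name.startswith("citation_") and content:
            self.fields.setdefault(name, []).append(content.strip())


def parse_record(html):
    """Turn one page into a flat bibliographic record."""
    parser = CitationMetaParser()
    parser.feed(html)
    fields = parser.fields

    def first(key):
        values = fields.get(key)
        return values[0] if values else None

    return {
        "title": first("citation_title"),
        "journal": first("citation_journal_title"),
        "published": first("citation_publication_date"),
        "authors": fields.get("citation_author", []),
    }


def store_record(conn, record):
    """Insert the record into a local database (assumed schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS article ("
        "id INTEGER PRIMARY KEY, title TEXT, journal TEXT, published TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS author (article_id INTEGER, name TEXT)"
    )
    cur = conn.execute(
        "INSERT INTO article (title, journal, published) VALUES (?, ?, ?)",
        (record["title"], record["journal"], record["published"]),
    )
    article_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO author (article_id, name) VALUES (?, ?)",
        [(article_id, name) for name in record["authors"]],
    )
    conn.commit()
    return article_id
```

Keeping authors in their own table is one possible design choice; it makes the author and affiliation networks mentioned in the abstract a matter of joining on `article_id`.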

2013 ◽  
Vol 7 (2) ◽  
pp. 574-579 ◽  
Author(s):  
Dr Sunitha Abburu ◽  
G. Suresh Babu

The volume of information available on the web is growing significantly day by day. Information on the web takes several forms: structured, semi-structured, and unstructured. Most of it is presented in web pages, where it is semi-structured, and the information required for a given context is scattered across different web documents. It is difficult to analyze large volumes of semi-structured information presented in web pages and to make decisions based on that analysis. This work proposes a framework for a system that extracts information from various sources and prepares reports based on the knowledge built from the analysis. This simplifies data extraction, data consolidation, data analysis, and decision making based on the information presented in web pages. The proposed framework integrates web crawling, information extraction, and data mining technologies for better information analysis in support of effective decision making. It enables people and organizations to extract information from various web sources and to analyze the extracted data effectively. The proposed framework is applicable to any application domain; manufacturing, sales, tourism, and e-learning are a few examples. The framework was implemented and tested for effectiveness, and the results are promising.


PMLA ◽  
2020 ◽  
Vol 135 (1) ◽  
pp. 188-194
Author(s):  
Rachel Sagner Buurma ◽  
Jon Shaw

The bibliographic records in libraries' searchable online public access catalogs (OPACs) have recently taken on a new role as a source of bibliographic data that can be aggregated, shared, circulated, manipulated, transformed, studied, and interpreted. Scholars' new awareness of library catalogs not just as aids to locating books and other materials but as sources of bibliographic information that researchers can manipulate and transform has inspired new scholarship on the history of the catalog and a new focus on how the catalog, in both its analog and digital forms, shapes bibliographic knowledge. Our Early Novels Dataset (END) project, for example, uses methods from book history, library science, and literary studies to think about the shape and history of the bibliographic metadata in the library catalog. Our research group's collective experiments with bibliographic metadata ask what happens when we look at the library catalog record not just as a utilitarian aid for searching or as an object of critique, but also as a work in progress with a literary character of its own. We ask what we can learn from the shape given to bibliographic information by the earlier catalogers whose records our project inherited and on whose expertise we draw. We also ask how the familiar languages of the library catalog record and the controlled bibliographic description might help make new forms of knowledge about books. And we press on the inevitable and generative tension between the particular perspective of the library catalogers who transform specific copies of physical books into bibliographic data and the informational fields dictated by machine-readable cataloging (MARC) descriptive standards.


2019 ◽  
Vol 170 ◽  
pp. 85-96
Author(s):  
Iwona Łuczków

Bibliographic Database of World Slavic Linguistics Publications iSybislaw as a source of research on the equivalence of Polish and Ukrainian grammatical terms

The Bibliographic Database of World Slavic Linguistics Publications iSybislaw is a modern information retrieval system that registers bibliographic data on Slavic linguistics. The object of analysis in this paper is one component of the system: the set of keywords that coincide in form with traditionally understood grammatical terms. The article offers general comments on the problem of determining cross-language equivalents of keywords, using Polish and Ukrainian grammatical terms as an example.


Think India ◽  
2019 ◽  
Vol 22 (2) ◽  
pp. 174-187
Author(s):  
Harmandeep Singh ◽  
Arwinder Singh

Nowadays, the internet serves people with services across many fields. Both for-profit and non-profit organizations use the internet for various business purposes. One major use is communicating financial and non-financial information on their websites. This study examines the top 30 BSE-listed public sector companies to measure the extent of governance disclosure (non-financial information) on their web pages. A disclosure index approach was used to examine the extent of governance disclosure on the internet. The governance index was constructed and broadly categorized into three dimensions: organization and structure; strategy and planning; and accountability, compliance, philosophy, and risk management. The empirical evidence shows that all the Indian public sector companies have a website and that, on average, 67% of companies disclosed some kind of governance information directly on their websites. Further, we found extreme variations in web disclosure among the three categories, i.e., the Maharatnas, the Navratnas, and the Miniratnas. However, the Kruskal-Wallis test indicates no significant difference among the three categories. The study provides valuable insights into the Indian economy. It finds that Indian public sector companies use the internet for governance disclosure to some extent, but that the disclosure lacks uniformity because web disclosure is not regulated. The study therefore recommends a regulatory framework for web disclosure so that stakeholders can be assured of the transparency and reliability of the information.
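The abstract names two techniques without giving their details: a disclosure index and the Kruskal-Wallis test. The Python sketch below shows one common form of each, under stated assumptions. The index is taken to be the unweighted share of checklist items a company discloses, and the H statistic is computed with average ranks for ties but without the tie correction factor. The study's actual checklist and data are not in the source, so all inputs here are hypothetical.

```python
def disclosure_index(items_disclosed):
    """Unweighted index: fraction of checklist items found on the website.

    items_disclosed is a list of booleans, one per checklist item
    (an assumed scoring scheme; the study may weight items differently).
    """
    if not items_disclosed:
        raise ValueError("checklist is empty")
    return sum(1 for found in items_disclosed if found) / len(items_disclosed)


def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for a list of groups of scores.

    H = 12 / (N (N + 1)) * sum(R_g^2 / n_g) - 3 (N + 1),
    where R_g is the rank sum of group g. No tie correction is applied.
    """
    pooled = [score for group in groups for score in group]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)
```

For three company categories, H would be compared against a chi-squared distribution with two degrees of freedom to judge significance.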


2020 ◽  
Vol 4 (3) ◽  
pp. 551-557
Author(s):  
Muhammad Zaky Ramadhan ◽  
Kemas Muslim Lhaksmana

Hadith have several levels of authenticity, among which are weak (dhaif) and fabricated (maudhu) hadith that may not originate from the prophet Muhammad PBUH and thus should not be considered in deriving Islamic law (sharia). However, many such hadiths are commonly mistaken for authentic hadiths by ordinary Muslims. To make such hadiths easy to distinguish, this paper proposes a method that checks the authenticity of a hadith by comparing it with a collection of fabricated hadiths in Indonesian. The proposed method applies the vector space model and performs spelling correction using SymSpell to check whether spell checking can improve the accuracy of hadith retrieval, because this has not been done in previous works and typos are common in Indonesian-translated hadiths in raw text on the Web and social media. The experimental results show that spell checking improves the mean average precision and recall to 81% (from 73%) and 89% (from 80%), respectively. Spelling correction therefore makes the hadith retrieval system more feasible, and its use is encouraged in future works because it can correct the typos that are common in raw text on the Internet.
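The idea of correcting a query's spelling before matching it in a vector space model can be sketched in a few lines of Python. This is an illustration under assumptions, not the paper's implementation: it uses TF-IDF weights with cosine similarity as one common form of the vector space model, and it substitutes the standard library's `difflib.get_close_matches` for SymSpell. The function names, the smoothing of the IDF term, and the similarity cutoff of 0.8 are all choices made here.

```python
import difflib
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def correct(tokens, vocabulary):
    """Replace each unknown token with its closest vocabulary word, if any.

    difflib stands in for SymSpell here; the cutoff is an assumption.
    """
    known = set(vocabulary)
    fixed = []
    for token in tokens:
        if token in known:
            fixed.append(token)
            continue
        match = difflib.get_close_matches(token, vocabulary, n=1, cutoff=0.8)
        fixed.append(match[0] if match else token)
    return fixed


def weigh(tokens, idf):
    """TF-IDF vector as a sparse dict; terms outside the index are dropped."""
    tf = Counter(tokens)
    return {term: count * idf[term] for term, count in tf.items() if term in idf}


def build_index(docs):
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency (an assumed variant).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [weigh(tokens, idf) for tokens in tokenized]
    return idf, vectors


def cosine(a, b):
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, idf, vectors, spellcheck=True):
    """Return (index of best document, list of similarity scores)."""
    tokens = tokenize(query)
    if spellcheck:
        tokens = correct(tokens, sorted(idf))
    query_vector = weigh(tokens, idf)
    scores = [cosine(query_vector, vector) for vector in vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores
```

With a misspelled query such as `"mountan vilage"` against a small collection of placeholder sentences, the uncorrected query shares no terms with any document and every score is zero, while the corrected query can match a document that contains "mountain village".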


2013 ◽  
Vol 347-350 ◽  
pp. 2758-2762
Author(s):  
Zhi Juan Wang

Negative Internet information is harmful to social stability and national unity. Opinion tendency analysis can identify such information. This paper introduces a method based on regular expressions that does not require complex semantic technologies. The method comprises three steps: building a bank of negative information, designing the regular expressions, and implementing the program. The results obtained with this method verify that it works well for judging the opinion of web pages.
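The three steps named in the abstract (a term bank, a regular expression, a program that applies it) can be sketched as follows. This is a minimal Python illustration under assumptions: the terms in `NEGATIVE_TERMS`, the hit threshold, and the function names are hypothetical, and the paper's actual bank and expressions are not given in the source. The `\b` word boundaries suit space-delimited languages such as English; text in a language written without spaces would need a different pattern.

```python
import re

# Step 1: a bank of negative terms (hypothetical placeholder entries).
NEGATIVE_TERMS = ["fraud", "scam", "riot"]


def build_pattern(terms):
    """Step 2: compile the bank into one case-insensitive alternation."""
    # Longer terms first so that a longer phrase wins over its prefix.
    ordered = sorted(terms, key=len, reverse=True)
    alternatives = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def strip_tags(html):
    """Crude tag removal, adequate for a sketch but not for real HTML."""
    return re.sub(r"<[^>]+>", " ", html)


def judge(html, pattern, threshold=2):
    """Step 3: flag a page as negative when it has enough matches.

    Returns (is_negative, list of matched terms). The threshold of 2
    is an assumption, not a value from the paper.
    """
    hits = pattern.findall(strip_tags(html))
    return len(hits) >= threshold, hits
```

For example, `judge("<p>The scam was a Fraud.</p>", build_pattern(NEGATIVE_TERMS))` counts two matches and flags the page, while a page with no bank terms is not flagged.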

