Smart crawler for hidden web interfaces

IHWC: intelligent hidden web crawler for harvesting data in urban domains

Complex & Intelligent Systems ◽

10.1007/s40747-021-00471-1 ◽

2021 ◽

Author(s):

Sawroop Kaur ◽

Aman Singh ◽

G. Geetha ◽

Xiaochun Cheng

Keyword(s):

Large Scale ◽

Smart Cities ◽

Quality Data ◽

Web Pages ◽

Web Crawler ◽

Harvest Rate ◽

Web Interfaces ◽

Hidden Web ◽

Special Cases ◽

Daunting Task

AbstractDue to the massive size of the hidden web, searching, retrieving and mining rich and high-quality data can be a daunting task. Moreover, with the presence of forms, data cannot be accessed easily. Forms are dynamic, heterogeneous and spread over trillions of web pages. Significant efforts have addressed the problem of tapping into the hidden web to integrate and mine rich data. Effective techniques, as well as application in special cases, are required to be explored to achieve an effective harvest rate. One such special area is atmospheric science, where hidden web crawling is least implemented, and crawler is required to crawl through the huge web to narrow down the search to specific data. In this study, an intelligent hidden web crawler for harvesting data in urban domains (IHWC) is implemented to address the relative problems such as classification of domains, prevention of exhaustive searching, and prioritizing the URLs. The crawler also performs well in curating pollution-related data. The crawler targets the relevant web pages and discards the irrelevant by implementing rejection rules. To achieve more accurate results for a focused crawl, ICHW crawls the websites on priority for a given topic. The crawler has fulfilled the dual objective of developing an effective hidden web crawler that can focus on diverse domains and to check its integration in searching pollution data in smart cities. One of the objectives of smart cities is to reduce pollution. Resultant crawled data can be used for finding the reason for pollution. The crawler can help the user to search the level of pollution in a specific area. The harvest rate of the crawler is compared with pioneer existing work. With an increase in the size of a dataset, the presented crawler can add significant value to emission accuracy. Our results are demonstrating the accuracy and harvest rate of the proposed framework, and it efficiently collect hidden web interfaces from large-scale sites and achieve higher rates than other crawlers.

Download Full-text

Comparative Analysis of Hidden Web Crawlers

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v6i5.190194 ◽

2018 ◽

Vol 6 (5) ◽

pp. 190-194

Author(s):

Ashok Kumar ◽

◽

Manish Mahajan ◽

...

Keyword(s):

Comparative Analysis ◽

Hidden Web ◽

Web Crawlers

Download Full-text

AN EFFICIENT SMARTCRAWLER FOR HARVESTING WEB INTERFACES OF A TWO-STAGE CRAWLER

i-manager s Journal on Information Technology ◽

10.26634/jit.5.4.10334 ◽

2016 ◽

Vol 5 (4) ◽

pp. 20

Author(s):

SHARMA NIKITHA ◽

DEVI V. SOWMYA ◽

◽

Keyword(s):

Two Stage ◽

Web Interfaces

Download Full-text

Matters of geoportal interfaces designing

Geodesy and Cartography ◽

10.22389/0016-7126-2019-944-2-46-56 ◽

2019 ◽

Vol 944 (2) ◽

pp. 46-56

Author(s):

S.A. Yamashkin ◽

A.A. Yamashkin ◽

O.A. Zarubin

Keyword(s):

Spatial Data ◽

Interface Design ◽

Third Party ◽

Management Systems ◽

Data Management Systems ◽

Web Interfaces ◽

Graphical Interfaces ◽

Spatial Data Management ◽

Software Modules ◽

Cross Platform

The article is devoted to a detailed analysis of the problem of designing graphic geoportal interfaces. The authors formulated the basic points for solving problems in this field, having given the rationale and detailed description of each of them. The emphasis is made on the flexible arrangement of the design and development of interfaces, aiming at the future realities, at the human centricity of the interface design process, at the need for cross-platform adaptive web interfaces, at the preference to use proprietary and third-party software modules over the implementation of spatial data management systems. Lists of basic functional and quality requirements for graphical interfaces of geoportals are given. The geoportal “Natural and cultural heritage of Mordovia” is presented as an illustrative example of the various implementation of graphical user web interfaces. An experimental assessment of the effectiveness of measures to improve geoportal graphical interfaces is given. It is shown that properly over-thought interfaces of geoportal systems can contribute to solving various kinds of problems in many fields.

Download Full-text

The WxChallenge: Forecasting Competition, Educational Tool, and Agent of Cultural Change

Bulletin of the American Meteorological Society ◽

10.1175/bams-d-11-00112.1 ◽

2013 ◽

Vol 94 (10) ◽

pp. 1501-1506 ◽

Cited By ~ 2

Author(s):

Bradley G. Illston ◽

Jeffrey B. Basara ◽

Christopher Weiss ◽

Mike Voss

Keyword(s):

Cultural Change ◽

State Of The Art ◽

Learning Experience ◽

The United States ◽

Higher Learning ◽

Educational Tool ◽

Web Interfaces ◽

Wind Speeds ◽

University Of Oklahoma ◽

The University

The WxChallenge, a project developed at the University of Oklahoma, brings a state-of-the-art, fun, and exciting forecast contest to participants at colleges and universities across North America. The challenge is to forecast the maximum and minimum temperatures, precipitation, and maximum wind speeds for select locations across the United States over a 24-h prediction period. The WxChallenge is open to all undergraduate and graduate students, as well as higher-education faculty, staff, and alumni. Through the use of World Wide Web interfaces accessible by personal computers, tablet computer, and smartphones, the WxChallenge provides a state-of-the-art portal to aid participants in submitting forecasts and alleviate many of the administrative issues (e.g., tracking and scoring) faced by local managers and professors. Since its inception in 2006, 110 universities have participated in the contest and it has been utilized as part of the curricula for 140 classroom courses at various institutions. The inherently challenging nature of the WxChallenge has encouraged its adoption as an educational tool. As its popularity has grown, professors have seen the utility of the Wx-Challenge as a teaching aid and it has become an instructional resource of many meteorological classes at institutions for higher learning. In addition to evidence of educational impacts, the competition has already begun to leave a cultural and social mark on the meteorological learning experience.

Download Full-text

Automatically filling form-based web interfaces with free text inputs

Proceedings of the 18th international conference on World wide web - WWW '09 ◽

10.1145/1526709.1526908 ◽

2009 ◽

Cited By ~ 5

Author(s):

Guilherme A. Toda ◽

Eli Cortez ◽

Filipe Mesquita ◽

Altigran S. da Silva ◽

Edleno Moura ◽

...

Keyword(s):

Free Text ◽

Web Interfaces

Download Full-text

A Semantic Model for Indexing in the Hidden Web

Procedia Computer Science ◽

10.1016/j.procs.2021.06.043 ◽

2021 ◽

Vol 190 ◽

pp. 324-331

Author(s):

Larisa Ismailova ◽

Viacheslav Wolfengagen ◽

Sergey Kosikov

Keyword(s):

Semantic Model ◽

Hidden Web

Download Full-text

The Ecce and Logen partial evaluators and their web interfaces

Proceedings of the 2006 ACM SIGPLAN symposium on Partial evaluation and semantics-based program manipulation - PEPM '06 ◽

10.1145/1111542.1111557 ◽

2006 ◽

Cited By ~ 5

Author(s):

Michael Leuschel ◽

Dan Elphick ◽

Mauricio Varea ◽

Stephen-John Craig ◽

Marc Fontaine

Keyword(s):

Web Interfaces

Download Full-text

A Parallel Unmixing-Based Content Retrieval System for Distributed Hyperspectral Imagery Repository on Cloud Computing Platforms

Remote Sensing ◽

10.3390/rs13020176 ◽

2021 ◽

Vol 13 (2) ◽

pp. 176

Author(s):

Peng Zheng ◽

Zebin Wu ◽

Jin Sun ◽

Yi Zhang ◽

Yaoqin Zhu ◽

...

Keyword(s):

Cloud Computing ◽

Large Scale ◽

Retrieval System ◽

Hyperspectral Image ◽

Parallel Implementation ◽

Remotely Sensed Data ◽

Web Interfaces ◽

Content Retrieval ◽

Service Mode ◽

Computing Platforms

As the volume of remotely sensed data grows significantly, content-based image retrieval (CBIR) becomes increasingly important, especially for cloud computing platforms that facilitate processing and storing big data in a parallel and distributed way. This paper proposes a novel parallel CBIR system for hyperspectral image (HSI) repository on cloud computing platforms under the guide of unmixed spectral information, i.e., endmembers and their associated fractional abundances, to retrieve hyperspectral scenes. However, existing unmixing methods would suffer extremely high computational burden when extracting meta-data from large-scale HSI data. To address this limitation, we implement a distributed and parallel unmixing method that operates on cloud computing platforms in parallel for accelerating the unmixing processing flow. In addition, we implement a global standard distributed HSI repository equipped with a large spectral library in a software-as-a-service mode, providing users with HSI storage, management, and retrieval services through web interfaces. Furthermore, the parallel implementation of unmixing processing is incorporated into the CBIR system to establish the parallel unmixing-based content retrieval system. The performance of our proposed parallel CBIR system was verified in terms of both unmixing efficiency and accuracy.

Download Full-text

Usability Methodologies for Real-Life Voice User Interfaces

International Journal of Information Technology and Web Engineering ◽

10.4018/jitwe.2009100105 ◽

2009 ◽

Vol 4 (4) ◽

pp. 78-94 ◽

Cited By ~ 11

Author(s):

Georgios Kouroupetroglou ◽

Dimitris Spiliotopoulos

Keyword(s):

User Interfaces ◽

Theoretical Perspective ◽

Real Life ◽

Usability Testing ◽

Web Interfaces ◽

Development Framework ◽

Development Lifecycle ◽

Conversational Interfaces ◽

Voice User ◽

Deployed Systems

This paper studies the usability methodologies for spoken dialogue web interfaces along with the appropriate designer-needs analysis. The work unfolds a theoretical perspective to the methods that are extensively used and provides a framework description for creating and testing usable content and applications for conversational interfaces. The main concerns include the design issues for usability testing and evaluation during the development lifecycle, the basic customer experience metrics and the problems that arise after the deployment of real-life systems. Through the discussion of the evaluation and testing methods, this paper argues on the importance and the potential of wizard-based functional assessment and usability testing for deployed systems, presenting an appropriate environment as part of an integrated development framework.

Download Full-text