Implementation of Web Application for Disease Prediction Using AI

2020 ◽  
pp. 5-9
Author(s):  
Manasvi Srivastava ◽  
Vikas Yadav ◽  
Swati Singh ◽  
...  

The Internet is the largest source of information created by humanity. It contains a variety of materials available in various formats such as text, audio, video and much more. Web scraping is one way to access this information: a set of strategies for obtaining data from websites instead of copying it manually. Many web-based data extraction methods are designed to solve specific problems and work on ad-hoc domains. Various tools and technologies have been developed to facilitate web scraping. Unfortunately, the appropriateness and ethics of using these web scraping tools are often overlooked. There are hundreds of web scraping software packages available today, most of them designed for Java, Python and Ruby, and both open-source and commercial options exist. Web-based tools such as Yahoo Pipes, Google Web Scrapers and the OutWit extension for Firefox are good starting points for beginners in web scraping. Web extraction essentially replaces the manual extraction and editing process, providing an easier and better way to collect data from a web page, convert it into the desired format and save it to a local directory or archive. In this paper, among the various kinds of scraping, we focus on techniques that extract the content of a Web page. In particular, we use scraping techniques to collect a variety of diseases with their symptoms and precautions.
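
As a rough illustration of the kind of content extraction described above, the following Python sketch pulls disease names, symptoms and precautions from a hypothetical template-based page using requests and BeautifulSoup; the URL and the CSS selectors are assumptions for illustration, not the structure of any real site or of the system described in the paper.

```python
# Minimal web scraping sketch (hypothetical page structure).
# The URL and CSS selectors below are assumptions for illustration only.
import requests
from bs4 import BeautifulSoup

URL = "https://example.org/diseases"  # placeholder URL

def scrape_diseases(url: str) -> list[dict]:
    """Fetch the page and collect disease name, symptoms and precautions."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    records = []
    # Assume each disease sits in a <div class="disease"> block.
    for block in soup.select("div.disease"):
        records.append({
            "name": block.select_one("h2").get_text(strip=True),
            "symptoms": [li.get_text(strip=True) for li in block.select("ul.symptoms li")],
            "precautions": [li.get_text(strip=True) for li in block.select("ul.precautions li")],
        })
    return records

if __name__ == "__main__":
    for record in scrape_diseases(URL):
        print(record["name"], "-", ", ".join(record["symptoms"]))
```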

2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Irvin Dongo ◽  
Yudith Cardinale ◽  
Ana Aguilera ◽  
Fabiola Martinez ◽  
Yuni Quintero ◽  
...  

Purpose: This paper aims to perform an exhaustive review of relevant and recent related studies, which reveals that both extraction methods are currently used to analyze credibility on Twitter. Thus, there is clear evidence of the need to have different options to extract different data for this purpose. Nevertheless, none of these studies performs a comparative evaluation of both extraction techniques. Moreover, the authors extend a previous comparison, which uses a recently developed framework that offers both alternatives for data extraction and implements a previously proposed credibility model, by adding a qualitative evaluation and a Twitter Application Programming Interface (API) performance analysis from different locations.

Design/methodology/approach: As one of the most popular social platforms, Twitter has been the focus of recent research aimed at analyzing the credibility of the shared information. To do so, several proposals use either the Twitter API or Web scraping to extract the data for the analysis. Qualitative and quantitative evaluations are performed to discover the advantages and disadvantages of both extraction methods.

Findings: The study demonstrates the differences in accuracy and efficiency between both extraction methods and highlights further problems in this area that must be addressed to pursue true transparency and legitimacy of information on the Web.

Originality/value: Results report that some Twitter attributes cannot be retrieved by Web scraping. Both methods produce identical credibility values when a robust normalization process is applied to the text (i.e. the tweet). Moreover, concerning time performance, Web scraping is faster than the Twitter API and more flexible in terms of obtaining data; however, Web scraping is very sensitive to website changes. Additionally, the response time of the Twitter API is proportional to the distance from the central server in San Francisco.
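
The findings note that both extraction methods yield identical credibility values only after a robust normalization of the tweet text. The snippet below is a generic sketch of such a normalization step (Unicode folding, URL and marker stripping, whitespace collapsing); it is not the authors' actual pipeline.

```python
# Generic tweet-text normalization sketch (not the authors' exact pipeline).
import re
import unicodedata

def normalize_tweet(text: str) -> str:
    """Normalize a tweet so API-extracted and scraped versions compare equal."""
    text = unicodedata.normalize("NFKC", text)        # unify Unicode forms
    text = re.sub(r"https?://\S+", "", text)          # drop URLs
    text = re.sub(r"[@#](\w+)", r"\1", text)          # strip @ and # markers
    text = re.sub(r"\s+", " ", text).strip().lower()  # collapse whitespace
    return text

# Example: both variants of the same tweet normalize to the same string.
api_text = "Breaking:   #AI beats benchmark! https://t.co/xyz"
scraped_text = "breaking: AI beats benchmark!"
assert normalize_tweet(api_text) == normalize_tweet(scraped_text)
```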


Author(s):  
Ily Amalina Ahmad Sabri ◽  
Mustafa Man

Web data extraction is the process of extracting user-required information from web pages. The information consists of semi-structured data that is not in a structured format, and the extraction works on web documents in HTML format. Nowadays, most people use web data extractors because the extraction involves large amounts of information, which makes manual information extraction time-consuming and complicated. We present in this paper the WEIDJ approach to extract images from the web, whose goal is to harvest images as objects from template-based HTML pages. WEIDJ (Web Extraction Image using DOM (Document Object Model) and JSON (JavaScript Object Notation)) applies DOM theory to build the structure and uses JSON as the programming environment. The extraction process takes as input both a web address and an extraction structure. WEIDJ then splits the DOM tree into small subtrees and applies a visual-block search algorithm to each web page to find images. Our approach focuses on three levels of extraction: a single web page, multiple web pages and the whole website. Extensive experiments on several biodiversity web pages have been conducted to compare the time performance of image extraction using DOM, JSON and WEIDJ for a single web page. The experimental results show that, with our model, WEIDJ image extraction can be done quickly and effectively.
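
As a much-simplified analogue of the idea (not the WEIDJ implementation itself), the sketch below walks the DOM of a page with BeautifulSoup and emits the harvested image references as JSON; block splitting, visual-block search and the multi-page levels are omitted.

```python
# Simplified analogue of DOM-based image harvesting with JSON output.
# This is not the WEIDJ implementation, only an illustration of the idea.
import json
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def extract_images(url: str) -> str:
    """Return a JSON document describing every <img> found in the page DOM."""
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    images = []
    for img in soup.find_all("img"):
        images.append({
            "src": urljoin(url, img.get("src", "")),  # absolute image URL
            "alt": img.get("alt", ""),                # alternative text, if any
            "parent": img.parent.name,                # enclosing DOM element
        })
    return json.dumps({"page": url, "images": images}, indent=2)

if __name__ == "__main__":
    print(extract_images("https://example.org/biodiversity"))  # placeholder URL
```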


Author(s):  
Ingrid Fisher

In this chapter, we provide a brief description of the development of markup languages, describe their role in the context of web application development and deployment, and finally provide an example of an XML application for a simple Web page displaying a typical sales invoice. Our objective in this example is to illustrate the relationship between the relational and XML representations of data that is required for the use of XML in the development of web-based applications in electronic commerce.
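
To make the relational-to-XML mapping concrete, here is a small illustrative sketch (in Python, not taken from the chapter) that turns invoice rows, as they might come from relational tables, into a simple invoice XML document; the element names are assumed for illustration.

```python
# Illustrative mapping from relational-style rows to a simple invoice XML.
# The element names are assumptions, not the chapter's exact schema.
import xml.etree.ElementTree as ET

invoice_row = {"number": "INV-1001", "date": "2021-03-15", "customer": "Acme Corp"}
line_rows = [
    {"item": "Widget", "qty": 3, "unit_price": 9.99},
    {"item": "Gadget", "qty": 1, "unit_price": 24.50},
]

invoice = ET.Element("invoice", number=invoice_row["number"], date=invoice_row["date"])
ET.SubElement(invoice, "customer").text = invoice_row["customer"]
lines = ET.SubElement(invoice, "lines")
for row in line_rows:
    line = ET.SubElement(lines, "line", quantity=str(row["qty"]))
    ET.SubElement(line, "item").text = row["item"]
    ET.SubElement(line, "unitPrice").text = f'{row["unit_price"]:.2f}'

print(ET.tostring(invoice, encoding="unicode"))
```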


2021 ◽  
Vol 11 (8) ◽  
pp. 3319
Author(s):  
Kiril Griazev ◽  
Simona Ramanauskaitė

The need for automated data extraction is continuously growing due to the constant addition of information to the worldwide web. Researchers are developing new data extraction methods to achieve increased performance compared to existing methods. Comparing algorithms to evaluate their performance is vital when developing new solutions. Different algorithms require different datasets to test their performance due to the various data extraction approaches. Currently, most datasets tend to focus on a specific data extraction approach and thus generally lack the data that may be useful for other extraction methods. That leads to difficulties when comparing the performance of algorithms that differ greatly in their approach. To counter this, we propose a dataset of web page content blocks that includes various data points. We also validate its design and structure by performing block-labeling experiments: web developers of varying experience levels labeled multiple websites presented to them, and their labeling results were stored in the newly proposed dataset structure. The experiment confirmed the need for the proposed data points and validated the suitability of the dataset structure for multi-purpose dataset design.
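
The published schema is defined in the paper itself; purely as an illustration of what a multi-purpose content-block record might carry, the sketch below stores a labeled block with the kinds of data points (raw HTML, visible text, DOM location, rendered position, label, annotator experience) that different extraction approaches tend to need. All field names are hypothetical.

```python
# Hypothetical shape of one labeled content-block record.
# The field names are illustrative, not the dataset's published schema.
from dataclasses import dataclass, asdict
import json

@dataclass
class ContentBlock:
    page_url: str               # page the block was taken from
    xpath: str                  # location of the block in the DOM
    html: str                   # raw HTML of the block
    text: str                   # visible text of the block
    bounding_box: tuple         # rendered position: (x, y, width, height)
    label: str                  # label assigned by the annotator
    annotator_experience: str   # e.g. "junior", "senior"

block = ContentBlock(
    page_url="https://example.org/article",
    xpath="/html/body/div[2]/article",
    html="<article>...</article>",
    text="Example article text",
    bounding_box=(120, 240, 800, 600),
    label="main-content",
    annotator_experience="senior",
)
print(json.dumps(asdict(block), indent=2))
```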


2021 ◽  
Vol 8 (3) ◽  
pp. 140-144
Author(s):  
G Midhu Bala ◽  
K Chitra

Web scraping is the process of automatically extracting data from multiple web pages on the World Wide Web. It is a field with active developments that shares a common goal with text processing, the semantic web vision, semantic understanding, machine learning, artificial intelligence and human-computer interaction. Current web scraping solutions range from ad-hoc approaches requiring human effort to fully automated systems that are able to extract the required unstructured information and convert it into structured information, with limitations. This paper describes a method for developing a web scraper using R programming that locates files on a website, extracts the filtered data and stores it. The modules used and the algorithm for automating the navigation of a website via links are described in this paper. The extracted data can further be used for data analytics.
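
The paper implements its scraper in R; the sketch below shows the same navigate-filter-extract-store idea in Python for comparison, with the start URL, link filter and output file as placeholder assumptions.

```python
# Python analogue of the link-following scraper described above (the paper uses R).
# Start URL, link filter and output location are placeholder assumptions.
import csv
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

START_URL = "https://example.org/reports"  # placeholder listing page

def crawl_and_extract(start_url: str, keyword: str = "report") -> list[dict]:
    """Follow links whose text contains `keyword` and extract each page title."""
    listing = BeautifulSoup(requests.get(start_url, timeout=10).text, "html.parser")
    rows = []
    for a in listing.find_all("a", href=True):
        if keyword not in a.get_text(strip=True).lower():
            continue                                  # filter irrelevant links
        page_url = urljoin(start_url, a["href"])
        page = BeautifulSoup(requests.get(page_url, timeout=10).text, "html.parser")
        rows.append({"url": page_url,
                     "title": page.title.get_text(strip=True) if page.title else ""})
    return rows

if __name__ == "__main__":
    with open("extracted.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["url", "title"])
        writer.writeheader()
        writer.writerows(crawl_and_extract(START_URL))
```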


Author(s):  
Ming-te Lu ◽  
W. L. Yeung

An ever-increasing number of businesses have established Web sites to engage in commercial activities today, forming so-called Web-based commerce. However, careful planning and preparation are needed for those businesses to achieve their intended purposes with this new channel of distribution. This chapter proposes a framework for planning effective Web-based commerce application development based on prior research in hypermedia and human-computer interfaces, and recent research on Web-based commerce. The framework regards Web application development as a type of software development project. At the onset, the project's social acceptability is investigated. Next, a system feasibility study is carried out. If the proposed project is viable, its Web-page interface is examined from the functionality, content and navigability points of view. The use of the framework will contribute to more effective Web-based commerce application development.


Author(s):  
James Farrow ◽  
Miro Palfy

Native applications often require special permissions to install and update, which may not always be available in secure/controlled sensitive-data environments. Deploying new features can be time-consuming and introduce delays. The Next Generation Linkage System Clerical Review tool (NGLMS-CR) is a configurable web application which side-steps these issues and allows rapid response to changing project requirements.

Objectives and Approach: We wanted a lightweight, platform-independent tool to run in any modern web browser. This avoids the need for special administrative privileges and allows rapid deployment and updates. The NGLMS-CR communicates via a simple RESTful protocol with a server managing user permissions, workflows and review data. Data is stored in a relational or graph database; the NGLMS-CR does not require a graph database. Users see various workpools (collections of records) and work through these at their own pace. Different workpools may be configured for different requirements, e.g. clusters requiring special expertise, clusters requiring higher data security, overly large clusters, etc. Custom workflows are built around workpools. Individual clusters can display customisable status indicators to highlight integrity checks and other information, e.g. a cluster containing multiple birth records or containing records with inconsistent gender information. The tool also promotes ergonomic and health objectives as well as collecting metrics about review activity. The review session tracks reviewer engagement and gives feedback on the situation so far: how long has been spent reviewing, and how long activity has been undertaken without a break. The user is given visual warnings when sessions extend beyond a set time. Activity is logged and metrics on throughput and accuracy are collated for analysis.

Results: In real-time manual clerical review tasks, the customisable nature of the NGLMS-CR has proven important. Changes are immediately visible to users, and workflow and status-icon changes are available with no delay waiting for software to be 'rolled out'.

Conclusion/Implications: Uncoupling review from any particular linkage system enables flexibility in project administration. A web-based review application may run on any system and requires minimal administrative permissions, which facilitates deployment in sensitive environments. Customisable workflows allow quick creation of ad hoc projects and tasks, even mid-project as new situations are discovered. Customisable cluster integrity checks allow cluster- and project-sensitive feedback to be rapidly deployed to aid review. By being flexible and independent, the NGLMS-CR can supplement and complement existing linkage and review processes.
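
Purely as an illustration of the kind of lightweight RESTful exchange described above, the sketch below shows a client fetching workpools and submitting a review decision; the base URL, endpoints and JSON fields are hypothetical, not the actual NGLMS-CR protocol.

```python
# Hypothetical client-side sketch of a RESTful review exchange.
# The base URL, endpoints and JSON fields are assumptions for illustration.
import requests

BASE_URL = "https://linkage.example.org/api"  # placeholder server

def list_workpools(session_token: str) -> list[dict]:
    """Fetch the workpools visible to the signed-in reviewer."""
    resp = requests.get(f"{BASE_URL}/workpools",
                        headers={"Authorization": f"Bearer {session_token}"},
                        timeout=10)
    resp.raise_for_status()
    return resp.json()

def submit_review(session_token: str, cluster_id: str, decision: str) -> None:
    """Record a reviewer's decision (e.g. 'link' or 'no-link') for one cluster."""
    requests.post(f"{BASE_URL}/clusters/{cluster_id}/review",
                  headers={"Authorization": f"Bearer {session_token}"},
                  json={"decision": decision},
                  timeout=10).raise_for_status()
```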




Author(s):  
Siddhant Vinayak Chanda ◽  
◽  
Arivoli A ◽  

The objective of this paper is to highlight different ways to extract financial data (balance sheet, income statement and cash flow) of different companies from Yahoo Finance and to present an elaborate model that provides an economical, reliable and time-efficient tool for this purpose. It aims at aiding business analysts who are not well versed in coding but need quantitative outputs to analyse, predict and make market decisions, by automating the generation of financial data. A Python model is used, which scrapes the required data from Yahoo Finance and presents it in a precise and concise manner in the form of an Excel sheet. A web application is built using Python with a minimalistic and simple user interface to facilitate this process. The proposed method not only removes any chance of human error caused by manual extraction of data but also improves the overall productivity of analysts by drastically reducing the time it takes to generate the data, saving a substantial amount of human hours for the consumer. We also discuss the importance of data mining and scraping technologies in the finance industry, which is highly dependent on data to analyse and make decisions, as well as different methods of scraping online data and the legal aspects of web scraping.
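
One common way to reproduce this kind of pipeline is the third-party yfinance package, which exposes Yahoo Finance statements as pandas DataFrames; the sketch below (an assumption about tooling, not the authors' exact code) writes the three statements for a single ticker to an Excel workbook.

```python
# Sketch of retrieving Yahoo Finance statements and saving them to Excel.
# Uses the third-party yfinance package as one possible tool; requires
# `pip install yfinance pandas openpyxl`. Ticker and filename are examples.
import pandas as pd
import yfinance as yf

def export_financials(ticker_symbol: str, out_path: str) -> None:
    """Write balance sheet, income statement and cash flow to one workbook."""
    ticker = yf.Ticker(ticker_symbol)
    statements = {
        "Balance Sheet": ticker.balance_sheet,
        "Income Statement": ticker.financials,
        "Cash Flow": ticker.cashflow,
    }
    with pd.ExcelWriter(out_path) as writer:
        for sheet_name, frame in statements.items():
            frame.to_excel(writer, sheet_name=sheet_name)

if __name__ == "__main__":
    export_financials("AAPL", "AAPL_financials.xlsx")
```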


Author(s):  
David Kearney ◽  
Weiquan Zhao

Designed originally for document delivery, the Web is now widely used as a platform for electronic commerce application software. The ad hoc enhancements that have made Web application software possible (for example, CGI and JavaScript) have created an application support infrastructure in which application software upgrades and maintenance are very complex. Yet the Web is the preferred platform for applications that have continuous, ongoing development needs. In this chapter, we describe a model, an architecture and an associated Web Application Support Environment (WASE) that both hides the low-level complexity of the existing Web infrastructure and, at the same time, empowers enterprise Web application programmers in their objective of writing modular and easily maintainable software applications for electronic commerce. WASE is not a compiler and does not completely abstract away the unique features of the Web infrastructure. It is being constructed using XML documents in its API, to allow the function and configurability of applications to be defined in a Web-like fashion.

