Optimization and Security in Information Retrieval, Extraction, Processing, and Presentation on a Cloud Platform

Adrian Alexandrescu

doi:10.3390/info10060200

Information ◽

10.3390/info10060200 ◽

2019 ◽

Vol 10 (6) ◽

pp. 200

Author(s):

Adrian Alexandrescu

Keyword(s):

Cost Efficiency ◽

Web Application ◽

Product Information ◽

End User ◽

Case Scenario ◽

New Methods ◽

Vertical Search ◽

Vertical Search Engine ◽

Key Aspects ◽

Processing Steps

This paper presents the processing steps needed to build a fully functional vertical search engine. Four actions are identified (i.e., retrieval, extraction, presentation, and delivery); together they crawl websites, extract the product information from the retrieved webpages, process that data, and let the end user search for various products. The whole application flow is focused on low resource usage, especially the delivery action, which consists of a web application that uses cloud resources and is optimized for cost efficiency. Novel methods for representing the crawl and extraction template, for product index optimizations, and for deploying and storing data in the cloud database are identified and explained. In addition, key aspects of ethics and security in the proposed solution are discussed. A practical use-case scenario is also presented, in which products are extracted from seven online board and card game retailers. Finally, the potential of the proposed solution is discussed in terms of researching new methods for improving it in order to increase cost efficiency and scalability.

Improvements for research data repositories: The case of text spam

Journal of Information Science ◽

10.1177/0165551521998636 ◽

2021 ◽

pp. 016555152199863

Author(s):

Ismael Vázquez ◽

María Novo-Lourés ◽

Reyes Pavón ◽

Rosalía Laza ◽

José Ramón Méndez ◽

...

Keyword(s):

Web Application ◽

Research Data ◽

Data Sets ◽

Data Repositories ◽

Software Applications ◽

Public Data ◽

Protection Mechanisms ◽

Experimental Protocols ◽

Learning Research ◽

Processing Steps

Current research has evolved in such a way that scientists must not only adequately describe the algorithms they introduce and the results of their application, but also ensure the possibility of reproducing the results and comparing them with those obtained through other approximations. In this context, public data sets (sometimes shared through repositories) are one of the most important elements for the development of experimental protocols and test benches. This study has analysed a significant number of CS/ML (Computer Science/Machine Learning) research data repositories and data sets and detected some limitations that hamper their utility. In particular, we identify and discuss the following demanding functionalities for repositories: (1) building customised data sets for specific research tasks, (2) facilitating the comparison of different techniques using dissimilar pre-processing methods, (3) ensuring the availability of software applications to reproduce the pre-processing steps without using the repository functionalities and (4) providing protection mechanisms for licensing issues and user rights. To demonstrate the introduced functionality, we created the STRep (Spam Text Repository) web application, which implements our recommendations adapted to the field of spam text repositories. In addition, we launched an instance of STRep at the URL https://rdata.4spam.group to facilitate understanding of this study.

An Improved Link Selection Algorithm for Vertical Search Engine

2009 First International Conference on Information Science and Engineering ◽

10.1109/icise.2009.276 ◽

2009 ◽

Cited By ~ 3

Author(s):

Ling Zheng ◽

Yang Bo ◽

Ning Zhang

Keyword(s):

Search Engine ◽

Selection Algorithm ◽

Vertical Search ◽

Vertical Search Engine

Analysis and Design of Public Opinion Pre-Warning Analysis Platform Based on Vertical Search Engine

2017 IEEE 14th International Conference on e-Business Engineering (ICEBE) ◽

10.1109/icebe.2017.53 ◽

2017 ◽

Author(s):

Kun Liu ◽

Kun Ma ◽

Zonglin Yue

Keyword(s):

Public Opinion ◽

Search Engine ◽

Analysis And Design ◽

Vertical Search ◽

Vertical Search Engine ◽

Analysis Platform

TESSA: design and implementation of a platform for situational sea awareness

Natural Hazards and Earth System Science ◽

10.5194/nhess-17-185-2017 ◽

2017 ◽

Vol 17 (2) ◽

pp. 185-196

Author(s):

Mario Scalas ◽

Palmalisa Marra ◽

Luca Tedesco ◽

Raffaele Quarta ◽

Emanuele Cantoro ◽

...

Keyword(s):

Situational Awareness ◽

Building Blocks ◽

Ministry Of Education ◽

End User ◽

Performance Requirements ◽

Platform Architecture ◽

Front End ◽

Critical Issues ◽

And Performance ◽

Key Aspects

This article describes the architecture of the sea situational awareness (SSA) platform, a major asset within TESSA, an industrial research project funded by the Italian Ministry of Education and Research. The main aim of the platform is to collect, transform and provide forecast and observational data as information suitable for delivery across a variety of channels, such as web and mobile; specifically, the ability to produce and provide forecast information suitable for creating SSA-enabled applications has been a critical driving factor in designing and evolving the whole architecture. Thus, starting from functional and performance requirements, the platform architecture is described in terms of its main building blocks and the flows among them: front-end components that support end-user applications, and map and data analysis components that allow for serving maps and querying data. The focus is on key aspects and decisions about the main issues faced, such as interoperability, scalability, efficiency and adaptability, but the article also offers insights about future work on this and related subjects. Some analysis results are also provided in order to better characterize critical issues and related solutions.

Classifying queries submitted to a vertical search engine

Proceedings of the 3rd International Web Science Conference on - WebSci '11 ◽

10.1145/2527031.2527055 ◽

2011 ◽

Author(s):

Richard Berendsen ◽

Bogomil Kovachev ◽

Edgar Meij ◽

Maarten de Rijke ◽

Wouter Weerkamp

Keyword(s):

Search Engine ◽

Vertical Search ◽

Vertical Search Engine

Web Application for Statistical Tracking and Predicting the Evolution of Active Cases with the Novel Coronavirus (SARS-CoV-2)

International Journal of Modeling and Optimization ◽

10.7763/ijmo.2021.v11.780 ◽

2021 ◽

pp. 70-74

Author(s):

Iulia Clitan ◽

Adela Puscasiu ◽

Vlad Muresan ◽

Mihaela Ligia Unguresan ◽

...

Keyword(s):

Neural Network ◽

Mathematical Model ◽

Web Application ◽

Point Of View ◽

The Novel ◽

End User ◽

First Case ◽

Novel Coronavirus ◽

Second Wave ◽

Over Time

Since February 2020, when the first case of infection with the SARS-CoV-2 virus appeared in Romania, the COVID-19 pandemic has continued on an ascending trend, reaching a second wave of infections in September 2020, as expected. In order to understand the evolution and spread of this disease over time and space, more and more research is focused on obtaining mathematical models that are able to predict the evolution of active cases based on different scenarios, taking into account the numerous inputs that influence the spread of this infection. This paper presents a responsive web application that allows the end user to analyze the evolution of the pandemic in Romania graphically and that incorporates, unlike other COVID-19 statistical applications, a prediction of the evolution of active cases. The prediction is based on a neural network mathematical model, which is described from the architectural point of view.

An Approach to Efficient Dictionary Utilization and Improved Data Compression Technique for LZW Algorithm

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.b2097.1210220 ◽

2020 ◽

Vol 10 (2) ◽

pp. 224-229

Keyword(s):

Data Compression ◽

Conventional Method ◽

Large Data ◽

Compression Algorithm ◽

Case Scenario ◽

Compression Technique ◽

Data Set ◽

New Methods ◽

Lzw Algorithm ◽

The Way

This paper proposes an improved data compression technique compared to the existing Lempel-Ziv-Welch (LZW) algorithm. LZW is a dictionary-update-based compression technique that stores elements from the data in the form of codes and reuses them when those strings recur. When the dictionary gets full, every element in the dictionary is removed in order to update the dictionary with new entries. The conventional method therefore does not consider frequently used strings and removes all entries. This is not effective compression when the data to be compressed are large and contain many frequently occurring strings. This paper presents two new methods that improve on the existing LZW compression algorithm. In these methods, when the dictionary gets full, the elements that have not been used earlier are removed, rather than every element of the dictionary as happens in the existing LZW algorithm. This is achieved by adding a flag to every element of the dictionary. Whenever an element is used, its flag is set high. Thus, when the dictionary gets full, the dictionary entries whose flag was set high are kept and the others are discarded. In the first method, the entries are discarded abruptly, whereas in the second method the unused elements are removed one at a time. The second method therefore gives newly added dictionary elements enough time to be used. All three techniques yield similar results when the data set is small, because they differ only in how they handle the dictionary once it is full. The improvements therefore yield better results only when a relatively large data set is used. When all three techniques were applied to a data set that yields the best-case scenario, the compression ratio of conventional LZW was lower than that of improved LZW method 1, which in turn was lower than that of improved LZW method 2.
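
The flag-based pruning described in this abstract can be illustrated with a minimal Python sketch of the first method (unused entries discarded all at once). It is not the authors' implementation. Several details are assumptions made only so the example is complete and decodable: the reserved `CLEAR` code that tells the decoder when to prune, the `max_size` parameter, the renumbering of surviving entries, and the resetting of all flags after each prune are not stated in the abstract.

```python
CLEAR = 256       # assumed reserved code: "prune the dictionary now"
FIRST_CODE = 257  # first code available for multi-byte strings


def compress(data: bytes, max_size: int = 512) -> list[int]:
    """LZW encoder that keeps flagged (used) entries when the dictionary is full."""
    table = {bytes([i]): i for i in range(256)}
    entries = []   # multi-byte strings in code order
    used = set()   # strings whose flag is set since the last prune
    out = []
    w = b""
    for byte in data:
        c = bytes([byte])
        if w + c in table:
            w += c
            continue
        out.append(table[w])
        used.add(w)  # emitting a code sets the flag of its entry
        if FIRST_CODE + len(entries) < max_size:
            table[w + c] = FIRST_CODE + len(entries)
            entries.append(w + c)
        else:
            # Dictionary full: signal the decoder, keep only flagged entries,
            # renumber them, and reset every flag (assumption).
            out.append(CLEAR)
            entries = [s for s in entries if s in used]
            table = {bytes([i]): i for i in range(256)}
            table.update({s: FIRST_CODE + i for i, s in enumerate(entries)})
            used = set()
        w = c
    if w:
        out.append(table[w])
    return out


def decompress(codes: list[int]) -> bytes:
    """Decoder that mirrors the encoder's dictionary, including each prune."""
    entries = []
    used = set()
    out = bytearray()
    prev = None
    for code in codes:
        if code == CLEAR:
            entries = [s for s in entries if s in used]
            used = set()
            prev = None
            continue
        if code < 256:
            s = bytes([code])
        elif code - FIRST_CODE < len(entries):
            s = entries[code - FIRST_CODE]
        else:
            # Code refers to the entry the encoder has just created.
            s = prev + prev[:1]
        if prev is not None:
            entries.append(prev + s[:1])
        used.add(s)
        out += s
        prev = s
    return bytes(out)
```

Because the encoder skips adding a new entry on the step where it emits `CLEAR`, both sides hold the same entries and the same flags at the moment of pruning, so the decoder can repeat the prune without any extra side information. The second method from the abstract (removing unused entries one at a time) would need a different eviction step and is not shown here.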

A Vertical Search Engine Based on Visual and Textual Features

Entertainment for Education. Digital Techniques and Systems - Lecture Notes in Computer Science ◽

10.1007/978-3-642-14533-9_49 ◽

2010 ◽

pp. 476-485

Author(s):

Kun Wu ◽

Hai Jin ◽

Ran Zheng ◽

Qin Zhang

Keyword(s):

Search Engine ◽

Vertical Search ◽

Vertical Search Engine ◽

Textual Features

Research of a Vertical Search Engine for Campus Network

Communications in Computer and Information Science - Information Computing and Applications ◽

10.1007/978-3-642-34038-3_6 ◽

2012 ◽

pp. 37-43

Author(s):

Rujia Gao ◽

Wanlong Li ◽

Shanhong Zheng ◽

Hang Li

Keyword(s):

Search Engine ◽

Campus Network ◽

Vertical Search ◽

Vertical Search Engine

Readable Diagrammatic Query Language ViziQuer

Information Retrieval and Management ◽

10.4018/978-1-5225-5191-1.ch062 ◽

2018 ◽

pp. 1389-1408

Author(s):

Martins Zviedris

Keyword(s):

Query Language ◽

User Interaction ◽

Iterative Approach ◽

Data Input ◽

Wide Spread ◽

End User ◽

Data Querying ◽

Custom Made ◽

Key Aspects ◽

User Friendly

End-user interaction with data is one of the key aspects of data processing. Nowadays, many information systems have a custom-made user interface for data input and data querying. Since the 1970s, it has been envisioned that a generic, user-friendly approach to data querying could be built, but no widespread solution has been developed. In this paper we present a diagrammatic query language. We took an iterative approach to designing and improving the diagrammatic query language to make it readable for users. Readability is analyzed with questionnaires. A readable diagrammatic query language is a first step toward more generic and user-friendly data querying.
