Optimization and Security in Information Retrieval, Extraction, Processing, and Presentation on a Cloud Platform

Information
2019
Vol 10 (6)
pp. 200
Author(s):  
Adrian Alexandrescu

This paper presents the processing steps needed to build a fully functional vertical search engine. Four actions are identified (i.e., retrieval, extraction, presentation, and delivery); together they are required to crawl websites, get the product information from the retrieved webpages, process that data, and let the end user search for various products. The whole application flow is focused on low resource usage, especially in the delivery action, which consists of a web application that uses cloud resources and is optimized for cost efficiency. Novel methods for representing the crawl and extraction template, for product index optimizations, and for deploying and storing data in the cloud database are identified and explained. In addition, key aspects of ethics and security in the proposed solution are discussed. A practical use-case scenario is also presented, where products are extracted from seven online board and card game retailers. Finally, the potential of the proposed solution is discussed in terms of researching new methods for improving various aspects of it in order to increase cost efficiency and scalability.
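The abstract above does not say how the extraction template is represented. Purely as an illustration of the extraction action, the sketch below assumes a template that maps each product field to a regular expression with one capture group. The class name `ExtractionTemplate`, the function `extract_product`, the site name, and the HTML fragment are all hypothetical and are not taken from the paper.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractionTemplate:
    """Hypothetical per-retailer template: field name -> regex with one capture group."""
    site: str
    fields: dict


def extract_product(html: str, template: ExtractionTemplate) -> dict:
    """Apply each field pattern to one retrieved page; fields not found map to None."""
    product = {"site": template.site}
    for name, pattern in template.fields.items():
        match = re.search(pattern, html, flags=re.DOTALL)
        product[name] = match.group(1).strip() if match else None
    return product


# Made-up template and page, for illustration only.
template = ExtractionTemplate(
    site="example-games.test",
    fields={
        "name": r'<h1 class="product-title">(.*?)</h1>',
        "price": r'<span class="price">(.*?)</span>',
    },
)
page = '<h1 class="product-title"> Sample Game </h1><span class="price">39.99</span>'
product = extract_product(page, template)
```

Keeping the template as plain data (rather than code per retailer) is one way a new site could be added without touching the extraction logic; whether the paper takes this route is not stated in the abstract.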

2021
pp. 016555152199863
Author(s):  
Ismael Vázquez
María Novo-Lourés
Reyes Pavón
Rosalía Laza
José Ramón Méndez
...  

Current research has evolved in such a way that scientists must not only adequately describe the algorithms they introduce and the results of their application, but also ensure that the results can be reproduced and compared with those obtained through other approaches. In this context, public data sets (sometimes shared through repositories) are one of the most important elements for the development of experimental protocols and test benches. This study analysed a significant number of CS/ML (Computer Science/Machine Learning) research data repositories and data sets and detected some limitations that hamper their utility. In particular, we identify and discuss the following functionalities that repositories should offer: (1) building customised data sets for specific research tasks, (2) facilitating the comparison of different techniques that use dissimilar pre-processing methods, (3) ensuring the availability of software applications to reproduce the pre-processing steps without using the repository functionalities and (4) providing protection mechanisms for licencing issues and user rights. To demonstrate these functionalities, we created the STRep (Spam Text Repository) web application, which implements our recommendations adapted to the field of spam text repositories. In addition, we launched an instance of STRep at the URL https://rdata.4spam.group to facilitate understanding of this study.


2017
Vol 17 (2)
pp. 185-196
Author(s):  
Mario Scalas
Palmalisa Marra
Luca Tedesco
Raffaele Quarta
Emanuele Cantoro
...  

Abstract. This article describes the architecture of the sea situational awareness (SSA) platform, a major asset within TESSA, an industrial research project funded by the Italian Ministry of Education and Research. The main aim of the platform is to collect, transform and provide forecast and observational data as information suitable for delivery across a variety of channels, like web and mobile; specifically, the ability to produce and provide forecast information suitable for creating SSA-enabled applications has been a critical driving factor when designing and evolving the whole architecture. Thus, starting from functional and performance requirements, the platform architecture is described in terms of its main building blocks and the flows among them: front-end components that support end-user applications, and map and data analysis components that allow for serving maps and querying data. Focus is directed to key aspects and decisions about the main issues faced, like interoperability, scalability, efficiency and adaptability, but the article also offers insights about future work on this and related subjects. Some analysis results are also provided in order to better characterize critical issues and related solutions.


Author(s):  
Richard Berendsen ◽  
Bogomil Kovachev ◽  
Edgar Meij ◽  
Maarten de Rijke ◽  
Wouter Weerkamp

Author(s):  
Iulia Clitan
Adela Puscasiu
Vlad Muresan
Mihaela Ligia Unguresan
...  

Since February 2020, when the first case of infection with the SARS-CoV-2 virus appeared in Romania, the COVID-19 pandemic has continued on an upward trend, reaching, as expected, a second wave of infections in September 2020. In order to understand the evolution and spread of this disease over time and space, more and more research is focused on obtaining mathematical models that can predict the evolution of active cases under different scenarios, taking into account the numerous inputs that influence the spread of this infection. This paper presents a responsive web application that allows the end user to analyze the evolution of the pandemic in Romania graphically and that, unlike other COVID-19 statistical applications, incorporates a prediction of the evolution of active cases. The prediction is based on a neural network mathematical model, which is described from the architectural point of view.


This paper proposes a data compression technique that improves on the existing Lempel-Ziv-Welch (LZW) algorithm. LZW is a dictionary-based compression technique that stores strings from the data as codes and reuses those codes when the strings recur. When the dictionary gets full, every element in the dictionary is removed so that the dictionary can be updated with new entries. The conventional method therefore does not take frequently used strings into account and removes all entries. This makes compression less effective when the data to be compressed are large and contain many frequently occurring strings. This paper presents two new methods that improve on the existing LZW compression algorithm. In these methods, when the dictionary gets full, only the elements that have not been used are removed, rather than every element of the dictionary as in the existing LZW algorithm. This is achieved by adding a flag to every element of the dictionary. Whenever an element is used, its flag is set high. Thus, when the dictionary gets full, the dictionary entries whose flag is set high are kept and the others are discarded. In the first method, the unused entries are discarded all at once, whereas in the second method the unused elements are removed one at a time. The second method therefore gives newly added elements of the dictionary enough time to be used. All of these techniques give similar results when the data set is small, because they differ only in how they handle the dictionary when it is full. Thus, the improvements give better results only when a relatively large data set is used. When the three techniques were compared on a data set that yields the best-case scenario, the compression ratio of conventional LZW was smaller than that of improved LZW method 1, which in turn was smaller than that of improved LZW method 2.
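The flag-based pruning described above can be sketched as an encoder. This is a minimal illustration under stated assumptions, not the authors' implementation: it covers only conventional LZW and the first method (unused entries discarded all at once), and the function names, the `policy` parameter, the renumbering of kept entries, the clearing of flags after each pruning, and the fallback to a full reset when every entry is flagged are all choices made here that the abstract does not specify. A matching decoder would have to mirror the same pruning steps and is omitted.

```python
def _prune(table: dict, used: set, policy: str):
    """Rebuild a full dictionary; single-byte entries (codes 0..255) always stay."""
    kept = []
    if policy == "keep_used":
        # Keep learned entries whose flag is high, in their old code order.
        kept = [s for s in table if len(s) > 1 and s in used]
        if len(kept) == len(table) - 256:
            kept = []  # assumption: if nothing would be freed, fall back to a full reset
    new_table = {bytes([i]): i for i in range(256)}
    for s in kept:
        new_table[s] = len(new_table)  # assumption: kept entries get compact new codes
    return new_table, set()  # assumption: all flags are cleared after pruning


def lzw_compress(data: bytes, max_size: int = 4096, policy: str = "reset") -> list:
    """LZW encoder. policy="reset" is the conventional full reset;
    policy="keep_used" keeps flagged entries when the dictionary is full."""
    if max_size <= 256:
        raise ValueError("max_size must leave room for learned entries")
    if policy not in ("reset", "keep_used"):
        raise ValueError("unknown policy")
    table = {bytes([i]): i for i in range(256)}
    used = set()  # learned strings whose flag is high
    out = []
    w = b""
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc  # keep extending the current match
            continue
        out.append(table[w])
        if len(w) > 1:
            used.add(w)  # this entry's code was emitted: set its flag high
        if len(table) >= max_size:
            table, used = _prune(table, used, policy)
        table[wc] = len(table)  # learn the new string
        w = bytes([byte])
    if w:
        out.append(table[w])
    return out


sample = b"ABABABA"
codes_reset = lzw_compress(sample, max_size=258, policy="reset")
codes_keep = lzw_compress(sample, max_size=258, policy="keep_used")
```

With a dictionary that never fills (for example the default `max_size` on this short input), both policies emit the same codes, which matches the abstract's remark that the techniques behave alike on small data. The artificially tiny `max_size=258` forces pruning so that the difference between the two policies becomes visible.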


Author(s):  
Martins Zviedris

End-user interaction with data is one of the key aspects of data processing. Nowadays, many information systems have a custom-made user interface for data input and data querying. Since the 1970s, it has been envisioned that a generic, user-friendly approach to data querying could be built, but no widespread solution has been developed. In this paper we present a diagrammatic query language. We took an iterative approach to designing and improving the diagrammatic query language in order to make it readable for users. Readability is analyzed with questionnaires. A readable diagrammatic query language is the first step toward more generic and user-friendly data querying.

