A PSO Based Cloud Framework for Knowledge Extraction

Author(s):  
Chaitanya Kanchibhotlaa ◽  
Pruthvi Raj Venkatesh ◽  
DVLN Somayajulu ◽  
Radhakrishna P ◽  
...  

Many industries, such as oil, construction, banking, and insurance, hold substantial volumes of historical physical data. Companies store this data in geographically distributed physical warehouses that are typically managed by records-management companies. Storing large volumes of historical physical data poses many critical challenges, including high maintenance costs, slow retrieval, and unsearchable content. To address these challenges, many companies digitize this data and consolidate it into cloud repositories as part of their Digital Transformation (DT) journey. The DT process introduces further technical challenges: poor-quality scans, large file sizes, geographically distributed files, and confidential documents. Although options exist to resolve each of these limitations individually, no framework deals with digitization and historical data storage in its entirety, and existing approaches cannot handle large numbers of documents with variable file sizes. This paper presents a generic cloud-based high-performance computing framework for knowledge extraction, comprising document classification based on neural networks and particle swarm optimization (PSO), data extraction, metadata enrichment, image enhancement using image processing (IP) techniques, and high data availability to users through cloud-based search. The proposed framework is executed on two cloud providers, Azure and AWS, to test its efficacy.
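The abstract names PSO as part of the classification pipeline but gives no algorithmic details. As a reference point, a minimal generic PSO loop looks like the sketch below; the swarm size, inertia weight, and acceleration coefficients are illustrative defaults, not values from the paper, and the toy sphere objective stands in for the paper's classification fitness function.

```python
import random

def pso(objective, dim, n_particles=30, iters=100,
        w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0)):
    """Minimal particle swarm optimization: minimizes `objective` over R^dim."""
    lo, hi = bounds
    pos = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                   # each particle's best position
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm-wide best

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # velocity update: inertia + cognitive pull + social pull
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

random.seed(0)
best, best_val = pso(lambda x: sum(v * v for v in x), dim=3)
```

On this toy objective the swarm converges close to the origin; in the paper's setting the position vector would instead encode classifier parameters to be tuned.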

2021 ◽  
Vol 33 (3) ◽  
pp. 87-100
Author(s):  
Denis Eyzenakh ◽  
Anton Rameykov ◽  
Igor Nikiforov

Over the past decade, the Internet has become the largest and richest source of data. This data is used for knowledge extraction through machine learning analysis. To perform data mining on web information, the data must first be extracted from its source and loaded into analytical storage; this is the ETL process. Different web sources expose their data in different ways: either through an API over HTTP or via HTML source-code parsing. This article presents an approach to high-performance data extraction from sources that do not provide an API. Distinctive features of the proposed approach are load balancing, two levels of data storage, and separation of the file-downloading process from the scraping process. The approach is implemented in a solution built on the following technologies: Docker, Kubernetes, Scrapy, Python, MongoDB, Redis Cluster, and CephFS. The results of testing the solution are also described in the article.
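The key design point named in the abstract, decoupling scraping from downloading via a shared queue, can be sketched with Python's standard library. This is a minimal single-process sketch: the paper's solution uses Redis Cluster as the queue and Scrapy workers across Kubernetes pods, and the page data and URLs below are hypothetical placeholders.

```python
import queue
import threading

tasks = queue.Queue()   # stands in for the Redis queue in the paper's design
results = []
lock = threading.Lock()

N_WORKERS = 4

def scraper(pages):
    """Parses pages and enqueues file URLs instead of downloading them inline."""
    for page in pages:
        for url in page["file_urls"]:
            tasks.put(url)
    for _ in range(N_WORKERS):   # poison pills so each worker can exit
        tasks.put(None)

def downloader():
    """Pulls URLs from the queue; a real worker would fetch and store the file."""
    while True:
        url = tasks.get()
        if url is None:
            break
        with lock:
            results.append(url)  # placeholder for fetch + write to storage

pages = [{"file_urls": [f"https://example.com/doc{i}.pdf"]} for i in range(10)]
workers = [threading.Thread(target=downloader) for _ in range(N_WORKERS)]
for w in workers:
    w.start()
scraper(pages)
for w in workers:
    w.join()
```

Because the scraper only enqueues work, slow downloads never block parsing, and download capacity can be scaled by adding workers independently of the scraping side.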


Author(s):  
Jason Williams

Abstract Complex research questions pose complex reproducibility challenges. Datasets may need to be managed over long periods of time. Reliable and secure repositories are needed for data storage. Sharing big data requires advance planning and becomes complex when collaborators are spread across institutions and countries. Many complex analyses require the larger compute resources only provided by cloud and high-performance computing infrastructure. Finally, at publication, funder and publisher requirements must be met for data availability, accessibility, and computational reproducibility. For all of these reasons, cloud-based cyberinfrastructures are an important component for satisfying the needs of data-intensive research. Learning how to incorporate these technologies into your research skill set will allow you to work on data analysis challenges that are often beyond the resources of individual research institutions. One of the advantages of CyVerse is that it offers many solutions for high-powered analyses that do not require knowledge of command-line (i.e., Linux) computing. In this chapter we highlight CyVerse capabilities by analyzing RNA-Seq data. The lessons learned will translate to doing RNA-Seq in other computing environments and will focus on how CyVerse infrastructure supports reproducibility goals (e.g., metadata management, containers), team science (e.g., data sharing features), and flexible computing environments (e.g., interactive computing, scaling).


2018 ◽  
Vol 7 (3.31) ◽  
pp. 59
Author(s):  
N Deshai ◽  
S Venkataramana ◽  
I Hemalatha ◽  
G P. S. Varma

A new tera-to-zetta era has been created by the huge volumes of data sets continuously collected from social networks, machine-to-machine devices, Google, Yahoo, sensors, and other sources, collectively known as big data. Data storage size, data processing power, data availability, and the size of the digital world in zettabytes are doubling day by day. Apache Hadoop is the latest market tool for handling huge volumes of data sets through its most popular components, HDFS and MapReduce, which provide efficient storage and efficient processing of massive data sets. Designing an effective algorithm for selecting nodes is a key factor in optimizing and achieving high performance in big data. This paper provides a survey and overview of these scheduling algorithms and identifies their advantages and disadvantages.
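The MapReduce model the abstract refers to splits processing into a map phase emitting key-value pairs, a shuffle that groups them by key, and a reduce phase aggregating each group. The canonical word-count example can be sketched in a few lines of plain Python (in Hadoop these phases run distributed over HDFS blocks rather than in one process):

```python
from collections import defaultdict

def map_phase(doc):
    """Map: emit (word, 1) pairs, as a Hadoop mapper would."""
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    """Shuffle: group intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate (here, sum) the values for each key."""
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data big storage", "data processing"]
pairs = [p for d in docs for p in map_phase(d)]
counts = reduce_phase(shuffle(pairs))
# counts == {"big": 2, "data": 2, "storage": 1, "processing": 1}
```

The scheduling algorithms the paper surveys decide which cluster nodes run these map and reduce tasks, which is where data locality and load balance determine overall performance.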


Coatings ◽  
2021 ◽  
Vol 11 (3) ◽  
pp. 318
Author(s):  
Yang Li ◽  
Cheng Zhang ◽  
Zhiming Shi ◽  
Jingni Li ◽  
Qingyun Qian ◽  
...  

The explosive growth of data and information has increasingly motivated scientific and technological endeavors toward ultra-high-density data storage (UHDDS) applications. Herein, a donor−acceptor (D−A) type small conjugated molecule containing benzothiadiazole (BT), denoted NIBTCN, is prepared; it demonstrates multilevel resistive memory behavior and holds considerable promise for realizing UHDDS. The as-prepared device presents distinct current ratios of 10^5.2/10^3.2/1, low threshold voltages of −1.90 V and −3.85 V, and satisfactory reproducibility beyond 60%, which suggests reliable device performance. This work represents a favorable step toward further development of highly efficient D−A molecular systems, opening more opportunities for achieving high-performance multilevel memory materials and devices.


Author(s):  
Fanny Pinto Delgado ◽  
Ziyou Song ◽  
Heath F. Hofmann ◽  
Jing Sun

Abstract Permanent Magnet Synchronous Machines (PMSMs) are preferred for high-performance applications due to their high torque density, high power density, high control accuracy, and high efficiency over a wide operating range. During operation, monitoring the PMSM's health condition is crucial for detecting anomalies so that performance degradation, maintenance/downtime costs, and safety hazards can be avoided. In particular, demagnetization can lead not only to degraded performance but also to high maintenance cost, as the permanent magnets are the most expensive components in a PMSM. In this paper, an equivalent two-phase model for surface-mount permanent magnet (SMPM) machines under permanent magnet demagnetization is formulated, and a parameter estimator is proposed for condition monitoring purposes. The performance of the proposed estimator is investigated through analysis and simulation under different conditions, and compared with a parameter estimator based on the standard SMPM machine model. In terms of the information that can be extracted for fault diagnosis and condition monitoring, the proposed estimator exhibits advantages over the standard-model-based estimator, as it can differentiate between uniform demagnetization over all poles and asymmetric demagnetization between north and south poles.
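For context, the standard SMPM machine model the paper compares against is the dq-frame voltage model (the paper's equivalent two-phase model under demagnetization is not reproduced in the abstract; the equations below are the textbook form, with symbols chosen here for illustration):

```latex
\begin{aligned}
v_d &= R_s\, i_d + L_s \frac{di_d}{dt} - \omega_e L_s\, i_q,\\
v_q &= R_s\, i_q + L_s \frac{di_q}{dt} + \omega_e L_s\, i_d + \omega_e \lambda_m,
\end{aligned}
```

where \(R_s\) is the stator resistance, \(L_s\) the stator inductance (equal on both axes for a surface-mount machine), \(\omega_e\) the electrical speed, and \(\lambda_m\) the permanent-magnet flux linkage. Uniform demagnetization appears as a reduced \(\lambda_m\), which is why estimating this parameter supports condition monitoring; distinguishing asymmetric north/south demagnetization requires the richer model the paper formulates.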


F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 2060
Author(s):  
Aleksandr Agafonov ◽  
Kimmo Mattila ◽  
Cuong Duong Tuan ◽  
Lars Tiede ◽  
Inge Alexander Raknes ◽  
...  

META-pipe is a complete service for the analysis of marine metagenomic data. It provides assembly of high-throughput sequence data, functional annotation of predicted genes, and taxonomic profiling. The functional annotation is computationally demanding and is therefore currently run on a high-performance computing cluster in Norway. However, additional compute resources are necessary to open the service to all ELIXIR users. We describe our approach for setting up and executing the functional analysis of META-pipe on additional academic and commercial clouds. Our goal is to provide a powerful analysis service that is easy to use and to maintain. Our design therefore uses a distributed architecture in which central servers are combined with multiple distributed backends that execute the computationally intensive jobs. We believe our experiences developing and operating META-pipe provide a useful model for others that plan to offer a portal-based data analysis service in ELIXIR and other organizations with geographically distributed compute and storage resources.


Author(s):  
Shunyu Chang ◽  
Yanquan Geng ◽  
Yongda Yan

Abstract As one of the most widely used nanofabrication methods, the atomic force microscopy (AFM) tip-based nanomachining technique offers important advantages, including nanoscale manipulation accuracy, low maintenance cost, and flexible experimental operation. This technique has been applied to machine one-, two-, and even three-dimensional patterns on thin films made of polymers, metals, and two-dimensional materials. These structures are widely used in the fields of nano-optics, nanoelectronics, data storage, super lubrication, and so forth. Moreover, they are believed to have wide application in other fields, and their industrialization may be realized in the future. In this work, the current state of research into the use of the AFM tip-based nanomachining method in thin-film machining is presented. First, the structures machined on thin films are reviewed according to the type of thin-film material (i.e., polymers, metals, and two-dimensional materials). Second, the related applications of tip-based nanomachining to film machining are presented. Finally, the current situation of this area and its potential development directions are discussed. This review is expected to enrich the understanding of the research status of the tip-based nanomachining method in thin-film machining and ultimately broaden its application.


2018 ◽  
Vol 10 (9) ◽  
pp. 1376 ◽  
Author(s):  
Sijing Ye ◽  
Diyou Liu ◽  
Xiaochuang Yao ◽  
Huaizhi Tang ◽  
Quan Xiong ◽  
...  

In recent years, remote sensing (RS) research on crop growth monitoring has gradually shifted from large-scale static spectral-information retrieval to timely, cooperative analysis of multi-source data at the meso- or micro-scale; this change places higher requirements on the efficiency of RS data acquisition and analysis. How to implement rapid and stable extraction and analysis of massive RS data has become a serious problem. This paper reports on a Raster Dataset Clean & Reconstitution Multi-Grid (RDCRMG) architecture for remote sensing monitoring of vegetation dryness, in which different types of raster datasets are partitioned, organized, and systematically applied. First, raster images are subdivided into several independent blocks and distributed for storage across different data nodes, using the multi-grid as a consistent partition unit. Second, the "no metadata model" ideology is adopted so that target raster data can be extracted quickly by directly calculating the data storage path, without retrieving metadata records. Third, the grids that cover a query range can be easily assessed, allowing the query task to be split into several sub-tasks and executed in parallel by grouping these grids. Our RDCRMG-based change-detection test on spectral reflectance information and our comparative test of data extraction efficiency show that the RDCRMG is reliable for vegetation dryness monitoring, with only slight reflectance-information distortion and consistent percentage histograms. Furthermore, RDCRMG-based data extraction in parallel offers high efficiency and excellent stability compared with RDCRMG-based serial extraction and with traditional data extraction. Finally, an RDCRMG-based vegetation dryness monitoring platform (VDMP) has been constructed to apply RS data inversion to vegetation dryness monitoring.
Through actual applications, the RDCRMG architecture is proven appropriate for timely, automatic RS monitoring of vegetation dryness, with better performance, greater reliability, and higher extensibility. Our future work will focus on integrating more kinds of continuously updated RS data into the RDCRMG-based VDMP and on integrating more collaborative analysis models based on multi-source datasets for agricultural monitoring.
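The "no metadata model" idea described above, deriving a tile's storage path directly from its coordinates instead of querying a metadata catalog, can be illustrated with a small sketch. The grid origin, cell size, and path layout below are hypothetical stand-ins; the RDCRMG's actual multi-grid scheme and path convention are not specified in the abstract.

```python
def grid_path(lon, lat, level=2, origin=(-180.0, -90.0),
              cell=(1.0, 1.0), root="/rs"):
    """Derive a tile's storage path purely from its coordinates, so no
    metadata lookup is needed before reading (hypothetical path scheme)."""
    col = int((lon - origin[0]) / cell[0])   # grid column from longitude
    row = int((lat - origin[1]) / cell[1])   # grid row from latitude
    return f"{root}/L{level}/r{row}/c{col}.tif"

path = grid_path(116.4, 39.9)   # a point near Beijing
# path == "/rs/L2/r129/c296.tif"
```

Because the mapping is a pure function of coordinates, any worker can compute the same path independently, which is what makes splitting a query range into parallel per-grid sub-tasks straightforward.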

