A Cognitive Adopted Framework for IoT Big-Data Management and Knowledge Discovery Prospective

2015 ◽  
Vol 2015 ◽  
pp. 1-12 ◽  
Author(s):  
Nilamadhab Mishra ◽  
Chung-Chih Lin ◽  
Hsien-Tsung Chang

The importance of the industrial internet for IoT big-data management and knowledge discovery in large-scale industrial automation applications is increasing day by day. Several diverse technologies, such as the IoT (Internet of Things), computational intelligence, machine-type communication, big data, and sensor technology, can be combined to improve the data management and knowledge discovery efficiency of large-scale automation applications. In this work, we propose a Cognitive Oriented IoT Big-data Framework (COIB-framework), along with an implementation architecture, an IoT big-data layering architecture, and a data organization and knowledge exploration subsystem, for effective data management and knowledge discovery that is well suited to large-scale industrial automation applications. The discussion and analysis show that the proposed framework and architectures constitute a reasonable solution for implementing IoT big-data-based smart industrial applications.

Author(s):  
Cheng Meng ◽  
Ye Wang ◽  
Xinlian Zhang ◽  
Abhyuday Mandal ◽  
Wenxuan Zhong ◽  
...  

With advances in technology over the past decade, the amount of data generated and recorded has grown enormously in virtually all fields of industry and science. This extraordinary amount of data provides unprecedented opportunities for data-driven decision-making and knowledge discovery. However, the task of analyzing such large-scale datasets poses significant challenges and calls for innovative statistical methods specifically designed for faster speed and higher efficiency. In this chapter, we review currently available methods for big data, with a focus on subsampling methods using statistical leveraging and on divide-and-conquer methods.
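The leveraging idea the chapter reviews can be illustrated in a few lines: sample rows of a regression problem with probability proportional to their statistical leverage, reweight, and fit on the subsample. The following is a minimal sketch under assumed synthetic data and an assumed subsample size; it is an illustration of the general technique, not the chapter's specific algorithms.

```python
# Minimal sketch of leverage-score subsampling for linear regression.
# The data (X, y) and the subsample size r are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, p, r = 100_000, 10, 1_000          # full size, features, subsample size
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + rng.standard_normal(n)

# Leverage scores are the diagonal of the hat matrix H = X (X'X)^{-1} X',
# computed here from a thin QR decomposition: h_i = ||Q_i||^2.
Q, _ = np.linalg.qr(X)
leverage = np.einsum("ij,ij->i", Q, Q)
probs = leverage / leverage.sum()

# Sample r rows proportional to leverage and reweight (importance sampling),
# then solve the small weighted least-squares problem.
idx = rng.choice(n, size=r, replace=True, p=probs)
w = 1.0 / np.sqrt(r * probs[idx])
beta_hat, *_ = np.linalg.lstsq(w[:, None] * X[idx], w * y[idx], rcond=None)
```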


F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 409
Author(s):  
Balázs Bohár ◽  
David Fazekas ◽  
Matthew Madgwick ◽  
Luca Csabai ◽  
Marton Olbei ◽  
...  

In the era of Big Data, data collection underpins biological research more so than ever before. In many cases, collection can be as time-consuming as the analysis itself: it requires downloading multiple public databases with different data structures, and, in general, days can pass before any biological question is answered. To solve this problem, we introduce an open-source, cloud-based big data platform called Sherlock (https://earlham-sherlock.github.io/). Sherlock provides a gap-filling way for biologists to store, convert, query, share and generate biological data, while ultimately streamlining bioinformatics data management. The Sherlock platform provides a simple interface to leverage big data technologies, such as Docker and PrestoDB. Sherlock is designed to analyse, process, query and extract information from extremely complex and large data sets. Furthermore, Sherlock is capable of handling differently structured data (interaction, localization, or genomic sequence) from several sources and converting them to a common, optimized storage format, for example the Optimized Row Columnar (ORC) format. This format facilitates Sherlock's ability to quickly and easily execute distributed analytical queries on extremely large data files, as well as to share datasets between teams. The Sherlock platform is freely available on GitHub and contains specific loader scripts for structured data sources of genomics, interaction and expression databases. With these loader scripts, users are able to easily and quickly create and work with the specific file formats, such as JavaScript Object Notation (JSON) or ORC. For computational biology and large-scale bioinformatics projects, Sherlock provides an open-source platform empowering data management, data analytics, data integration and collaboration through modern big data technologies.
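The JSON-to-ORC conversion described above can be sketched with off-the-shelf tooling. The snippet below is not Sherlock's actual loader script; it is a minimal illustration using pyarrow, with file and column names assumed for the example, of turning newline-delimited JSON records into an ORC file that a distributed SQL engine such as PrestoDB could then scan.

```python
# Minimal sketch (not Sherlock's loader scripts): convert newline-delimited
# JSON interaction records into ORC with pyarrow. File names are assumptions.
import pyarrow.json as pj
import pyarrow.orc as po

table = pj.read_json("interactions.json")   # one JSON record per line
po.write_table(table, "interactions.orc")   # Optimized Row Columnar output

# The ORC file could then be registered as an external table and queried
# with Presto SQL, e.g. (hypothetical column names):
#   SELECT source_gene, target_gene FROM interactions WHERE score > 0.7;
```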


2021 ◽  
pp. 1-11
Author(s):  
Piyush Pandita ◽  
Panagiotis Tsilifis ◽  
Sayan Ghosh ◽  
Liping Wang

Gaussian Process (GP) regression, or kriging, has been extensively applied in the engineering literature for building cheap-to-evaluate surrogates, within the contexts of multi-fidelity modeling, model calibration and design optimization. With the ongoing automation of manufacturing and industrial practices as part of Industry 4.0, there has been a greater need to advance GP regression techniques to handle challenges such as high input dimensionality, data paucity, and big-data problems; the advances proposed to date consist primarily of efficient designs of experiments, optimal data acquisition strategies, and other mathematical tricks. In this work, our attention is focused on the challenge of efficiently training a GP model, which, in the authors' opinion, has attracted very little attention and is, to date, poorly addressed. The performance of widely used training approaches such as maximum likelihood estimation and Markov Chain Monte Carlo (MCMC) sampling can deteriorate significantly in high-dimensional and big-data problems and can lead to cost-inefficient implementations in industrial applications of critical importance. Here, we compare an Adaptive Sequential Monte Carlo (ASMC) sampling algorithm to classic MCMC sampling strategies, and we demonstrate the effectiveness of our implementation on several mathematical problems and challenging industry applications of varying complexity. The computational time savings of our ASMC approach manifest in large-scale problems, helping us push the boundary of GP regression applicability and scalability in various domains of Industry 4.0, including but not limited to design automation, design engineering, predictive maintenance, and supply chain manufacturing.
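For reference, the maximum-likelihood baseline that the paper compares against can be set up in a few lines. The sketch below uses scikit-learn on an assumed toy function and kernel; it illustrates MLE-based hyperparameter training of a GP surrogate only, and does not reproduce the ASMC sampler proposed in the paper.

```python
# Minimal sketch of GP surrogate training by maximum likelihood estimation.
# Toy data, kernel, and sizes are illustrative assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(50, 3))           # 50 designs, 3 inputs
y = np.sin(X).sum(axis=1) + 0.05 * rng.standard_normal(50)

kernel = ConstantKernel(1.0) * RBF(length_scale=[1.0, 1.0, 1.0])
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=5)
gp.fit(X, y)                                       # hyperparameters via MLE

x_new = rng.uniform(0.0, 1.0, size=(5, 3))
mean, std = gp.predict(x_new, return_std=True)     # surrogate prediction
```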


2015 ◽  
Vol 6 (1) ◽  
pp. 1-11 ◽  
Author(s):  
M Misbachul Huda ◽  
Dian Rahma Latifa Hayun ◽  
Zhin Martun

Today, the rapid growth of the internet and the massive use of data have led to increasing demands on CPU capacity, the velocity at which data can be recalled, schemas for managing more complex data structures, and the reliability and integrity of the available data. This kind of data is called large-scale data, or Big Data. Big Data is characterized by high volume, high velocity, high veracity and high variety, and it has to deal with two key issues: the growing size of datasets and the increase in data complexity. To overcome these issues, research today is devoted to the kinds of database management system that can be used optimally for big data management. There are two such kinds: relational database management systems and non-relational database management systems. This paper reviews these two kinds of database management system, including the description, advantages, structure and applications of each DBMS. Index Terms - Big Data, DBMS, Large-scale Data, Non-relational Database, Relational Database.
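The contrast between the two families reviewed in the paper can be shown concretely: a fixed, declared schema queried with SQL versus a self-describing document whose fields may vary from record to record. The sketch below is an illustration only, using Python's built-in sqlite3 as a stand-in relational system and a JSON document as a document-store record; the table and field names are assumptions.

```python
# Minimal sketch: the same sensor reading stored relationally (fixed schema,
# SQL queries) and as a schemaless document. Names are illustrative.
import json
import sqlite3

# Relational: schema declared up front, queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor_id TEXT, ts INTEGER, value REAL)")
conn.execute("INSERT INTO readings VALUES ('s-17', 1700000000, 23.4)")
rows = conn.execute(
    "SELECT value FROM readings WHERE sensor_id = 's-17'").fetchall()

# Non-relational: each record is a self-describing document; fields can vary
# between records without a schema migration.
doc = json.dumps({"sensor_id": "s-17", "ts": 1700000000,
                  "value": 23.4, "tags": ["indoor", "floor-2"]})
```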


2012 ◽  
Vol 13 (03n04) ◽  
pp. 1250009 ◽  
Author(s):  
CHANGQING JI ◽  
YU LI ◽  
WENMING QIU ◽  
YINGWEI JIN ◽  
YUJIE XU ◽  
...  

With the rapid growth of emerging applications such as social networks, the semantic web, sensor networks and LBS (Location Based Service) applications, the variety and volume of data to be processed continue to increase quickly. Effective management and processing of large-scale data poses an interesting but critical challenge. Recently, big data has attracted a lot of attention from academia, industry and government. This paper introduces several big data processing techniques from the system and application perspectives. First, from the view of cloud data management and big data processing mechanisms, we present the key issues of big data processing, including the definition of big data, big data management platforms, big data service models, distributed file systems, data storage, data virtualization platforms and distributed applications. We then introduce the MapReduce parallel processing framework and some MapReduce optimization strategies reported in the literature. Finally, we discuss the open issues and challenges, and explore future research directions for big data processing in cloud computing environments.
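The MapReduce programming model mentioned above reduces a distributed computation to two user-supplied functions. The following is a minimal single-process sketch of that model, using an assumed word-count task; a real deployment would run the map, shuffle and reduce phases across a Hadoop-style cluster rather than in one Python process.

```python
# Minimal sketch of the MapReduce model: map emits (key, 1) pairs, the
# framework shuffles by key, and reduce sums each group. Toy input assumed.
from collections import defaultdict

def map_phase(document):
    for word in document.split():
        yield word.lower(), 1

def reduce_phase(key, counts):
    return key, sum(counts)

documents = ["Big data in the cloud", "cloud data management"]

# Shuffle: group intermediate pairs by key, as the framework would.
groups = defaultdict(list)
for doc in documents:
    for key, value in map_phase(doc):
        groups[key].append(value)

word_counts = dict(reduce_phase(k, v) for k, v in groups.items())
# {'big': 1, 'data': 2, 'in': 1, 'the': 1, 'cloud': 2, 'management': 1}
```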


2014 ◽  
Vol 7 (5) ◽  
pp. 404-412 ◽  
Author(s):  
Tyler H. McCormick ◽  
Rebecca Ferrell ◽  
Alan F. Karr ◽  
Patrick B. Ryan

Author(s):  
Martin Atzmueller ◽  
Dennis Mollenhauer ◽  
Andreas Schmidt

Large-scale data processing is one of the key challenges in many application domains, especially in ubiquitous and big data contexts. In these contexts, subgroup discovery provides a flexible method for both data analysis and knowledge discovery. Subgroup discovery and pattern mining are important descriptive data mining tasks. They can be applied, for example, to obtain an overview of the relations in the data, for automatic hypothesis generation, and for a number of knowledge discovery applications. This chapter presents the novel SD-MapR algorithmic framework for large-scale local exceptionality detection, implemented using subgroup discovery on the Map/Reduce framework. We describe the basic algorithm in detail and provide an experimental evaluation using several real-world datasets. We tackle two algorithmic variants focusing on simple and more complex target concepts, the latter presented as an implementation of exceptional model mining on large attributed graphs. The results of our evaluation show the scalability of the presented approach for large datasets.
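At the heart of subgroup discovery is scoring candidate patterns against a target concept. The sketch below illustrates one standard quality measure, weighted relative accuracy (WRAcc), on an assumed toy dataset with a binary target; it does not reproduce SD-MapR itself, which distributes this kind of counting over Map/Reduce.

```python
# Minimal sketch of a subgroup quality measure (WRAcc) for a binary target.
# Records and the candidate pattern are illustrative assumptions.
records = [
    {"age": "young", "smoker": True,  "disease": True},
    {"age": "young", "smoker": False, "disease": False},
    {"age": "old",   "smoker": True,  "disease": True},
    {"age": "old",   "smoker": True,  "disease": False},
    {"age": "old",   "smoker": False, "disease": False},
]

def wracc(records, pattern, target):
    """WRAcc = coverage * (target share in subgroup - overall target share)."""
    n = len(records)
    covered = [r for r in records
               if all(r[k] == v for k, v in pattern.items())]
    if not covered:
        return 0.0
    p_overall = sum(r[target] for r in records) / n
    p_pattern = sum(r[target] for r in covered) / len(covered)
    return (len(covered) / n) * (p_pattern - p_overall)

print(wracc(records, {"smoker": True}, "disease"))  # quality of "smoker = true"
```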

