A Cognitive Adopted Framework for IoT Big-Data Management and Knowledge Discovery Prospective

2015 ◽  
Vol 2015 ◽  
pp. 1-12 ◽  
Author(s):  
Nilamadhab Mishra ◽  
Chung-Chih Lin ◽  
Hsien-Tsung Chang

The importance of the industrial internet for IoT big-data management and knowledge discovery in large-scale industrial automation applications is increasing day by day. Several diverse technologies, such as the IoT (Internet of Things), computational intelligence, machine-type communication, big data, and sensor technology, can be combined to improve the data management and knowledge discovery efficiency of large-scale automation applications. In this work, we propose a Cognitive Oriented IoT Big-data Framework (COIB-framework), along with an implementation architecture, an IoT big-data layering architecture, and a data organization and knowledge exploration subsystem, for effective data management and knowledge discovery that is well suited to large-scale industrial automation applications. The discussion and analysis show that the proposed framework and architectures constitute a reasonable solution for implementing IoT big-data-based smart industrial applications.

Author(s):  
Cheng Meng ◽  
Ye Wang ◽  
Xinlian Zhang ◽  
Abhyuday Mandal ◽  
Wenxuan Zhong ◽  
...  

With advances in technology over the past decade, the amount of data generated and recorded has grown enormously in virtually all fields of industry and science. This extraordinary amount of data provides unprecedented opportunities for data-driven decision-making and knowledge discovery. However, the task of analyzing such large-scale datasets poses significant challenges and calls for innovative statistical methods specifically designed for faster speed and higher efficiency. In this chapter, we review currently available methods for big data, with a focus on subsampling methods using statistical leveraging and on divide-and-conquer methods.
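The leveraging idea the chapter reviews can be illustrated in a few lines: sample rows of a regression problem with probability proportional to their statistical leverage, reweight, and fit on the subsample. The following is a minimal sketch under assumed synthetic data and an assumed subsample size; it is an illustration of the general technique, not the chapter's specific algorithms.

```python
# Minimal sketch of leverage-score subsampling for linear regression.
# The data (X, y) and the subsample size r are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, p, r = 100_000, 10, 1_000          # full size, features, subsample size
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + rng.standard_normal(n)

# Leverage scores are the diagonal of the hat matrix H = X (X'X)^{-1} X',
# computed here from a thin QR decomposition: h_i = ||Q_i||^2.
Q, _ = np.linalg.qr(X)
leverage = np.einsum("ij,ij->i", Q, Q)
probs = leverage / leverage.sum()

# Sample r rows proportional to leverage and reweight (importance sampling),
# then solve the small weighted least-squares problem.
idx = rng.choice(n, size=r, replace=True, p=probs)
w = 1.0 / np.sqrt(r * probs[idx])
beta_hat, *_ = np.linalg.lstsq(w[:, None] * X[idx], w * y[idx], rcond=None)
```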


F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 409
Author(s):  
Balázs Bohár ◽  
David Fazekas ◽  
Matthew Madgwick ◽  
Luca Csabai ◽  
Marton Olbei ◽  
...  

In the era of Big Data, data collection underpins biological research more so than ever before. In many cases, collection can be as time-consuming as the analysis itself: it requires downloading multiple public databases with different data structures, and, in general, days can pass before any biological question is answered. To solve this problem, we introduce an open-source, cloud-based big data platform called Sherlock (https://earlham-sherlock.github.io/). Sherlock provides a gap-filling way for biologists to store, convert, query, share and generate biological data, while ultimately streamlining bioinformatics data management. The Sherlock platform provides a simple interface to leverage big data technologies, such as Docker and PrestoDB. Sherlock is designed to analyse, process, query and extract information from extremely complex and large data sets. Furthermore, Sherlock is capable of handling differently structured data (interaction, localization, or genomic sequence) from several sources and converting them to a common, optimized storage format, for example the Optimized Row Columnar (ORC) format. This format facilitates Sherlock's ability to quickly and easily execute distributed analytical queries on extremely large data files, as well as to share datasets between teams. The Sherlock platform is freely available on GitHub and contains specific loader scripts for structured data sources of genomics, interaction and expression databases. With these loader scripts, users are able to easily and quickly create and work with the specific file formats, such as JavaScript Object Notation (JSON) or ORC. For computational biology and large-scale bioinformatics projects, Sherlock provides an open-source platform empowering data management, data analytics, data integration and collaboration through modern big data technologies.
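The JSON-to-ORC conversion described above can be sketched with off-the-shelf tooling. The snippet below is not Sherlock's actual loader script; it is a minimal illustration using pyarrow, with file and column names assumed for the example, of turning newline-delimited JSON records into an ORC file that a distributed SQL engine such as PrestoDB could then scan.

```python
# Minimal sketch (not Sherlock's loader scripts): convert newline-delimited
# JSON interaction records into ORC with pyarrow. File names are assumptions.
import pyarrow.json as pj
import pyarrow.orc as po

table = pj.read_json("interactions.json")   # one JSON record per line
po.write_table(table, "interactions.orc")   # Optimized Row Columnar output

# The ORC file could then be registered as an external table and queried
# with Presto SQL, e.g. (hypothetical column names):
#   SELECT source_gene, target_gene FROM interactions WHERE score > 0.7;
```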


2021 ◽  
pp. 1-11
Author(s):  
Piyush Pandita ◽  
Panagiotis Tsilifis ◽  
Sayan Ghosh ◽  
Liping Wang

Gaussian Process (GP) regression, or kriging, has been extensively applied in the engineering literature for building cheap-to-evaluate surrogates, within the contexts of multi-fidelity modeling, model calibration and design optimization. With the ongoing automation of manufacturing and industrial practices as part of Industry 4.0, there has been a greater need to advance GP regression techniques to handle challenges such as high input dimensionality, data paucity, and big-data problems; the advances proposed to date consist primarily of efficient designs of experiments, optimal data acquisition strategies, and other mathematical tricks. In this work, our attention is focused on the challenge of efficiently training a GP model, which, in the authors' opinion, has attracted very little attention and is, to date, poorly addressed. The performance of widely used training approaches such as maximum likelihood estimation and Markov Chain Monte Carlo (MCMC) sampling can deteriorate significantly in high-dimensional and big-data problems and can lead to cost-inefficient implementations in industrial applications of critical importance. Here, we compare an Adaptive Sequential Monte Carlo (ASMC) sampling algorithm to classic MCMC sampling strategies, and we demonstrate the effectiveness of our implementation on several mathematical problems and challenging industry applications of varying complexity. The computational time savings of our ASMC approach manifest in large-scale problems, helping us push the boundary of GP regression applicability and scalability in various domains of Industry 4.0, including but not limited to design automation, design engineering, predictive maintenance, and supply chain manufacturing.
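For reference, the maximum-likelihood baseline that the paper compares against can be set up in a few lines. The sketch below uses scikit-learn on an assumed toy function and kernel; it illustrates MLE-based hyperparameter training of a GP surrogate only, and does not reproduce the ASMC sampler proposed in the paper.

```python
# Minimal sketch of GP surrogate training by maximum likelihood estimation.
# Toy data, kernel, and sizes are illustrative assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(50, 3))           # 50 designs, 3 inputs
y = np.sin(X).sum(axis=1) + 0.05 * rng.standard_normal(50)

kernel = ConstantKernel(1.0) * RBF(length_scale=[1.0, 1.0, 1.0])
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=5)
gp.fit(X, y)                                       # hyperparameters via MLE

x_new = rng.uniform(0.0, 1.0, size=(5, 3))
mean, std = gp.predict(x_new, return_std=True)     # surrogate prediction
```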


2015 ◽  
Vol 6 (1) ◽  
pp. 1-11 ◽  
Author(s):  
M Misbachul Huda ◽  
Dian Rahma Latifa Hayun ◽  
Zhin Martun

Today, the rapid growth of the internet and the massive use of data have led to increasing demands on CPU capacity, the velocity at which data can be recalled, schemas for managing more complex data structures, and the reliability and integrity of the available data. This kind of data is called large-scale data, or Big Data. Big Data is characterized by high volume, high velocity, high veracity and high variety, and it has to deal with two key issues: the growing size of datasets and the increase in data complexity. To overcome these issues, research today is devoted to the kinds of database management system that can be used optimally for big data management. There are two such kinds: relational database management systems and non-relational database management systems. This paper reviews these two kinds of database management system, including the description, advantages, structure and applications of each DBMS. Index Terms - Big Data, DBMS, Large-scale Data, Non-relational Database, Relational Database.
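The contrast between the two families reviewed in the paper can be shown concretely: a fixed, declared schema queried with SQL versus a self-describing document whose fields may vary from record to record. The sketch below is an illustration only, using Python's built-in sqlite3 as a stand-in relational system and a JSON document as a document-store record; the table and field names are assumptions.

```python
# Minimal sketch: the same sensor reading stored relationally (fixed schema,
# SQL queries) and as a schemaless document. Names are illustrative.
import json
import sqlite3

# Relational: schema declared up front, queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor_id TEXT, ts INTEGER, value REAL)")
conn.execute("INSERT INTO readings VALUES ('s-17', 1700000000, 23.4)")
rows = conn.execute(
    "SELECT value FROM readings WHERE sensor_id = 's-17'").fetchall()

# Non-relational: each record is a self-describing document; fields can vary
# between records without a schema migration.
doc = json.dumps({"sensor_id": "s-17", "ts": 1700000000,
                  "value": 23.4, "tags": ["indoor", "floor-2"]})
```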


2012 ◽  
Vol 13 (03n04) ◽  
pp. 1250009 ◽  
Author(s):  
CHANGQING JI ◽  
YU LI ◽  
WENMING QIU ◽  
YINGWEI JIN ◽  
YUJIE XU ◽  
...  

With the rapid growth of emerging applications such as social networks, the semantic web, sensor networks and LBS (Location Based Service) applications, the variety and volume of data to be processed continue to increase quickly. Effective management and processing of large-scale data poses an interesting but critical challenge. Recently, big data has attracted a lot of attention from academia, industry and government. This paper introduces several big data processing techniques from the system and application perspectives. First, from the view of cloud data management and big data processing mechanisms, we present the key issues of big data processing, including the definition of big data, big data management platforms, big data service models, distributed file systems, data storage, data virtualization platforms and distributed applications. We then introduce the MapReduce parallel processing framework and some MapReduce optimization strategies reported in the literature. Finally, we discuss the open issues and challenges, and explore future research directions for big data processing in cloud computing environments.
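The MapReduce programming model mentioned above reduces a distributed computation to two user-supplied functions. The following is a minimal single-process sketch of that model, using an assumed word-count task; a real deployment would run the map, shuffle and reduce phases across a Hadoop-style cluster rather than in one Python process.

```python
# Minimal sketch of the MapReduce model: map emits (key, 1) pairs, the
# framework shuffles by key, and reduce sums each group. Toy input assumed.
from collections import defaultdict

def map_phase(document):
    for word in document.split():
        yield word.lower(), 1

def reduce_phase(key, counts):
    return key, sum(counts)

documents = ["Big data in the cloud", "cloud data management"]

# Shuffle: group intermediate pairs by key, as the framework would.
groups = defaultdict(list)
for doc in documents:
    for key, value in map_phase(doc):
        groups[key].append(value)

word_counts = dict(reduce_phase(k, v) for k, v in groups.items())
# {'big': 1, 'data': 2, 'in': 1, 'the': 1, 'cloud': 2, 'management': 1}
```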


2014 ◽  
Vol 7 (5) ◽  
pp. 404-412 ◽  
Author(s):  
Tyler H. McCormick ◽  
Rebecca Ferrell ◽  
Alan F. Karr ◽  
Patrick B. Ryan

Author(s):  
Martin Atzmueller ◽  
Dennis Mollenhauer ◽  
Andreas Schmidt

Large-scale data processing is one of the key challenges in many application domains, especially in ubiquitous and big data contexts. In these contexts, subgroup discovery provides a flexible method for both data analysis and knowledge discovery. Subgroup discovery and pattern mining are important descriptive data mining tasks. They can be applied, for example, to obtain an overview of the relations in the data, for automatic hypothesis generation, and for a number of knowledge discovery applications. This chapter presents the novel SD-MapR algorithmic framework for large-scale local exceptionality detection, implemented using subgroup discovery on the Map/Reduce framework. We describe the basic algorithm in detail and provide an experimental evaluation using several real-world datasets. We tackle two algorithmic variants focusing on simple and more complex target concepts, the latter presented as an implementation of exceptional model mining on large attributed graphs. The results of our evaluation show the scalability of the presented approach for large datasets.
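At the heart of subgroup discovery is scoring candidate patterns against a target concept. The sketch below illustrates one standard quality measure, weighted relative accuracy (WRAcc), on an assumed toy dataset with a binary target; it does not reproduce SD-MapR itself, which distributes this kind of counting over Map/Reduce.

```python
# Minimal sketch of a subgroup quality measure (WRAcc) for a binary target.
# Records and the candidate pattern are illustrative assumptions.
records = [
    {"age": "young", "smoker": True,  "disease": True},
    {"age": "young", "smoker": False, "disease": False},
    {"age": "old",   "smoker": True,  "disease": True},
    {"age": "old",   "smoker": True,  "disease": False},
    {"age": "old",   "smoker": False, "disease": False},
]

def wracc(records, pattern, target):
    """WRAcc = coverage * (target share in subgroup - overall target share)."""
    n = len(records)
    covered = [r for r in records
               if all(r[k] == v for k, v in pattern.items())]
    if not covered:
        return 0.0
    p_overall = sum(r[target] for r in records) / n
    p_pattern = sum(r[target] for r in covered) / len(covered)
    return (len(covered) / n) * (p_pattern - p_overall)

print(wracc(records, {"smoker": True}, "disease"))  # quality of "smoker = true"
```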

