Nakamoto Consensus to Accelerate Supervised Classification Algorithms for Multiparty Computing

2021, Vol 2021, pp. 1-11
Author(s):  
Zhen Zhang
Bing Guo
Yan Shen
Chengjie Li
Xinhua Suo
...  

Bitcoin mining consumes tremendous amounts of electricity to solve the hash problem. At the same time, large-scale applications of artificial intelligence (AI) require efficient and secure computing. Many computing devices are in use, and their hardware resources are highly heterogeneous, so a mechanism is needed to coordinate cooperation among computing devices, and a sound computation structure is required when data are dispersed. In this paper, we propose an architecture in which devices (also called nodes) can reach a consensus on task results using off-chain smart contracts and private data. The proposed distributed computing architecture can accelerate computing-intensive and data-intensive supervised classification algorithms with limited resources. This architecture significantly increases privacy protection and prevents leakage of distributed data. It also supports heterogeneous data, making computing on each device more efficient. We prove the correctness and robustness of our system mathematically and derive the condition under which a given task can be stopped. In the experiments, we transformed the Bitcoin hash collision into distributed computing on several nodes and evaluated the training and prediction accuracy on handwritten digit images (MNIST). The experimental results demonstrate the effectiveness of the proposed method.
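To make the consensus-on-task-results idea concrete, here is a minimal Python sketch, assuming a setting where each node keeps its data shard private, publishes only a prediction plus a hash commitment to it, and the task result is accepted once a quorum of nodes agrees. The Node class, the toy nearest-neighbour classifier, and reach_consensus are invented for illustration; they are not the authors' architecture, which additionally relies on off-chain smart contracts.

```python
# Illustrative sketch (not the authors' code): each node holds a private data
# shard, answers a query locally, and the task result is accepted only when a
# quorum of nodes reports the same label.
import hashlib
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Node:
    name: str
    private_shard: List[Tuple[float, int]]   # (feature, label) pairs, never shared

    def predict(self, query: float) -> int:
        # Toy "classifier": return the label of the nearest training point.
        nearest = min(self.private_shard, key=lambda ex: abs(ex[0] - query))
        return nearest[1]

    def signed_result(self, query: float) -> Tuple[int, str]:
        # Publish only the prediction and a hash commitment, not the raw data.
        pred = self.predict(query)
        digest = hashlib.sha256(f"{self.name}:{pred}".encode()).hexdigest()
        return pred, digest


def reach_consensus(nodes: List[Node], query: float, quorum: float = 0.5) -> int:
    """Accept the label reported by more than `quorum` of the nodes."""
    votes = Counter(node.signed_result(query)[0] for node in nodes)
    label, count = votes.most_common(1)[0]
    if count / len(nodes) > quorum:
        return label
    raise RuntimeError("no consensus reached on the task result")


if __name__ == "__main__":
    nodes = [
        Node("n1", [(0.1, 0), (0.9, 1)]),
        Node("n2", [(0.2, 0), (0.8, 1)]),
        Node("n3", [(0.15, 0), (0.95, 1)]),
    ]
    print(reach_consensus(nodes, query=0.2))   # prints 0
```

The hash in this sketch only illustrates that nodes can commit to a result without revealing the private data behind it; the quorum rule stands in for the stopping condition the paper derives.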

Author(s):  
Jason Leigh
Andrew Johnson
Luc Renambot
Venkatram Vishwanath
Tom Peterka
...  

An effective visualization is best achieved through the creation of a proper representation of the data and the interactive manipulation and querying of the visualization. Large-scale data visualization is particularly challenging because the size of the data is several orders of magnitude larger than what can be managed on an average desktop computer; it therefore requires distributed computing. By leveraging the widespread expansion of the Internet and other national and international high-speed network infrastructure, such as the National LambdaRail, Internet2, and the Global Lambda Integrated Facility, data and service providers began to migrate toward a model of widespread distribution of resources. This chapter introduces different instantiations of the visualization pipeline and the historical motivation for their creation. The authors examine individual components of the pipeline in detail to understand the technical challenges that must be solved to ensure continued scalability. They discuss distributed data management issues that are specifically relevant to large-scale visualization. They also introduce key data rendering techniques and explain, through case studies, approaches for scaling them by leveraging distributed computing. Lastly, they describe advanced display technologies that are now considered the “lenses” for examining large-scale data.
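As background for the pipeline discussion, the following is a minimal, hypothetical Python sketch of the classic filter-map-render visualization pipeline with its data-parallel stages spread across worker processes; the function names and the multiprocessing setup are illustrative assumptions, not code from the chapter.

```python
# Minimal, hypothetical sketch of a filter -> map -> render pipeline whose
# data-parallel stages run on separate worker processes.
from multiprocessing import Pool


def filter_stage(chunk):
    # Keep only the samples of interest (here: values above a threshold).
    return [v for v in chunk if v > 0.5]


def map_stage(chunk):
    # Turn raw values into renderable primitives (here: quantized color bins).
    return [int(v * 255) for v in chunk]


def render_stage(chunks):
    # Stand-in for rendering: merge per-worker results into one "image".
    return [pixel for chunk in chunks for pixel in chunk]


if __name__ == "__main__":
    data = [i / 1000 for i in range(1000)]
    # Partition the data so each worker runs filter and map on its own chunk.
    parts = [data[i::4] for i in range(4)]
    with Pool(4) as pool:
        filtered = pool.map(filter_stage, parts)
        mapped = pool.map(map_stage, filtered)
    image = render_stage(mapped)
    print(len(image), "primitives rendered")
```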


Author(s):  
Leonidas Fegaras

Abstract: We present an algebra for data-intensive scalable computing that is based on monoid homomorphisms and consists of a small set of operations capturing most features supported by current domain-specific languages for data-centric distributed computing. This algebra serves as the formal basis of MRQL, a query processing and optimization system for large-scale distributed data analysis. The MRQL semantics is given in terms of monoid comprehensions, which support group-by and order-by syntax and can work on heterogeneous collections without requiring any extension to the monoid algebra. We present the syntax and semantics of monoid comprehensions and provide rules to translate them to the monoid algebra. We give evidence of the effectiveness of our algebra by presenting some important optimization rules, such as converting nested queries to joins.
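As a rough illustration of why monoid homomorphisms distribute well (plain Python, not MRQL syntax): a query is factored into a per-element map into a monoid and an associative merge, so partitions can be folded independently and combined in any order. The helper names hom, single, and merge_counts are invented for this example.

```python
# Plain-Python illustration (not MRQL syntax) of a monoid homomorphism:
# a per-element map into a monoid plus an associative merge, so partitions
# can be folded independently and combined in any order.
from functools import reduce


def hom(partition, unit, map_fn, merge_fn):
    """Fold one partition into the monoid defined by (unit, merge_fn)."""
    return reduce(merge_fn, (map_fn(x) for x in partition), unit)


# Example: word count as a group-by, i.e. a homomorphism into the monoid of
# dictionaries with value-wise addition.
def single(word):
    return {word: 1}


def merge_counts(a, b):
    out = dict(a)
    for key, n in b.items():
        out[key] = out.get(key, 0) + n
    return out


partitions = [["a", "b", "a"], ["b", "c"], ["a"]]
per_partition = [hom(p, {}, single, merge_counts) for p in partitions]  # map side
result = reduce(merge_counts, per_partition, {})                        # merge side
print(result)  # {'a': 3, 'b': 2, 'c': 1}
```

This kind of factoring, a group-by expressed as a homomorphism into a dictionary monoid, is what makes such queries straightforward to run on MapReduce-style engines.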


Author(s):  
D. V. Gribanov

Introduction. This article is devoted to the legal regulation of digital asset turnover and the possible uses of distributed computing and distributed data storage systems in the activities of public authorities and entities of public control. The author notes that some national and foreign scholars who study “blockchain” technology (distributed computing and distributed data storage systems) emphasize its usefulness in various activities. The data validation procedure for digital transactions and the legal regulation of the creation, issuance, and turnover of digital assets need further attention.

Materials and methods. The research is based on general scientific methods (analysis, analogy, comparison) and particular methods of cognition of legal phenomena and processes (interpretation of legal rules, the technical legal method, the formal legal method, and the formal logical method).

Results of the study. The analysis identified several advantages of using “blockchain” technology in the sphere of public control: a dedicated validation system; data entered into the distributed data storage system cannot be erased or forged; full transparency of the sequence of actions taken while exercising governing powers; and automatic repetition of recurring actions. The need for fivefold validation of the exercise of governing powers is substantiated. The author stresses that fivefold validation ensures comprehensive control over the exercise of powers by civil society, the entities of public control, and the Russian Federation as a federal state holding sovereignty over its territory. The author has also conducted a brief analysis of judicial decisions concerning digital transactions.

Discussion and conclusion. The use of a distributed data storage system makes control easier to exercise because it reduces the risks of forgery, substitution, or deletion of data. The author suggests defining a digital transaction not only as actions involving digital assets, but also as actions that modify or add information about legal facts in order to record them in distributed data storage systems. The author suggests using distributed data storage systems for independent validation of information about the activities of state authorities. In the author’s opinion, applying “blockchain” technology may result not only in more efficient public control, but also in the creation of a new form of public control: automatic control. It is concluded that there is currently no legislative basis for regulating legal relations concerning distributed data storage.


Author(s):  
Jahwan Koo
Nawab Muhammad Faseeh Qureshi
Isma Farah Siddiqui
Asad Abbas
Ali Kashif Bashir

Abstract: Real-time data streaming fetches live sensory segments of a dataset in a heterogeneous distributed computing environment. This process assembles data chunks at a rapid encapsulation rate through a streaming technique that bundles sensor segments into multiple micro-batches and loads them into a repository. Recently, the acquisition process has been enhanced with an additional feature for exchanging IoT devices’ datasets, which comprise two components: (i) sensory data and (ii) metadata. The sensory data contain the record information, while the metadata consist of logs, heterogeneous events, and routing path tables used to transmit micro-batch streams into the repository. The real-time acquisition procedure uses a Directed Acyclic Graph (DAG) to extract live query outcomes from in-place micro-batches through MapReduce stages and returns a result set. However, several bottlenecks affect performance during execution: (i) formation of homogeneous micro-batches only, (ii) the complexity of dataset diversification, (iii) processing of heterogeneous data tuples, and (iv) strictly linear DAG workflows. These produce high processing latency and additional cost when extracting event-enabled IoT datasets. Thus, a Spark cluster, which processes Resilient Distributed Datasets (RDDs) at high speed in random access memory (RAM), falls short of the expected robustness when processing IoT streams in a distributed computing environment. This paper presents an IoT-enabled Directed Acyclic Graph (I-DAG) technique that labels micro-batches at the stage of building a stream event and arranges stream elements with event labels. Heterogeneous stream events are then processed through the I-DAG workflow, which supports non-linear DAG operations for extracting query results in a Spark cluster. The performance evaluation shows that I-DAG resolves the homogeneous micro-batch issue and provides an effective solution for heterogeneous stream events in IoT-enabled datasets on Spark clusters.
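The abstract gives no code, but the labeling step can be pictured with a short PySpark Structured Streaming sketch: each micro-batch record receives an event label, and the labelled branches are then routed to separate processing paths. The rate source, the column names, and the even/odd labeling rule below are assumptions made purely for illustration; this is not the I-DAG implementation.

```python
# Hypothetical PySpark sketch: label every record of a micro-batch with an
# event type, then route heterogeneous events to separate processing branches.
# Illustration only; this is not the I-DAG implementation from the paper.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("labelled-stream-sketch").getOrCreate()

# The built-in "rate" source stands in for a live IoT sensor stream.
stream = spark.readStream.format("rate").option("rowsPerSecond", 100).load()

# Attach an event label to every record before the micro-batch is processed.
labelled = stream.withColumn(
    "event_type",
    F.when(F.col("value") % 2 == 0, F.lit("sensor_data")).otherwise(F.lit("metadata")),
)


def route_batch(batch_df, batch_id):
    # Each labelled branch could feed its own (non-linear) part of the workflow.
    sensor = batch_df.filter(F.col("event_type") == "sensor_data")
    meta = batch_df.filter(F.col("event_type") == "metadata")
    print(f"batch {batch_id}: {sensor.count()} sensor rows, {meta.count()} metadata rows")


query = labelled.writeStream.foreachBatch(route_batch).start()
query.awaitTermination(timeout=30)   # stop the sketch after 30 seconds
```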

