Systemizing Interprocedural Static Analysis of Large-scale Systems Code with Graspan

Zhiqiang Zuo; Kai Wang; Aftab Hussain; Ardalan Amiri Sani; Yiyu Zhang; Shenming Lu; Wensheng Dou; Linzhang Wang; Xuandong Li; Chenxi Wang; Guoqing Harry Xu

doi:10.1145/3466820

ACM Transactions on Computer Systems ◽

10.1145/3466820 ◽

2021 ◽

Vol 38 (1-2) ◽

pp. 1-39

Author(s):

Zhiqiang Zuo ◽

Kai Wang ◽

Aftab Hussain ◽

Ardalan Amiri Sani ◽

Yiyu Zhang ◽

...

Keyword(s):

Big Data ◽

Static Analysis ◽

Large Scale ◽

Big Data Analytics ◽

Large Scale Systems ◽

Context Sensitive ◽

Large Program ◽

History Of ◽

Graph System ◽

Processing Techniques

There is more than a decade-long history of using static analysis to find bugs in systems such as Linux. Most of the existing static analyses developed for these systems are simple checkers that find bugs based on pattern matching. Despite the presence of many sophisticated interprocedural analyses, few of them have been employed to improve checkers for systems code due to their complex implementations and poor scalability. In this article, we revisit the scalability problem of interprocedural static analysis from a “Big Data” perspective. That is, we turn sophisticated code analysis into Big Data analytics and leverage novel data processing techniques to solve this traditional programming language problem. We propose Graspan, a disk-based parallel graph system that uses an edge-pair centric computation model to compute dynamic transitive closures on very large program graphs. We develop two backends for Graspan, namely, Graspan-C running on CPUs and Graspan-G on GPUs, and present their designs in the article. Graspan-C can analyze large-scale systems code on any commodity PC, while, if GPUs are available, Graspan-G can be readily used to achieve orders of magnitude speedup by harnessing a GPU’s massive parallelism. We have implemented fully context-sensitive pointer/alias and dataflow analyses on Graspan. An evaluation of these analyses on large codebases written in multiple languages, such as Linux and Apache Hadoop, demonstrates that their Graspan implementations are language-independent, scale to millions of lines of code, and are much simpler than their original implementations. Moreover, we show that these analyses can be used to uncover many real-world bugs in large-scale systems code.
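
The "dynamic transitive closure" mentioned in the abstract can be illustrated with a small in-memory sketch. This is not Graspan's implementation or API: Graspan is disk-based and parallel, whereas the code below is a single-threaded worklist loop. The function name `edge_pair_closure`, the rule dictionaries, and the labels `E` and `T` are all hypothetical names chosen for illustration. The idea shown is only that each pair of adjacent labeled edges is checked against grammar rules, and a matching pair adds a new edge that may in turn enable further pairs.

```python
from collections import defaultdict, deque


def edge_pair_closure(edges, binary_rules, unary_rules=None):
    """Grammar-guided transitive closure over labeled edges (illustrative only).

    edges:        iterable of (src, label, dst) triples
    binary_rules: dict mapping (label1, label2) -> set of labels to add for
                  every pair src -label1-> mid -label2-> dst
    unary_rules:  dict mapping label -> set of labels implied by that label
    """
    unary_rules = unary_rules or {}
    closed = set()                 # all edges known so far
    out_edges = defaultdict(set)   # node -> {(label, dst)}
    in_edges = defaultdict(set)    # node -> {(label, src)}
    worklist = deque()

    def add(src, label, dst):
        edge = (src, label, dst)
        if edge not in closed:
            closed.add(edge)
            out_edges[src].add((label, dst))
            in_edges[dst].add((label, src))
            worklist.append(edge)

    for src, label, dst in edges:
        add(src, label, dst)

    while worklist:
        src, label, dst = worklist.popleft()
        # Unary rules: one edge directly implies another label.
        for new_label in unary_rules.get(label, ()):
            add(src, new_label, dst)
        # The popped edge as the left half of an edge pair.
        for next_label, next_dst in list(out_edges[dst]):
            for new_label in binary_rules.get((label, next_label), ()):
                add(src, new_label, next_dst)
        # The popped edge as the right half of an edge pair.
        for prev_label, prev_src in list(in_edges[src]):
            for new_label in binary_rules.get((prev_label, label), ()):
                add(prev_src, new_label, dst)
    return closed


# Hypothetical grammar: T ::= E | T E, i.e. plain reachability over E edges.
chain = [("a", "E", "b"), ("b", "E", "c"), ("c", "E", "d")]
closure = edge_pair_closure(
    chain,
    binary_rules={("T", "E"): {"T"}},
    unary_rules={"E": {"T"}},
)
reachable = sorted((s, d) for s, label, d in closure if label == "T")
```

In this toy run, `reachable` holds every ordered pair of nodes connected by a path of `E` edges. A pointer or dataflow analysis would use a richer grammar, but the loop structure would stay the same under these assumptions.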

Interoperability in Internet of Media Things and Integration Big Media

10.4018/978-1-7998-4186-9.ch004 ◽

2022 ◽

pp. 59-79

Author(s):

Dragorad A. Milovanovic ◽

Vladan Pantovic

Keyword(s):

Big Data ◽

Data Analytics ◽

Large Scale ◽

Reference Model ◽

Big Data Analytics ◽

Multimedia Data ◽

Data Sets ◽

Large Scale Systems ◽

New Class ◽

Media Applications

Multimedia-related things are a new class of connected objects that can be searched, discovered, and composited on the internet of media things (IoMT). A huge number of data sets come from audio-visual sources or have a multimedia nature. However, multimedia data is currently not incorporated in the big data (BD) frameworks. The research projects, standardization initiatives, and industrial activities for integration are outlined in this chapter. MPEG IoMT interoperability and the network-based media processing (NBMP) framework as an instance of the big media (BM) reference model are explored. A conceptual model of IoT and big data integration for analytics is proposed. Big data analytics is rapidly evolving both in terms of functionality and the underlying model. The authors point out that IoMT analytics is closely related to big data analytics, which facilitates the integration of multimedia objects in big media applications in large-scale systems. These two technologies are mutually dependent and should be researched and developed jointly.

Distributed optimization over large-scale systems for big data analytics

4OR ◽

10.1007/s10288-020-00446-x ◽

2020 ◽

Author(s):

Reza Shahbazian

Keyword(s):

Big Data ◽

Data Analytics ◽

Large Scale ◽

Distributed Optimization ◽

Big Data Analytics ◽

Large Scale Systems

Big Data Analytics in Social Media

Advances in Business Information Systems and Analytics - Machine Learning Techniques for Improved Business Analytics ◽

10.4018/978-1-5225-3534-8.ch006 ◽

2019 ◽

pp. 107-124

Author(s):

Janani Balakumar ◽

Vijayarani Mohan

Keyword(s):

Social Media ◽

Big Data ◽

Data Analytics ◽

Large Scale ◽

Rapid Development ◽

Big Data Analytics ◽

Information Control ◽

Social Media Platforms ◽

Current Scenario ◽

Processing Techniques

The rapid development of online social media as a means of collaboratively producing content presents new possibilities and challenges to both producers and patrons of knowledge. The term big data refers to large-scale information control and evaluation technologies that exceed the functionality of conventional data processing techniques. In the current scenario, social media has gained amazing attention within the last decade. Accessing social media platforms and websites such as Facebook, Twitter, YouTube, LinkedIn, Instagram, and Google+, web technologies have become more responsible. People are becoming more fascinated by, and reliant on, social media platforms for records, news, and the opinions of other customers on diverse topics. Hence, these situations produce a large volume of data. The main objective of this chapter is to provide knowledge about big data analytics in social media. A brief overview of big data and social media is given. Research challenges in social media are also discussed.

Data Lake Ecosystem Workflow

10.21079/11681/40203 ◽

2021 ◽

Author(s):

R. Salter ◽

Quyen Dong ◽

Cody Coleman ◽

Maria Seale ◽

Alicia Ruvinsky ◽

...

Keyword(s):

Big Data ◽

Language Processing ◽

Data Analytics ◽

Large Scale ◽

Big Data Analytics ◽

Lake Ecosystem ◽

Data Governance ◽

Government Organizations ◽

Large Scale Data ◽

Scale Data

The Engineer Research and Development Center, Information Technology Laboratory’s (ERDC-ITL’s) Big Data Analytics team specializes in the analysis of large-scale datasets with capabilities across four research areas that require vast amounts of data to inform and drive analysis: large-scale data governance, deep learning and machine learning, natural language processing, and automated data labeling. Unfortunately, data transfer between government organizations is a complex and time-consuming process requiring coordination of multiple parties across multiple offices and organizations. Past successes in large-scale data analytics have placed a significant demand on ERDC-ITL researchers, highlighting that few individuals fully understand how to successfully transfer data between government organizations; future project success therefore depends on a small group of individuals to efficiently execute a complicated process. The Big Data Analytics team set out to develop a standardized workflow for the transfer of large-scale datasets to ERDC-ITL, in part to educate peers and future collaborators on the process required to transfer datasets between government organizations. Researchers also aim to increase workflow efficiency while protecting data integrity. This report provides an overview of the created Data Lake Ecosystem Workflow by focusing on the six phases required to efficiently transfer large datasets to supercomputing resources located at ERDC-ITL.

Software Abstractions for Large-Scale Deep Learning Models in Big Data Analytics

International Journal of Advanced Computer Science and Applications ◽

10.14569/ijacsa.2019.0100469 ◽

2019 ◽

Vol 10 (4) ◽

Author(s):

Ayaz H Khan ◽

Ali Mustafa ◽

Aneeq Yusuf ◽

Rehanullah Khan

Keyword(s):

Big Data ◽

Deep Learning ◽

Data Analytics ◽

Large Scale ◽

Big Data Analytics ◽

Learning Models

Big Data and Cloud Computing

Exploring the Convergence of Big Data and the Internet of Things - Advances in Data Mining and Database Management ◽

10.4018/978-1-5225-2947-7.ch001 ◽

2018 ◽

pp. 1-28 ◽

Cited By ~ 2

Author(s):

Marcus Tanque ◽

Harry J Foxwell

Keyword(s):

Cloud Computing ◽

Big Data ◽

Supply Chain ◽

Supply Chain Management ◽

Data Analytics ◽

Large Scale ◽

Big Data Analytics ◽

Chain Management ◽

Computing Services ◽

Cloud Computing Services

Big data and cloud computing are transforming information technology. These comparable technologies are the result of dramatic developments in computational power, virtualization, network bandwidth, availability, storage capability, and cyber-physical systems. The crossroads of these two areas involves the use of cloud computing services and infrastructure to support large-scale data analytics research, providing relevant solutions or future possibilities for supply chain management. This chapter broadens the current posture of cloud computing and big data, as associated with supply chain solutions. This chapter focuses on areas of significant technology and scientific advancements, which are likely to enhance supply chain systems. This evaluation emphasizes the security challenges and mega-trends affecting cloud computing and big data analytics pertaining to supply chain management.

Big Data Analytics

Advances in Human and Social Aspects of Technology - HCI Challenges and Privacy Preservation in Big Data Security ◽

10.4018/978-1-5225-2863-0.ch006 ◽

2018 ◽

pp. 124-138

Author(s):

Sreenu G. ◽

M.A. Saleem Durai

Keyword(s):

Decision Making ◽

Big Data ◽

Real Life ◽

Big Data Analytics ◽

Speed Up ◽

Processing Techniques ◽

And Storage ◽

Big Data Computing ◽

New Algorithms ◽

Over Time

Recent advances in hardware technology have made it possible to record transactions and other pieces of information from everyday life at a rapid pace. In addition to speed-up and storage capacity, real-life perceptions tend to change over time. However, there is much potential and highly useful value hidden in the vast volume of data. For this kind of application, conventional data mining is not suitable, so it should be tuned, changed, or redesigned with new algorithms. Big data computing is entering the category of the most promising technologies that show the way to new ways of thinking and decision making. This epoch of big data helps users to benefit from all available data to gain more precise, systematic results or to uncover latent information, and then make the best possible decisions. Drawing from a broad set of workloads, the author establishes a set of classifying measures based on the storage architecture, processing types, processing techniques, and the tools and technologies used.

Synchronizing Execution of Big Data in Distributed and Parallelized Environments

Big Data ◽

10.4018/978-1-4666-9840-6.ch071 ◽

2016 ◽

pp. 1555-1581

Author(s):

Gueyoung Jung ◽

Tridib Mukherjee

Keyword(s):

Big Data ◽

Distributed System ◽

Data Analytics ◽

High Performance ◽

Large Scale ◽

Big Data Analytics ◽

Loosely Coupled ◽

Current Trends ◽

Distributed Computing Infrastructures ◽

Performance Computing

In the modern information era, the amount of data has exploded. Current trends further indicate exponential growth of data in the future. This prevalent humungous amount of data, referred to as big data, has given rise to the problem of finding the “needle in the haystack” (i.e., extracting meaningful information from big data). Many researchers and practitioners are focusing on big data analytics to address the problem. One of the major issues in this regard is the computation requirement of big data analytics. In recent years, the proliferation of many loosely coupled distributed computing infrastructures (e.g., modern public, private, and hybrid clouds, high performance computing clusters, and grids) has enabled high computing capability to be offered for large-scale computation. This has allowed the execution of big data analytics to gather pace in recent years across organizations and enterprises. However, even with the high computing capability, it is a big challenge to efficiently extract valuable information from vast astronomical data. Hence, we require unforeseen scalability of performance to deal with the execution of big data analytics. A big question in this regard is how to maximally leverage the high computing capabilities of the aforementioned loosely coupled distributed infrastructure to ensure fast and accurate execution of big data analytics. In this regard, this chapter focuses on synchronous parallelization of big data analytics over a distributed system environment to optimize performance.

Effective Statistical Methods for Big Data Analytics

Handbook of Research on Applied Cybernetics and Systems Science - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-5225-2498-4.ch014 ◽

2017 ◽

pp. 280-299 ◽

Cited By ~ 3

Author(s):

Cheng Meng ◽

Ye Wang ◽

Xinlian Zhang ◽

Abhyuday Mandal ◽

Wenxuan Zhong ◽

...

Keyword(s):

Decision Making ◽

Big Data ◽

Knowledge Discovery ◽

Statistical Methods ◽

Large Scale ◽

Big Data Analytics ◽

Divide And Conquer ◽

Data Driven ◽

The Past ◽

Large Scale Dataset

With advances in technologies in the past decade, the amount of data generated and recorded has grown enormously in virtually all fields of industry and science. This extraordinary amount of data provides unprecedented opportunities for data-driven decision-making and knowledge discovery. However, the task of analyzing such large-scale datasets poses significant challenges and calls for innovative statistical methods specifically designed for faster speed and higher efficiency. In this chapter, we review currently available methods for big data, with a focus on the subsampling methods using statistical leveraging and divide and conquer methods.
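
As a rough illustration of the divide and conquer idea named in the abstract, the sketch below fits a simple linear regression on each block of a dataset and combines the block estimates by a size-weighted average. It is not taken from the chapter: the function names `ols_fit` and `divide_and_conquer_ols`, the contiguous block split, and the weighting scheme are assumptions made for this example, and each block is assumed to contain at least two distinct `x` values.

```python
from statistics import fmean


def ols_fit(xs, ys):
    """Ordinary least squares for y = intercept + slope * x on one block."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # assumes the block has at least two distinct x values
    return mean_y - slope * mean_x, slope


def divide_and_conquer_ols(xs, ys, n_blocks):
    """Fit each contiguous block separately, then average the estimates.

    Block estimates are weighted by block size (an illustrative choice).
    """
    n = len(xs)
    bounds = [round(i * n / n_blocks) for i in range(n_blocks + 1)]
    intercept_total = slope_total = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        intercept, slope = ols_fit(xs[lo:hi], ys[lo:hi])
        weight = (hi - lo) / n
        intercept_total += weight * intercept
        slope_total += weight * slope
    return intercept_total, slope_total


# Noise-free toy data on the line y = 1 + 2x, split into four blocks.
xs = [float(i) for i in range(100)]
ys = [1.0 + 2.0 * x for x in xs]
intercept, slope = divide_and_conquer_ols(xs, ys, n_blocks=4)
```

Because each block only needs its own slice of the data, the per-block fits could run on separate machines; only the two numbers per block need to be brought together at the end.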

Introduction to Big Data and Business Analytics

10.4018/978-1-6684-3662-2.ch004 ◽

2022 ◽

pp. 67-76

Author(s):

Dineshkumar Bhagwandas Vaghela

Keyword(s):

Big Data ◽

Data Analytics ◽

Big Data Analytics ◽

Data Sets ◽

Complex Data ◽

Business Analytics ◽

Database Applications ◽

Complex Data Sets ◽

History Of ◽

Rapid Generation

The term big data has arisen from the rapid generation of data in various organizations. In big data, the big is the buzzword. Here the data are so large and complex that traditional database applications are not able to process them (i.e., they are inadequate to deal with such a volume of data). Big data are usually described by the 5Vs (volume, velocity, variety, variability, veracity). Big data can be structured, semi-structured, or unstructured. Big data analytics is the process of uncovering hidden patterns and unknown correlations, and of predicting future values, from large and complex data sets. In this chapter, the following topics will be covered in more detail: the history of big data and business analytics, big data analytics technologies and tools, and big data analytics uses and challenges.
