Systemizing Interprocedural Static Analysis of Large-scale Systems Code with Graspan

2021 ◽  
Vol 38 (1-2) ◽  
pp. 1-39
Author(s):  
Zhiqiang Zuo ◽  
Kai Wang ◽  
Aftab Hussain ◽  
Ardalan Amiri Sani ◽  
Yiyu Zhang ◽  
...  

There is more than a decade-long history of using static analysis to find bugs in systems such as Linux. Most of the existing static analyses developed for these systems are simple checkers that find bugs based on pattern matching. Despite the presence of many sophisticated interprocedural analyses, few of them have been employed to improve checkers for systems code due to their complex implementations and poor scalability. In this article, we revisit the scalability problem of interprocedural static analysis from a “Big Data” perspective. That is, we turn sophisticated code analysis into Big Data analytics and leverage novel data processing techniques to solve this traditional programming language problem. We propose Graspan , a disk-based parallel graph system that uses an edge-pair centric computation model to compute dynamic transitive closures on very large program graphs. We develop two backends for Graspan, namely, Graspan-C running on CPUs and Graspan-G on GPUs, and present their designs in the article. Graspan-C can analyze large-scale systems code on any commodity PC, while, if GPUs are available, Graspan-G can be readily used to achieve orders of magnitude speedup by harnessing a GPU’s massive parallelism. We have implemented fully context-sensitive pointer/alias and dataflow analyses on Graspan. An evaluation of these analyses on large codebases written in multiple languages such as Linux and Apache Hadoop demonstrates that their Graspan implementations are language-independent, scale to millions of lines of code, and are much simpler than their original implementations. Moreover, we show that these analyses can be used to uncover many real-world bugs in large-scale systems code.

2022 ◽  
pp. 59-79
Author(s):  
Dragorad A. Milovanovic ◽  
Vladan Pantovic

Multimedia-related things is a new class of connected objects that can be searched, discovered, and composited on the internet of media things (IoMT). A huge amount of data sets come from audio-visual sources or have a multimedia nature. However, multimedia data is currently not incorporated in the big data (BD) frameworks. The research projects, standardization initiatives, and industrial activities for integration are outlined in this chapter. MPEG IoMT interoperability and network-based media processing (NBMP) framework as an instance of the big media (BM) reference model are explored. Conceptual model of IoT and big data integration for analytics is proposed. Big data analytics is rapidly evolving both in terms of functionality and the underlying model. The authors pointed out that IoMT analytics is closely related to big data analytics, which facilitates the integration of multimedia objects in big media applications in large-scale systems. These two technologies are mutually dependent and should be researched and developed jointly.


Author(s):  
Janani Balakumar ◽  
Vijayarani Mohan

The rapid development of online social media is the method of collaboratively produced content material presents new possibilities and challenges to both producers and patrons of knowledge. The term big data refers to large-scale information control and evaluation technologies that exceed the functionality of conventional data processing techniques. In the current scenario, social media has gained amazing attention within the last decade. Accessing social media platforms and websites such as Facebook, Twitter, YouTube, LinkedIn, Instagram, and Google+, web technologies have become more responsible. People are becoming more fascinated about and relying on social media platform for records, news, and opinion of other customers on diverse topics. Hence, these situations produce a large volume of data. The main objective of this chapter is to provide knowledge about big data analytics in social media. A brief overview of big data and social media are discussed. Research challenges in social media are also discussed.


2021 ◽  
Author(s):  
R. Salter ◽  
Quyen Dong ◽  
Cody Coleman ◽  
Maria Seale ◽  
Alicia Ruvinsky ◽  
...  

The Engineer Research and Development Center, Information Technology Laboratory’s (ERDC-ITL’s) Big Data Analytics team specializes in the analysis of large-scale datasets with capabilities across four research areas that require vast amounts of data to inform and drive analysis: large-scale data governance, deep learning and machine learning, natural language processing, and automated data labeling. Unfortunately, data transfer between government organizations is a complex and time-consuming process requiring coordination of multiple parties across multiple offices and organizations. Past successes in large-scale data analytics have placed a significant demand on ERDC-ITL researchers, highlighting that few individuals fully understand how to successfully transfer data between government organizations; future project success therefore depends on a small group of individuals to efficiently execute a complicated process. The Big Data Analytics team set out to develop a standardized workflow for the transfer of large-scale datasets to ERDC-ITL, in part to educate peers and future collaborators on the process required to transfer datasets between government organizations. Researchers also aim to increase workflow efficiency while protecting data integrity. This report provides an overview of the created Data Lake Ecosystem Workflow by focusing on the six phases required to efficiently transfer large datasets to supercomputing resources located at ERDC-ITL.


Author(s):  
Marcus Tanque ◽  
Harry J Foxwell

Big data and cloud computing are transforming information technology. These comparable technologies are the result of dramatic developments in computational power, virtualization, network bandwidth, availability, storage capability, and cyber-physical systems. The crossroads of these two areas, involves the use of cloud computing services and infrastructure, to support large-scale data analytics research, providing relevant solutions or future possibilities for supply chain management. This chapter broadens the current posture of cloud computing and big data, as associate with the supply chain solutions. This chapter focuses on areas of significant technology and scientific advancements, which are likely to enhance supply chain systems. This evaluation emphasizes the security challenges and mega-trends affecting cloud computing and big data analytics pertaining to supply chain management.


Author(s):  
Sreenu G. ◽  
M.A. Saleem Durai

Advances in recent hardware technology have permitted to document transactions and other pieces of information of everyday life at an express pace. In addition of speed up and storage capacity, real-life perceptions tend to transform over time. However, there are so much prospective and highly functional values unseen in the vast volume of data. For this kind of applications conventional data mining is not suitable, so they should be tuned and changed or designed with new algorithms. Big data computing is inflowing to the category of most hopeful technologies that shows the way to new ways of thinking and decision making. This epoch of big data helps users to take benefit out of all available data to gain more precise systematic results or determine latent information, and then make best possible decisions. Depiction from a broad set of workloads, the author establishes a set of classifying measures based on the storage architecture, processing types, processing techniques and the tools and technologies used.


Big Data ◽  
2016 ◽  
pp. 1555-1581
Author(s):  
Gueyoung Jung ◽  
Tridib Mukherjee

In the modern information era, the amount of data has exploded. Current trends further indicate exponential growth of data in the future. This prevalent humungous amount of data—referred to as big data—has given rise to the problem of finding the “needle in the haystack” (i.e., extracting meaningful information from big data). Many researchers and practitioners are focusing on big data analytics to address the problem. One of the major issues in this regard is the computation requirement of big data analytics. In recent years, the proliferation of many loosely coupled distributed computing infrastructures (e.g., modern public, private, and hybrid clouds, high performance computing clusters, and grids) have enabled high computing capability to be offered for large-scale computation. This has allowed the execution of the big data analytics to gather pace in recent years across organizations and enterprises. However, even with the high computing capability, it is a big challenge to efficiently extract valuable information from vast astronomical data. Hence, we require unforeseen scalability of performance to deal with the execution of big data analytics. A big question in this regard is how to maximally leverage the high computing capabilities from the aforementioned loosely coupled distributed infrastructure to ensure fast and accurate execution of big data analytics. In this regard, this chapter focuses on synchronous parallelization of big data analytics over a distributed system environment to optimize performance.


Author(s):  
Cheng Meng ◽  
Ye Wang ◽  
Xinlian Zhang ◽  
Abhyuday Mandal ◽  
Wenxuan Zhong ◽  
...  

With advances in technologies in the past decade, the amount of data generated and recorded has grown enormously in virtually all fields of industry and science. This extraordinary amount of data provides unprecedented opportunities for data-driven decision-making and knowledge discovery. However, the task of analyzing such large-scale dataset poses significant challenges and calls for innovative statistical methods specifically designed for faster speed and higher efficiency. In this chapter, we review currently available methods for big data, with a focus on the subsampling methods using statistical leveraging and divide and conquer methods.


2022 ◽  
pp. 67-76
Author(s):  
Dineshkumar Bhagwandas Vaghela

The term big data has come due to rapid generation of data in various organizations. In big data, the big is the buzzword. Here the data are so large and complex that the traditional database applications are not able to process (i.e., they are inadequate to deal with such volume of data). Usually the big data are described by 5Vs (volume, velocity, variety, variability, veracity). The big data can be structured, semi-structured, or unstructured. Big data analytics is the process to uncover hidden patterns, unknown correlations, predict the future values from large and complex data sets. In this chapter, the following topics will be covered more in detail. History of big data and business analytics, big data analytics technologies and tools, and big data analytics uses and challenges.


Sign in / Sign up

Export Citation Format

Share Document