Uncovering Big Bias with Big Data: An Introduction to Linear Regression

Descriptive and Predictive Analytical Methods for Big Data

Web Services ◽

10.4018/978-1-5225-7501-6.ch018 ◽

2019 ◽

pp. 314-331 ◽

Cited By ~ 1

Author(s):

Sema A. Kalaian ◽

Rafa M. Kasim ◽

Nabeel R. Kasim

Keyword(s):

Big Data ◽

Standard Deviation ◽

Linear Regression ◽

Multiple Linear Regression ◽

Knowledge Discovery ◽

Data Visualization ◽

Analytical Methods ◽

Data Analytics ◽

Enterprise Performance ◽

Analytical Tools

Data analytics and modeling are powerful tools for knowledge discovery: they examine and capture the complex, hidden relationships and patterns among the quantitative variables in massive, existing structured Big Data in order to predict future enterprise performance. The main purpose of this chapter is to present a conceptual and practical overview of some of the basic and advanced analytical tools for analyzing structured Big Data. The chapter covers both descriptive and predictive analytical methods. The descriptive tools covered include the mean, median, mode, variance, standard deviation, and data visualization methods (e.g., histograms, line charts). The predictive tools covered include correlation and simple and multiple linear regression.
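As a rough illustration of the descriptive and predictive tools this abstract names, the sketch below computes the summary statistics and fits a simple linear regression by ordinary least squares. It is not taken from the chapter: the function names `describe` and `simple_linear_regression` and the sample numbers are assumptions made for the example, and it uses only the Python standard library.

```python
import statistics


def describe(values):
    """Return the basic descriptive statistics of a numeric sample."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "std_dev": statistics.stdev(values),      # sample standard deviation
    }


def simple_linear_regression(x, y):
    """Fit y = intercept + slope * x by least squares.

    Returns (slope, intercept, r), where r is the Pearson correlation.
    """
    mean_x = statistics.mean(x)
    mean_y = statistics.mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


if __name__ == "__main__":
    # Hypothetical illustrative data, not from the chapter.
    spend = [1.0, 2.0, 3.0, 4.0, 5.0]
    revenue = [2.1, 3.9, 6.2, 7.8, 10.1]
    print(describe(revenue))
    print(simple_linear_regression(spend, revenue))
```

On data sets too large for memory, the same sums (`sxx`, `syy`, `sxy`) could in principle be accumulated in chunks, though that is outside what the abstract describes.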


A Detailed Study on Classification Algorithms in Big Data

Big Data Analytics for Sustainable Computing - Advances in Data Mining and Database Management ◽

10.4018/978-1-5225-9750-6.ch002 ◽

2020 ◽

pp. 30-46

Author(s):

Saranya N. ◽

Saravana Selvam

Keyword(s):

Big Data ◽

Random Forest ◽

Linear Regression ◽

Comprehensive Evaluation ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Classification Methods ◽

Computing Science ◽

Data Collections

After an era of managing data collection difficulties, the issue today has become how to process these vast amounts of information. Scientists and researchers alike consider Big Data probably the most essential topic in computing science today. The term Big Data describes huge volumes of data that can exist in any structure, which makes it difficult for standard processing approaches to mine the best possible data from such large data sets. Classification in Big Data is a procedure for summarizing data sets based on various examples. There are distinct classification frameworks that help to classify data collections. The methods discussed in the chapter include Multi-Layer Perceptron, Linear Regression, C4.5, CART, J48, SVM, ID3, Random Forest, and KNN. The aim of this chapter is to provide a comprehensive evaluation of the classification methods in common use.


Information-Based Optimal Subdata Selection for Big Data Linear Regression

Journal of the American Statistical Association ◽

10.1080/01621459.2017.1408468 ◽

2018 ◽

Vol 114 (525) ◽

pp. 393-405 ◽

Cited By ~ 22

Author(s):

HaiYing Wang ◽

Min Yang ◽

John Stufken

Keyword(s):

Big Data ◽

Linear Regression ◽

Selection For


Educational Resource Information Sharing Algorithm Based On Big Data Association Mining and Quasi-Linear Regression Analysis

International Journal of Continuing Engineering Education and Life-Long Learning ◽

10.1504/ijceell.2019.10023380 ◽

2019 ◽

Vol 29 (1) ◽

pp. 336

Author(s):

Yanjun Gao

Keyword(s):

Big Data ◽

Regression Analysis ◽

Linear Regression ◽

Information Sharing ◽

Linear Regression Analysis ◽

Data Association ◽

Association Mining ◽

Educational Resource ◽

Resource Information


Distributed Nonparametric and Semiparametric Regression on SPARK for Big Data Forecasting

Applied Computational Intelligence and Soft Computing ◽

10.1155/2017/5134962 ◽

2017 ◽

Vol 2017 ◽

pp. 1-13

Author(s):

Jelena Fiosina ◽

Maksims Fiosins

Keyword(s):

Big Data ◽

Linear Regression ◽

Goodness Of Fit ◽

Semiparametric Regression ◽

Forecasting Accuracy ◽

Time Consumption ◽

Dataset Size ◽

Computationally Intensive ◽

Mapreduce Paradigm ◽

Complicated Task

Forecasting in big datasets is a common but complicated task, which cannot be executed using the well-known parametric linear regression. However, nonparametric and semiparametric methods, which enable forecasting by building nonlinear data models, are computationally intensive and lack the scalability needed to produce useful results on big datasets in a reasonable time. We present distributed parallel versions of some nonparametric and semiparametric regression models. We use the MapReduce paradigm and describe the algorithms in terms of SPARK data structures to parallelize the calculations. The forecasting accuracy of the proposed algorithms is compared with that of the linear regression model, which is the only forecasting model that currently has a parallel distributed realization within the SPARK framework for big data problems. The advantages of parallelizing the algorithms are also discussed. We validate our models by conducting various numerical experiments: evaluating the goodness of fit, analyzing how increasing dataset size influences time consumption, and analyzing time consumption while varying the degree of parallelism (number of workers) in the distributed realization.
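To make the map-and-reduce idea in this abstract concrete, here is a minimal sketch of a nonparametric (Nadaraya-Watson kernel) regression prediction computed over data partitions. It is an assumption-laden illustration, not the authors' algorithm and not SPARK code: the partitions are plain Python lists, and the names `map_partition`, `combine`, and `kernel_regression`, the Gaussian kernel, and the bandwidth value are all chosen for the example.

```python
import math
from functools import reduce


def gaussian_kernel(u):
    """Unnormalized Gaussian kernel weight."""
    return math.exp(-0.5 * u * u)


def map_partition(partition, x0, bandwidth):
    """Map step: partial weighted sums for one partition of (x, y) pairs."""
    num = 0.0
    den = 0.0
    for x, y in partition:
        w = gaussian_kernel((x - x0) / bandwidth)
        num += w * y
        den += w
    return num, den


def combine(a, b):
    """Reduce step: add two (numerator, denominator) partial results."""
    return a[0] + b[0], a[1] + b[1]


def kernel_regression(partitions, x0, bandwidth):
    """Nadaraya-Watson estimate at x0 from independently processed partitions."""
    partials = (map_partition(p, x0, bandwidth) for p in partitions)
    num, den = reduce(combine, partials, (0.0, 0.0))
    if den == 0.0:
        raise ValueError("all kernel weights are zero; increase the bandwidth")
    return num / den


if __name__ == "__main__":
    # Hypothetical data split across two "workers".
    partitions = [[(0.0, 1.0), (1.0, 2.0)], [(2.0, 3.0), (3.0, 4.0)]]
    print(kernel_regression(partitions, x0=1.5, bandwidth=1.0))
```

Because each partition contributes only two numbers, the map step can run on separate workers and the reduce step stays cheap, which is the property a distributed realization would rely on.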


Orthogonal subsampling for big data linear regression

The Annals of Applied Statistics ◽

10.1214/21-aoas1462 ◽

2021 ◽

Vol 15 (3) ◽

Author(s):

Lin Wang ◽

Jake Elmstedt ◽

Weng Kee Wong ◽

Hongquan Xu

Keyword(s):

Big Data ◽

Linear Regression


Uso de correo electrónico para analizar la comunicación bilateral aplicando Big Data y Regresión Lineal Simple

Revista de Tecnologías de la Información y Comunicaciones ◽

10.35429/jitc.2019.10.3.21.28 ◽

2019 ◽

pp. 21-28

Author(s):

Luz María Hernández-Cruz ◽

Diana Concepción Mex-Alvarez ◽

Guadalupe Manuel Estrada-Segovia ◽

Margarita Castillo-Tellez

Keyword(s):

Big Data ◽

Linear Regression ◽

Personal Data ◽

Simple Linear Regression ◽

Productive Labor ◽

Working Groups ◽

Microsoft Office ◽

The Relationship ◽

Academic Group

Email is currently the most widely used network service for sending and receiving messages and files. The objective of this study is to analyze institutional emails by applying a strategy that ensures the existence of bilateral communication between employees. The research is applied in nature and makes it possible to predict assertive working groups with prosperous and productive labor relations. The study combines a Big Data technology tool called Immersion with the analysis of a Simple Linear Regression (PLS) model using Microsoft Office Excel. The adapted methodology consists of three phases: first, "Data Collection", in which a large volume of (personal) data is collected from an institutional email account for the case study; second, "Analysis", in which a simple linear regression model is constructed to analyze the relationship between the collected data; and finally, "Interpretation", in which the results obtained are explained. The approach has important applications, such as forming academic groups, thematic networks, disciplinary committees, or collaborative project teams.


Educational resource information sharing algorithm based on big data association mining and quasi-linear regression analysis

International Journal of Continuing Engineering Education and Life-Long Learning ◽

10.1504/ijceell.2019.102771 ◽

2019 ◽

Vol 29 (4) ◽

pp. 336

Author(s):

Yanjun Gao

Keyword(s):

Big Data ◽

Regression Analysis ◽

Linear Regression ◽

Information Sharing ◽

Linear Regression Analysis ◽

Data Association ◽

Association Mining ◽

Educational Resource ◽

Resource Information


Testing for Signal-to-Noise Ratio in Linear Regression: A Test for Big Data Era

SSRN Electronic Journal ◽

10.2139/ssrn.3884683 ◽

2021 ◽

Author(s):

Jae H. Kim

Keyword(s):

Big Data ◽

Linear Regression ◽

Signal To Noise Ratio ◽

Signal To Noise ◽

Noise Ratio


Short term load forecasting using multiple linear regression for big data

2017 IEEE Symposium Series on Computational Intelligence (SSCI) ◽

10.1109/ssci.2017.8285261 ◽

2017 ◽

Cited By ~ 7

Author(s):

Ahmed Yousuf Saber ◽

A K M Rezaul Alam

Keyword(s):

Big Data ◽

Linear Regression ◽

Multiple Linear Regression ◽

Load Forecasting ◽

Short Term ◽

Short Term Load Forecasting
