Uncovering Big Bias with Big Data: An Introduction to Linear Regression

2018 ◽  
pp. 173-188
Author(s):  
David Colarusso
Keyword(s):  
Big Data

Web Services ◽  
2019 ◽  
pp. 314-331 ◽  
Author(s):  
Sema A. Kalaian ◽  
Rafa M. Kasim ◽  
Nabeel R. Kasim

Data analytics and modeling are powerful tools for knowledge discovery: they examine and capture the complex, hidden relationships and patterns among quantitative variables in massive structured Big Data in order to predict future enterprise performance. The main purpose of this chapter is to present a conceptual and practical overview of some basic and advanced analytical tools for analyzing structured Big Data. The chapter covers descriptive and predictive analytical methods. The descriptive tools covered include the mean, median, mode, variance, standard deviation, and data visualization methods (e.g., histograms, line charts). The predictive tools covered include correlation and simple and multiple linear regression.
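As a rough illustration of the descriptive and predictive tools this abstract lists, the following sketch computes summary statistics, a Pearson correlation, and a simple linear regression fit using only the Python standard library. The function names (`describe`, `pearson_r`, `simple_linear_regression`) and the sample numbers are assumptions made for this example; they do not come from the chapter.

```python
import math
import statistics


def describe(values):
    """Return basic descriptive statistics (sample variance and stdev)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": statistics.variance(values),
        "stdev": statistics.stdev(values),
    }


def _centered_sums(x, y):
    """Sums of squares and cross-products around the means."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return mean_x, mean_y, sxx, syy, sxy


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    _, _, sxx, syy, sxy = _centered_sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def simple_linear_regression(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mean_x, mean_y, sxx, _, sxy = _centered_sums(x, y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Hypothetical sample data, chosen only to exercise the functions.
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 6, 8, 10]
    print(describe([2, 4, 4, 6, 9]))
    print(pearson_r(x, y))
    print(simple_linear_regression(x, y))
```

Multiple linear regression follows the same least-squares idea with more than one predictor; in practice it is usually fitted with a linear algebra library rather than by hand.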


Author(s):  
Saranya N. ◽  
Saravana Selvam

After an era of struggling with data collection, the problem today is how to process these vast amounts of information. Scientists and researchers consider Big Data to be probably the most essential topic in computing science today. The term Big Data describes a huge volume of data that can exist in any structure, which makes it difficult for standard processing approaches to mine useful information from such large data sets. Classification in Big Data is a procedure for summarizing data sets based on various examples. Distinct classification frameworks help to classify data collections. The methods discussed in the chapter include Multi-Layer Perceptron, Linear Regression, C4.5, CART, J48, SVM, ID3, Random Forest, and KNN. The aim of this chapter is to provide a comprehensive evaluation of commonly used classification methods.


2018 ◽  
Vol 114 (525) ◽  
pp. 393-405 ◽  
Author(s):  
HaiYing Wang ◽  
Min Yang ◽  
John Stufken

2017 ◽  
Vol 2017 ◽  
pp. 1-13
Author(s):  
Jelena Fiosina ◽  
Maksims Fiosins

Forecasting in big datasets is a common but complicated task, which cannot be executed using well-known parametric linear regression. However, nonparametric and semiparametric methods, which enable forecasting by building nonlinear data models, are computationally intensive and do not scale well enough to produce good results on big datasets in a reasonable time. We present distributed parallel versions of some nonparametric and semiparametric regression models. We use the MapReduce paradigm and describe the algorithms in terms of SPARK data structures to parallelize the calculations. The forecasting accuracy of the proposed algorithms is compared with that of the linear regression model, which is the only forecasting model that currently has a parallel distributed realization within the SPARK framework for big data problems. The advantages of parallelizing the algorithms are also discussed. We validate our models through various numerical experiments: evaluating the goodness of fit, analyzing how increasing dataset size influences time consumption, and analyzing time consumption while varying the degree of parallelism (number of workers) in the distributed realization.
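To make the MapReduce idea concrete, here is a minimal sketch of how the linear regression baseline mentioned in this abstract can be fitted over partitioned data. It is not the authors' algorithm and does not use the SPARK API: it is plain Python, the partitions are ordinary lists, and the names `map_partition`, `combine`, `fit_from_stats`, and `distributed_fit` are hypothetical. Each partition is mapped to a small tuple of sums, the tuples are reduced by addition, and the fit is computed once from the totals.

```python
from functools import reduce


def map_partition(rows):
    """Map step: summarize one partition of (x, y) pairs as sufficient statistics."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in rows:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)


def combine(a, b):
    """Reduce step: add two statistics tuples element by element."""
    return tuple(u + v for u, v in zip(a, b))


def fit_from_stats(stats):
    """Closed-form least-squares slope and intercept from the combined sums."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


def distributed_fit(partitions):
    """Run the map step on every partition, reduce, then solve."""
    return fit_from_stats(reduce(combine, map(map_partition, partitions)))


if __name__ == "__main__":
    # Hypothetical data split across three "workers"; the points lie on y = 2x + 1.
    partitions = [[(0, 1), (1, 3)], [(2, 5)], [(3, 7), (4, 9)]]
    print(distributed_fit(partitions))
```

Because the reduce step is a simple addition, the partitions can be processed in any order and on any number of workers. Nonparametric models such as kernel regression do not reduce to a fixed-size summary in the same way, which is one reason they are harder to parallelize.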


2021 ◽  
Vol 15 (3) ◽  
Author(s):  
Lin Wang ◽  
Jake Elmstedt ◽  
Weng Kee Wong ◽  
Hongquan Xu
Keyword(s):  
Big Data

Author(s):  
Luz María Hernández-Cruz ◽  
Diana Concepción Mex-Alvarez ◽  
Guadalupe Manuel Estrada-Segovia ◽  
Margarita Castillo-Tellez

Email is currently the most widely used network service for sending and receiving messages and files. The objective of this study is to analyze institutional emails by applying a strategy that ensures bilateral communication between employees. The research is applied in nature and makes it possible to predict effective working groups with prosperous and productive working relationships. The study combines a Big Data tool called Immersion with the analysis of a Simple Linear Regression (PLS) model built in Microsoft Office Excel. The adapted methodology consists of three phases: first, "Data Collection", in which a large volume of (personal) data is collected from an institutional email account for the case study; second, "Analysis", in which a simple linear regression model is constructed to analyze the relationships in the collected data; and finally, "Interpretation", in which the results are explained. The approach has important applications, such as forming academic groups, thematic networks, disciplinary committees, or collaborative project teams.

