Cricket Match Analytics Using the Big Data Approach

Mazhar Javed Awan; Syed Arbaz Haider Gilani; Hamza Ramzan; Haitham Nobanee; Awais Yasin; Azlan Mohd Zain; Rabia Javed

doi:10.3390/electronics10192350


Electronics ◽

10.3390/electronics10192350 ◽

2021 ◽

Vol 10 (19) ◽

pp. 2350

Author(s):

Mazhar Javed Awan ◽

Syed Arbaz Haider Gilani ◽

Hamza Ramzan ◽

Haitham Nobanee ◽

Awais Yasin ◽

...

Keyword(s):

Machine Learning ◽

Big Data ◽

Linear Regression ◽

Mean Square Error ◽

Selection Process ◽

Big Data Analytics ◽

Absolute Error ◽

Mean Square ◽

Data Framework ◽

The Individual

Cricket is one of the most popular, widely played, and exciting sports today, and it stands to benefit from machine learning and artificial intelligence (AI) for more accurate analysis. As the number of matches grows, the data on cricket matches and individual players are increasing rapidly. The need for big data analytics is growing accordingly, as are the opportunities to use these data effectively, for example in selecting players for a team, predicting the winner of a match, and making other forecasts with machine learning models or big data techniques. We applied a linear regression model to predict team scores, both without a big data framework and with the big data framework Spark ML. The experimental results were measured through accuracy, root mean square error (RMSE), mean square error (MSE), and mean absolute error (MAE), which were 95%, 30.2, 1350.34, and 28.2, respectively, after applying linear regression in Spark ML. Furthermore, our approach can be applied to other sports.
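As a rough illustration of the three error metrics named in this abstract, the sketch below computes MSE, RMSE, and MAE in plain Python. The score lists are invented toy values, not data from the paper, and the function names are hypothetical; the paper itself reports using Spark ML rather than hand-written metrics.

```python
import math


def mse(actual, predicted):
    """Mean square error: the average of the squared residuals."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error: the square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


def mae(actual, predicted):
    """Mean absolute error: the average of the absolute residuals."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical innings totals and model predictions (toy values only).
actual_scores = [250, 180, 310, 275]
predicted_scores = [240, 200, 300, 290]

print("MSE :", mse(actual_scores, predicted_scores))
print("RMSE:", rmse(actual_scores, predicted_scores))
print("MAE :", mae(actual_scores, predicted_scores))
```

By construction RMSE is the square root of MSE, so the two always move together, while MAE weights large and small residuals equally.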


Application of Novel Machine Learning Techniques for Predicting the Surface Chloride Concentration in Concrete Containing Waste Material

Materials ◽

10.3390/ma14092297 ◽

2021 ◽

Vol 14 (9) ◽

pp. 2297

Author(s):

Ayaz Ahmad ◽

Furqan Farooq ◽

Krzysztof Adam Ostrowski ◽

Klaudia Śliwa-Wieczorek ◽

Slawomir Czarnecki

Keyword(s):

Machine Learning ◽

Mean Square Error ◽

Gene Expression Programming ◽

Chloride Concentration ◽

Absolute Error ◽

Chloride Ions ◽

Machine Learning Techniques ◽

Extensive Literature ◽

Mean Square ◽

Chloride Concentrations

Structures located on the coast are subjected to the long-term influence of chloride ions, which cause the corrosion of steel reinforcements in concrete elements. This corrosion severely affects the performance of the elements and may shorten the lifespan of an entire structure. Even though experimental activities in laboratories might be a solution, they may also be problematic due to time and costs. Thus, the application of individual machine learning (ML) techniques has been investigated to predict surface chloride concentrations (Cc) in marine structures. For this purpose, the values of Cc in tidal, splash, and submerged zones were collected from an extensive literature survey and incorporated into the article. Gene expression programming (GEP), the decision tree (DT), and an artificial neural network (ANN) were used to predict the surface chloride concentrations, and the most accurate algorithm was then selected. The GEP model was the most accurate when compared to ANN and DT, which was confirmed by the high accuracy level of the K-fold cross-validation and linear correlation coefficient (R²), mean absolute error (MAE), mean square error (MSE), and root mean square error (RMSE) parameters. As is shown in the article, the proposed method is an effective and accurate way to predict the surface chloride concentration without the inconveniences of laboratory tests.
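The K-fold cross-validation and R² check mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: a one-variable least-squares line stands in for the GEP, DT, and ANN models (which are not reproduced here), the folds are contiguous and unshuffled, R² is computed as one minus the ratio of residual to total sum of squares, the data are synthetic, and all function names are hypothetical.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (stand-in model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(actual, predicted):
    """R^2 = 1 - (residual sum of squares / total sum of squares)."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def cross_validated_r2(xs, ys, k):
    """Pool the out-of-fold predictions from k folds and score them with R^2."""
    predictions = [0.0] * len(xs)
    for train_idx, test_idx in k_fold_splits(len(xs), k):
        slope, intercept = fit_line([xs[i] for i in train_idx],
                                    [ys[i] for i in train_idx])
        for i in test_idx:
            predictions[i] = slope * xs[i] + intercept
    return r_squared(ys, predictions)


# Synthetic, exactly linear data (not chloride measurements).
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
print(cross_validated_r2(xs, ys, k=5))
```

Each sample is predicted only by a model that never saw it during fitting, which is the property that makes the cross-validated score a fairer estimate than a score on the training data.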


Predictors of outpatients’ no-show: big data analytics using Apache Spark

Journal Of Big Data ◽

10.1186/s40537-020-00384-9 ◽

2020 ◽

Vol 7 (1) ◽

Author(s):

Tahani Daghistani ◽

Huda AlGhamdi ◽

Riyad Alshammari ◽

Raed H. AlHazme

Keyword(s):

Machine Learning ◽

Big Data ◽

Negative Impact ◽

Big Data Analytics ◽

Quality Of Healthcare ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Healthcare Organizations ◽

Data Framework ◽

Learning Techniques

Outpatients who fail to attend their appointments have a negative impact on healthcare outcomes. Healthcare organizations therefore face new opportunities, one of which is to improve the quality of healthcare. The main challenge is predictive analysis using techniques capable of handling the huge volume of data generated. We propose a big data framework for identifying outpatients’ no-shows via feature engineering and machine learning (MLlib) on the Spark platform. This study evaluates the performance of five machine learning techniques using data from 2,011,813 outpatient visits. Across several experiments and different validation methods, Gradient Boosting (GB) performed best, raising accuracy and ROC to 79% and 81%, respectively. In addition, we showed that exploring and evaluating the performance of machine learning models using various evaluation methods is critical, as prediction accuracy can differ significantly. The aim of this paper is to explore factors that affect the no-show rate and can be used to formulate predictions using big data machine learning techniques.
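To make concrete why accuracy and ROC can tell different stories about the same classifier, here is a small sketch that computes both on invented labels and scores. It assumes "ROC" in the abstract refers to the area under the ROC curve, computed here through the pairwise-ranking (Mann-Whitney) formulation; the data, the 0.5 threshold, and the function names are hypothetical and unrelated to the study's dataset.

```python
def accuracy(labels, scores, threshold=0.5):
    """Share of samples whose thresholded score matches the true label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen positive receives a
    higher score than a randomly chosen negative (ties count as half).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy data: 1 = no-show, 0 = attended; scores are made-up model outputs.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.6, 0.8]

print("accuracy:", accuracy(labels, scores))
print("ROC AUC :", roc_auc(labels, scores))
```

Accuracy depends on the chosen threshold, whereas the AUC depends only on how the scores rank positives against negatives, so the two can diverge on the same predictions.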


Model Evaluation for Forecasting Traffic Accident Severity in Rainy Seasons Using Machine Learning Algorithms: Seoul City Study

Applied Sciences ◽

10.3390/app10010129 ◽

2019 ◽

Vol 10 (1) ◽

pp. 129 ◽

Cited By ~ 3

Author(s):

Jonghak Lee ◽

Taekwan Yoon ◽

Sangil Kwon ◽

Jongtae Lee

Keyword(s):

Machine Learning ◽

Random Forest ◽

Linear Regression ◽

Mean Square Error ◽

Model Evaluation ◽

Traffic Accident ◽

Negative Binomial ◽

Machine Learning Algorithms ◽

Mean Square ◽

Road Geometry

There have been numerous studies on traffic accidents and their severity, particularly in relation to weather conditions and road geometry. In these studies, traditional statistical methods have been employed, such as linear regression, logistic regression, and negative binomial regression modeling, which are the most common linear and non-linear regression analysis methods. In this research, machine learning architecture was applied to this problem using the random forest, artificial neural network, and decision tree techniques to ascertain the strengths and weaknesses of these methods. Three data sets were used: road geometry data, precipitation data, and traffic accident data over nine years corresponding to the Naebu Expressway, which is located in Seoul, Korea. For the model evaluation, three measures were employed: the out-of-bag estimate of error rate (OOB), mean square error (MSE), and root mean square error (RMSE). The low mean OOB, MSE, and RMSE observed in the results obtained using the proposed random forest model demonstrate its accuracy.


Predictors of Outpatients’ No-Show: Big Data Analytics using Apache Spark

10.21203/rs.3.rs-33216/v3 ◽

2020 ◽

Author(s):

Tahani Daghistani ◽

Huda AlGhamdi ◽

Riyad Alshammari ◽

Raed H. AlHazme

Keyword(s):

Machine Learning ◽

Big Data ◽

Negative Impact ◽

Big Data Analytics ◽

Quality Of Healthcare ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Healthcare Organizations ◽

Data Framework ◽

Learning Techniques

Outpatients who fail to attend their appointments have a negative impact on healthcare outcomes. Healthcare organizations therefore face new opportunities, one of which is to improve the quality of healthcare. The main challenge is predictive analysis using techniques capable of handling the huge volume of data generated. We propose a big data framework for identifying outpatients’ no-shows via feature engineering and machine learning (MLlib) on the Spark platform. This study evaluates the performance of five machine learning techniques using data from 2,011,813 outpatient visits. Across several experiments and different validation methods, Gradient Boosting (GB) performed best, raising accuracy and ROC to 79% and 81%, respectively. In addition, we showed that exploring and evaluating the performance of machine learning models using various evaluation methods is critical, as prediction accuracy can differ significantly. The aim of this paper is to explore factors that affect the no-show rate and can be used to formulate predictions using big data machine learning techniques.


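Predictive Modeling of Commercial Building Energy Consumption Using the Support Vector Machine Algorithm (original title: PEMODELAN PREDIKTIF KONSUMSI ENERGI BANGUNAN GEDUNG KOMERSIAL DENGAN ALGORITMA SUPPORT VECTOR MACHINE)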

Jurnal Pilar Nusa Mandiri ◽

10.33480/pilar.v14i2.882 ◽

2018 ◽

Vol 14 (2) ◽

pp. 225

Author(s):

Indriyanti Indriyanti ◽

Agus Subekti

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Root Mean Square Error ◽

Mean Square Error ◽

Root Mean Square ◽

Mean Absolute Error ◽

Absolute Error ◽

Support Vector ◽

Mean Square

Rising building energy consumption has prompted researchers to build prediction models using machine learning methods, but the most accurate model is still unknown. Predictive models of commercial building energy consumption are important for energy conservation. With the right model, we can design buildings that use energy more efficiently. In this paper, we propose a predictive model based on machine learning methods to obtain the best model for predicting total energy consumption. The algorithms used are SMOreg and LibSVM from the Support Vector Machine class, and the models are evaluated by Mean Absolute Error and Root Mean Square Error. Using a publicly available dataset, we developed models based on support vector machines for regression. Testing of the two algorithms showed that SMOreg has better accuracy, with MAE and RMSE of 4.70 and 10.15, whereas the LibSVM model has MAE and RMSE of 9.37 and 14.45. We propose a method based on the SMOreg algorithm because of its better performance.


Predictors of Outpatients’ No-Show: Big Data Analytics using Apache Spark

10.21203/rs.3.rs-33216/v1 ◽

2020 ◽

Author(s):

Tahani Daghistani ◽

Huda AlGhamdi ◽

Riyad Alshammari ◽

Raed H. AlHazme

Keyword(s):

Machine Learning ◽

Big Data ◽

Negative Impact ◽

Big Data Analytics ◽

Quality Of Healthcare ◽

Machine Learning Techniques ◽

Healthcare Organizations ◽

Data Framework ◽

Huge Data ◽

Learning Techniques


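Comparison of the Linear Regression, LSTM, and GRU Algorithms in Predicting Stock Prices with a Time Series Model (original title: PERBANDINGAN ALGORITMA LINEAR REGRESSION, LSTM, DAN GRU DALAM MEMPREDIKSI HARGA SAHAM DENGAN MODEL TIME SERIES)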

SEMINASTIKA ◽

10.47002/seminastika.v3i1.275 ◽

2021 ◽

Vol 3 (1) ◽

pp. 39-46

Author(s):

Khalis Sofi ◽

Aswan Supriyadi Sunge ◽

Sasmitoh Rahmad Riady ◽

Antika Zahrotul Kamalia

Keyword(s):

Time Series ◽

Linear Regression ◽

Mean Square Error ◽

Short Term Memory ◽

Absolute Error ◽

Mean Square ◽

Short Term ◽

Term Memory ◽

Long Short Term Memory ◽

Gated Recurrent Unit

This study aims to predict stock prices by comparing the Linear Regression, Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU) algorithms on a public dataset and then determining which of the three performs best. The dataset tested comes from the Indonesia Stock Exchange (IDX), namely the time series of KEJU stock prices from 15 November 2019 to 8 June 2021. The metrics used for the comparison are RMSE (Root Mean Square Error), MSE (Mean Square Error), and MAE (Mean Absolute Error). After training and testing, the comparison shows that the Gated Recurrent Unit (GRU) algorithm performs better than Linear Regression and Long Short-Term Memory (LSTM) in predicting stock prices, as evidenced by GRU having the lowest RMSE, MSE, and MAE values: RMSE 0.034, MSE 0.001, and MAE 0.024.


Predictors of Outpatients’ No-Show: Big Data Analytics using Apache Spark

10.21203/rs.3.rs-33216/v2 ◽

2020 ◽

Author(s):

Tahani Daghistani ◽

Huda AlGhamdi ◽

Riyad Alshammari ◽

Raed H. AlHazme

Keyword(s):

Machine Learning ◽

Big Data ◽

Negative Impact ◽

Big Data Analytics ◽

Quality Of Healthcare ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Healthcare Organizations ◽

Data Framework ◽

Learning Techniques

Outpatients who fail to attend their appointments have a negative impact on healthcare outcomes. Healthcare organizations therefore face new opportunities, one of which is to improve the quality of healthcare. The main challenge is predictive analysis using techniques capable of handling the huge volume of data generated. We propose a big data framework for identifying outpatients’ no-shows via feature engineering and machine learning (MLlib) on the Spark platform. This study evaluates the performance of five machine learning techniques using the outpatient visit data (2,011,813, 19). Across several experiments and different validation methods, Gradient Boosting (GB) performed best, raising accuracy and ROC to 79% and 81%, respectively. In addition, we showed that exploring and evaluating the performance of machine learning models using various evaluation methods is critical, as prediction accuracy can differ significantly. The aim of this paper is to explore factors that affect the no-show rate and can be used to formulate predictions using big data machine learning techniques.


Predictors of Outpatients’ No-Show: Big Data Analytics using Apache Spark

10.21203/rs.3.rs-33216/v4 ◽

2020 ◽

Author(s):

Tahani Daghistani ◽

Huda AlGhamdi ◽

Riyad Alshammari ◽

Raed H. AlHazme

Keyword(s):

Machine Learning ◽

Big Data ◽

Negative Impact ◽

Big Data Analytics ◽

Quality Of Healthcare ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Healthcare Organizations ◽

Data Framework ◽

Learning Techniques

Outpatients who fail to attend their appointments have a negative impact on healthcare outcomes. Healthcare organizations therefore face new opportunities, one of which is to improve the quality of healthcare. The main challenge is predictive analysis using techniques capable of handling the huge volume of data generated. We propose a big data framework for identifying outpatients’ no-shows via feature engineering and machine learning (MLlib) on the Spark platform. This study evaluates the performance of five machine learning techniques using data from 2,011,813 outpatient visits. Across several experiments and different validation methods, Gradient Boosting (GB) performed best, raising accuracy and ROC to 79% and 81%, respectively. In addition, we showed that exploring and evaluating the performance of machine learning models using various evaluation methods is critical, as prediction accuracy can differ significantly. The aim of this paper is to explore factors that affect the no-show rate and can be used to formulate predictions using big data machine learning techniques.


Data Driven Smart Proxy for CFD Application of Big Data Analytics & Machine Learning in Computational Fluid Dynamics, Report Two: Model Building at the Cell Level

10.2172/1431303 ◽

2018 ◽

Cited By ~ 1

Author(s):

A. Ansari ◽

S. Mohaghegh ◽

M. Shahnam ◽

J. F. Dietiker ◽

T. Li

Keyword(s):

Machine Learning ◽

Fluid Dynamics ◽

Computational Fluid Dynamics ◽

Big Data ◽

Data Analytics ◽

Model Building ◽

Big Data Analytics ◽

Data Driven ◽

Cell Level
