Cricket Match Analytics Using the Big Data Approach

Electronics ◽  
2021 ◽  
Vol 10 (19) ◽  
pp. 2350
Author(s):  
Mazhar Javed Awan ◽  
Syed Arbaz Haider Gilani ◽  
Hamza Ramzan ◽  
Haitham Nobanee ◽  
Awais Yasin ◽  
...  

Cricket is one of the most popular, widely played, and exciting sports today, and it stands to benefit from machine learning and artificial intelligence (AI) for more accurate analysis. As the number of matches grows, the data on matches and individual players is increasing rapidly. So is the need for big data analytics and the opportunities to use this data effectively, for example in selecting players for a team, predicting the winner of a match, and making other predictions with machine learning models or big data techniques. We applied a machine learning linear regression model to predict team scores, both without big data tooling and with the big data framework Spark ML. After applying linear regression in Spark ML, the experimental results, measured by accuracy, root mean square error (RMSE), mean square error (MSE), and mean absolute error (MAE), were 95%, 30.2, 1350.34, and 28.2, respectively. Furthermore, our approach can be applied to other sports.
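The three error measures named in this abstract can be sketched in a few lines of plain Python. This is a minimal illustration and not the authors' Spark ML pipeline: the function name `regression_errors` and the sample scores are hypothetical and are not data from the paper.

```python
import math


def regression_errors(actual, predicted):
    """Return (MAE, MSE, RMSE) for two equal-length sequences of numbers."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(r) for r in residuals) / len(residuals)
    mse = sum(r * r for r in residuals) / len(residuals)
    # RMSE is the square root of MSE computed on the same residuals.
    return mae, mse, math.sqrt(mse)


# Hypothetical innings totals, used only to exercise the function.
actual_scores = [250, 300, 180, 220]
predicted_scores = [240, 310, 200, 220]
mae, mse, rmse = regression_errors(actual_scores, predicted_scores)
print(mae, mse, rmse)
```

MAE and RMSE are in the same unit as the target (runs, in this setting), while MSE is in squared units.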

Materials ◽  
2021 ◽  
Vol 14 (9) ◽  
pp. 2297
Author(s):  
Ayaz Ahmad ◽  
Furqan Farooq ◽  
Krzysztof Adam Ostrowski ◽  
Klaudia Śliwa-Wieczorek ◽  
Slawomir Czarnecki

Structures located on the coast are subjected to the long-term influence of chloride ions, which cause the corrosion of steel reinforcements in concrete elements. This corrosion severely affects the performance of the elements and may shorten the lifespan of an entire structure. Even though experimental activities in laboratories might be a solution, they may also be problematic due to time and costs. Thus, the application of individual machine learning (ML) techniques has been investigated to predict surface chloride concentrations (Cc) in marine structures. For this purpose, the values of Cc in tidal, splash, and submerged zones were collected from an extensive literature survey and incorporated into the article. Gene expression programming (GEP), the decision tree (DT), and an artificial neural network (ANN) were used to predict the surface chloride concentrations, and the most accurate algorithm was then selected. The GEP model was the most accurate when compared to ANN and DT, which was confirmed by the high accuracy level of the K-fold cross-validation and linear correlation coefficient (R²), mean absolute error (MAE), mean square error (MSE), and root mean square error (RMSE) parameters. As is shown in the article, the proposed method is an effective and accurate way to predict the surface chloride concentration without the inconveniences of laboratory tests.
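As a rough sketch of the validation idea mentioned here, the Python below splits sample indices into K folds and computes R². Two assumptions are made: the folds are contiguous and unshuffled, and R² is taken to be the coefficient of determination, which may differ from the exact statistic the authors report. The helper names `k_fold_indices` and `r_squared` are hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot


folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print(len(train_idx), test_idx)
```

Each sample appears in exactly one test fold, so every observation is used for both training and validation across the K rounds.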


2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Tahani Daghistani ◽  
Huda AlGhamdi ◽  
Riyad Alshammari ◽  
Raed H. AlHazme

Abstract: Outpatients who fail to attend their appointments have a negative impact on healthcare outcomes. Healthcare organizations therefore face new opportunities, one of which is to improve the quality of healthcare. The main challenge is predictive analysis using techniques capable of handling the huge volume of data generated. We propose a big data framework for identifying outpatient no-shows via feature engineering and machine learning (MLlib) on the Spark platform. This study evaluates the performance of five machine learning techniques using data on 2,011,813 outpatient visits. Across several experiments and different validation methods, Gradient Boosting (GB) performed best, increasing accuracy and ROC to 79% and 81%, respectively. In addition, we show that exploring and evaluating the performance of machine learning models with various evaluation methods is critical, as prediction accuracy can differ significantly. The aim of this paper is to explore the factors that affect the no-show rate and that can be used to formulate predictions with big data machine learning techniques.
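The point that different evaluation measures can disagree can be illustrated with a small Python sketch. It assumes that the "ROC" figure refers to the area under the ROC curve, computed here by pairwise comparison of scores. The function names, the 0.5 threshold, and the toy labels and scores are hypothetical and are not taken from the study.

```python
def accuracy(labels, predictions):
    """Fraction of predictions that match the true labels."""
    return sum(l == p for l, p in zip(labels, predictions)) / len(labels)


def roc_auc(labels, scores):
    """Area under the ROC curve: the probability that a randomly chosen
    positive receives a higher score than a randomly chosen negative
    (ties count as one half)."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data: 1 = no-show, 0 = attended; scores are predicted no-show probabilities.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.2, 0.8]
predictions = [1 if s >= 0.5 else 0 for s in scores]
print(accuracy(labels, predictions), roc_auc(labels, scores))
```

Accuracy depends on the chosen threshold, whereas the area under the ROC curve depends only on how the scores rank the two classes, so the two numbers need not move together.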


2019 ◽  
Vol 10 (1) ◽  
pp. 129
Author(s):  
Jonghak Lee ◽  
Taekwan Yoon ◽  
Sangil Kwon ◽  
Jongtae Lee

There have been numerous studies on traffic accidents and their severity, particularly in relation to weather conditions and road geometry. In these studies, traditional statistical methods have been employed, such as linear regression, logistic regression, and negative binomial regression modeling, which are the most common linear and non-linear regression analysis methods. In this research, machine learning architecture was applied to this problem using the random forest, artificial neural network, and decision tree techniques to ascertain the strengths and weaknesses of these methods. Three data sets were used: road geometry data, precipitation data, and traffic accident data over nine years corresponding to the Naebu Expressway, which is located in Seoul, Korea. For the model evaluation, three measures were employed: the out-of-bag estimate of error rate (OOB), mean square error (MSE), and root mean square error (RMSE). The low mean OOB, MSE, and RMSE observed in the results obtained using the proposed random forest model demonstrate its accuracy.
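The out-of-bag idea can be sketched as follows. This is not the authors' random forest: to keep the example self-contained, each "tree" is replaced by a stand-in that predicts the mean of its bootstrap sample, and the error is reported as a squared error rather than a classification error rate. The function name `oob_mse` and its parameters are hypothetical.

```python
import random


def oob_mse(targets, n_rounds=50, seed=0):
    """Out-of-bag mean squared error for a bagged mean predictor."""
    n = len(targets)
    if n == 0:
        raise ValueError("targets must be non-empty")
    rng = random.Random(seed)
    oob_predictions = [[] for _ in range(n)]
    for _ in range(n_rounds):
        # Bootstrap: draw n indices with replacement.
        in_bag = [rng.randrange(n) for _ in range(n)]
        in_bag_set = set(in_bag)
        # Stand-in model: the mean of the in-bag targets.
        model_output = sum(targets[i] for i in in_bag) / n
        # Only samples the model never saw receive its prediction.
        for i in range(n):
            if i not in in_bag_set:
                oob_predictions[i].append(model_output)
    squared_errors = [
        (targets[i] - sum(preds) / len(preds)) ** 2
        for i, preds in enumerate(oob_predictions)
        if preds
    ]
    if not squared_errors:
        raise ValueError("no sample was ever out of bag; use more rounds")
    return sum(squared_errors) / len(squared_errors)


print(oob_mse([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]))
```

Because each sample is scored only by models that did not train on it, the out-of-bag estimate behaves like a built-in validation error and needs no separate hold-out set.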


2018 ◽  
Vol 14 (2) ◽  
pp. 225
Author(s):  
Indriyanti Indriyanti ◽  
Agus Subekti

Rising building energy consumption has prompted researchers to build prediction models using machine learning methods, but it is still not known which model is the most accurate. Predictive models of commercial building energy consumption are important for energy conservation. With the right model, buildings can be designed to use energy more efficiently. In this paper, we propose a predictive model based on machine learning methods in order to find the best model for predicting total energy consumption. The algorithms used are SMOreg and LibSVM, both from the Support Vector Machine class, and the models are evaluated by their Mean Absolute Error and Root Mean Square Error. Using a publicly available dataset, we develop models based on support vector machines for regression. Testing both algorithms shows that SMOreg is more accurate, with an MAE of 4.70 and an RMSE of 10.15, whereas the LibSVM model has an MAE of 9.37 and an RMSE of 14.45. We propose the method based on the SMOreg algorithm because it performs better.


SEMINASTIKA ◽  
2021 ◽  
Vol 3 (1) ◽  
pp. 39-46
Author(s):  
Khalis Sofi ◽  
Aswan Supriyadi Sunge ◽  
Sasmitoh Rahmad Riady ◽  
Antika Zahrotul Kamalia

This study aims to predict stock prices by comparing the Linear Regression, Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU) algorithms on a public dataset and then determining which of the three performs best. The dataset comes from the Indonesia Stock Exchange (IDX) and consists of the KEJU stock price as a time series from 15 November 2019 to 8 June 2021. The comparison metrics are RMSE (Root Mean Square Error), MSE (Mean Square Error), and MAE (Mean Absolute Error). After training and testing, the comparison shows that the Gated Recurrent Unit (GRU) algorithm performs better than Linear Regression and Long Short-Term Memory (LSTM) at predicting stock prices, as evidenced by GRU achieving the lowest RMSE, MSE, and MAE in the trials: an RMSE of 0.034, an MSE of 0.001, and an MAE of 0.024.

