Predicting Win-Loss outcomes in MLB regular season games – A comparative study using data mining methods

International Journal of Computer Science in Sport ◽

10.1515/ijcss-2016-0007 ◽

2016 ◽

Vol 15 (2) ◽

pp. 91-112 ◽

Cited By ~ 11

Author(s):

C. Soto Valero

Keyword(s):

Data Mining ◽

Cross Validation ◽

Data Contamination ◽

Past Data ◽

Mining Methods ◽

Using Data ◽

New Statistics ◽

Fold Cross Validation ◽

Better Than ◽

Model Approach

Abstract: Baseball is a statistics-rich sport, and predicting the winner of a particular Major League Baseball (MLB) game is an interesting and challenging task. To date, there is no definitive formula for determining which factors will lead a team to victory, but many trends can emerge from the analysis of many years of historical records. Recent studies have concentrated on using and generating new statistics, called sabermetrics, to rank teams and players according to their perceived strengths, and on applying these rankings to forecast specific games. In this paper, we employ sabermetric statistics to assess the predictive capabilities of four data mining methods (classification and regression based) for predicting outcomes (win or loss) in MLB regular season games. Our model approach uses only past data when making a prediction, drawing on ten years of publicly available data. We create a dataset of cumulative sabermetric statistics for each MLB team over this period, for which data contamination is not possible. The inherent difficulties of this specific sports prediction task are confirmed using two geometry- or topology-based measures of data complexity. Results reveal that the classification scheme forecasts game outcomes better than the regression scheme and that, of the four data mining methods used, SVMs produce the best predictive results, with a mean prediction accuracy of nearly 60% for each team. The evaluation of our model is performed using stratified 10-fold cross-validation.
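
The stratified 10-fold cross-validation mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's code: the function names, the round-robin assignment, and the made-up 81-81 win/loss record are all assumptions chosen for illustration. The idea shown is that each fold receives roughly the same share of wins and losses as the full dataset.

```python
import random
from collections import defaultdict


def stratified_k_folds(labels, k=10, seed=0):
    """Assign every sample index to one of k folds so that each fold
    keeps roughly the same class proportions as the full label list."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin, continuing where the previous
        # class stopped so that fold sizes stay balanced.
        for i, idx in enumerate(indices):
            folds[(offset + i) % k].append(idx)
        offset += len(indices)
    return folds


def train_test_splits(folds):
    """Yield (train_indices, test_indices), using each fold once as the test set."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical labels: a made-up 81-81 season ("W" = win, "L" = loss).
labels = ["W"] * 81 + ["L"] * 81
folds = stratified_k_folds(labels, k=10)
splits = list(train_test_splits(folds))
```

A classifier would then be trained on each `train` list and scored on the matching `test` list, with the ten scores averaged.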

Uji Performa Algoritma Naïve Bayes untuk Prediksi Masa Studi Mahasiswa [Performance Test of the Naïve Bayes Algorithm for Predicting Student Study Duration]

Creative Information Technology Journal ◽

10.24076/citec.2019v6i1.178 ◽

2020 ◽

Vol 6 (1) ◽

pp. 1

Author(s):

Irkham Widhi Saputro ◽

Bety Wulan Sari

Keyword(s):

Data Mining ◽

Cross Validation ◽

Naive Bayes ◽

Confusion Matrix ◽

Naïve Bayes ◽

Study Program ◽

New Students ◽

Using Data ◽

The Many ◽

Fold Cross Validation

AMIKOM Yogyakarta University is a higher-education institution that enrolls thousands of new students, particularly in its Informatics study program. It recorded 1009 new students in 2012 and 859 in 2013. However, only about 50% of these students graduate on time. This study uses that data to build a classification system using data mining with the Naïve Bayes method. The dataset comprises 300 records drawn from alumni of the 2012 and 2013 cohorts, 150 records from each. Of these, 144 students graduated on time and 156 did not. Testing is carried out using 10-fold cross-validation and a confusion matrix. The test results show that the Naïve Bayes model achieves, on average, an accuracy of 68%, a precision of 61.3%, a recall of 65.3%, and an F1-score of 61%. The model's performance can be influenced by the dataset used to build it. Keywords: data mining, classification, Naïve Bayes, K-Fold Cross Validation, Confusion Matrix, graduation time
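
As a rough illustration of how accuracy, precision, recall, and F1-score follow from a confusion matrix, the sketch below computes them for a single test fold. The function name and the counts are hypothetical and are not taken from the study; the fold size of 30 only reflects that 300 records split into 10 folds gives 30 per fold.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute standard metrics from the four cells of a binary confusion matrix.

    tp: on-time graduates predicted on time
    fp: late graduates predicted on time
    fn: on-time graduates predicted late
    tn: late graduates predicted late
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for one test fold of 30 students (not from the paper).
metrics = binary_metrics(tp=10, fp=5, fn=4, tn=11)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a 10-fold evaluation, these values would be computed once per fold and then averaged.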

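Rancang Bangun Sistem Informasi Untuk Menentukan Kapabilitas Konsumen Dalam Mengambil Pinjaman KPR [Design and Development of an Information System to Determine Consumer Capability in Taking Out a KPR Home Loan]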

Jurnal ULTIMA InfoSys ◽

10.31937/si.v7i2.543 ◽

2016 ◽

Vol 7 (2) ◽

pp. 75-80

Author(s):

Adhi Kusnadi ◽

Risyad Ananda Putra

Keyword(s):

Data Mining ◽

Low Income ◽

Cross Validation ◽

Classification Tree ◽

Large Population ◽

Housing Development ◽

Good Precision ◽

Index Terms ◽

The Government ◽

Fold Cross Validation

Indonesia has a relatively large population. Over a five-year period, the government runs an annual program to provide 1 million FLPP housing units, in an effort to provide decent homes for low-income people. FLPP housing development requires good precision and speed on the part of the developer, but this is often hampered by the bank process, because the outcome and speed of data processing at the bank are difficult to predict. Knowing a consumer's ability to obtain subsidized credit has many advantages: among others, developers can plan a better cash flow and can replace consumers who would be rejected before they enter the bank process. For that reason, a system was built to help developers. Many methods can be used to create such an application; one of them is data mining with a classification tree. Under 10-fold cross-validation, the application achieves an accuracy of 92%. Index Terms: Data Mining, Classification Tree, Housing, FLPP, 10-fold-cross Validation, Consumer Capability

Using Data Mining Methods to Detect Medical Fraud

Proceedings of the 2020 International Conference on Management of e-Commerce and e-Government ◽

10.1145/3409891.3409902 ◽

2020 ◽

Author(s):

Long-Sheng Chen ◽

Jia-Chuan Chen

Keyword(s):

Data Mining ◽

Mining Methods ◽

Using Data

Data Mining zur Nacharbeitsdauerprognose / Data Mining-based forecasting of rework duration - Predictive rework control and work process optimization in the automotive assembly

wt Werkstattstechnik online ◽

10.37544/1436-4980-2017-10-95 ◽

2017 ◽

Vol 107 (10) ◽

pp. 773-778

Author(s):

S. Krzoska ◽

M. Eickelmann ◽

J. Schmitt ◽

J. Deuse

Keyword(s):

Data Mining ◽

Process Optimization ◽

Quality Data ◽

Operating Process ◽

Automotive Assembly ◽

Related Quality ◽

Mining Methods ◽

Product And Process ◽

Using Data ◽

Manufacturing Execution

Using rework control and work process optimization in automotive assembly as an example, the article shows how recorded product- and process-related quality data can be analyzed and used efficiently with data mining methods. Data from manufacturing execution systems (MES) were evaluated using regression trees to develop a vehicle-specific rework duration forecast. The basic data mining concept and the pilot results are then presented.

Unsupervised segmentation of microstructural images of steel using data mining methods

Computational Materials Science ◽

10.1016/j.commatsci.2021.110855 ◽

2022 ◽

Vol 201 ◽

pp. 110855

Author(s):

Hoheok Kim ◽

Yuuki Arisato ◽

Junya Inoue

Keyword(s):

Data Mining ◽

Unsupervised Segmentation ◽

Mining Methods ◽

Using Data

Electric Vehicle Load Forecasting using Data Mining Methods

Hybrid and Electric Vehicles Conference 2013 (HEVC 2013) ◽

10.1049/cp.2013.1914 ◽

2013 ◽

Cited By ~ 16

Author(s):

S. Xydas ◽

A.S. Hassan ◽

C.E. Marmaras ◽

N. Jenkins ◽

L.M. Cipcigan

Keyword(s):

Data Mining ◽

Electric Vehicle ◽

Load Forecasting ◽

Vehicle Load ◽

Mining Methods ◽

Using Data

A Survey of Anomaly Detection Using Data Mining Methods for Hypertext Transfer Protocol Web Services

Journal of Computer Science ◽

10.3844/jcssp.2015.89.97 ◽

2015 ◽

Vol 11 (1) ◽

pp. 89-97 ◽

Cited By ~ 3

Author(s):

Mohsen Kakavand ◽

Norwati Mustapha ◽

Aida Mustapha ◽

Mohd Taufik Abdullah ◽

Hamed Riahi

Keyword(s):

Data Mining ◽

Web Services ◽

Anomaly Detection ◽

Hypertext Transfer Protocol ◽

Mining Methods ◽

Using Data ◽

Transfer Protocol

Using Data Mining Methods to Detect Simulated Intrusions on a Modbus Network

2017 IEEE 7th International Symposium on Cloud and Service Computing (SC2) ◽

10.1109/sc2.2017.29 ◽

2017 ◽

Cited By ~ 3

Author(s):

Szu-Chuang Li ◽

Yennun Huang ◽

Bo-Chen Tai ◽

Chi-Ta Lin

Keyword(s):

Data Mining ◽

Mining Methods ◽

Using Data

A Model for Predicting Outfit Sales: Using Data Mining Methods

Advances in Intelligent Systems and Computing - Emerging Technologies in Data Mining and Information Security ◽

10.1007/978-981-13-1498-8_62 ◽

2018 ◽

pp. 711-720

Author(s):

Mohammad Aman Ullah

Keyword(s):

Data Mining ◽

Mining Methods ◽

Using Data

Observation of Success Status of Employees in E-Learning Courses in Organizations with Data Mining

International Journal of E-Adoption ◽

10.4018/ijea.2017010104 ◽

2017 ◽

Vol 9 (1) ◽

pp. 38-49

Author(s):

Fatma Önay Koçoğlu ◽

İlkim Ecem Emre ◽

Çiğdem Selçukcan Erol

Keyword(s):

Data Mining ◽

Evaluation Criteria ◽

Data Set ◽

Completion Status ◽

Completion Date ◽

E Learning ◽

Pharmaceutical Industries ◽

Mining Methods ◽

Using Data ◽

Performance Results

The aim of this study is to analyze success in e-learning with data mining methods and to find potential patterns. To this end, 374,073 records from the 2013-14 period, taken from an institution serving the e-learning field in Turkey, are used. The data set, collected from the information technology, banking, and pharmaceutical industries, includes employees' success and industry, the trainings they took, whether the trainings were completed, first login and last logout dates, training completion date, and duration of experience in training. Using this data set, the success status of participants is examined with data mining methods (C5.0, Random Forest and Gini). Based on the performance evaluation criteria of accuracy, error rate, specificity, and F-score, C5.0 was chosen as the algorithm that gives the best performance results. According to the results of the study, the employees' sectors are not important; what matters instead are the completion status, the duration of experience, and the training.
