Prediction of Wind Farm Power Ramp Rates: A Data-Mining Approach

2009 ◽  
Vol 131 (3) ◽  
Author(s):  
Haiyang Zheng ◽  
Andrew Kusiak

In this paper, multivariate time series models were built to predict the power ramp rates of a wind farm. The power changes were predicted at 10 min intervals. Multivariate time series models were built with data-mining algorithms. Five different data-mining algorithms were tested using data collected at a wind farm. The support vector machine regression algorithm performed best out of the five algorithms studied in this research. It provided predictions of the power ramp rate for a time horizon of 10–60 min. The boosting tree algorithm selects parameters for enhancement of the prediction accuracy of the power ramp rate. The data used in this research originated at a wind farm of 100 turbines. The test results of multivariate time series models were presented in this paper. Suggestions for future research were provided.

Author(s):  
Efat Jabarpour ◽  
Amin Abedini ◽  
Abbasali Keshtkar

Introduction: Osteoporosis is a disease that reduces bone density and loses the quality of bone microstructure leading to an increased risk of fractures. It is one of the major causes of inability and death in elderly people. The current study aims at determining the factors influencing the incidence of osteoporosis and providing a predictive model for the disease diagnosis to increase the diagnostic speed and reduce diagnostic costs. Methods: An Individual's data including personal information, lifestyle, and disease information were reviewed. A new model has been presented based on the Cross-Industry Standard Process CRISP methodology. Besides, Support Vector Machine (SVM) and Bayes methods (Tree Augmented Naïve Bayes (TAN)) and Clementine12 have been used as data mining tools. Results: Some features have been detected to affect this disease. The rules have been extracted that can be used as a pattern for the prediction of the patients' status. Classification precision was calculated to be 88.39% for SVM, and 91.29% for  (TAN) when the precision of  TAN  is higher comparing to other methods. Conclusion: The most effective factors concerning osteoporosis are detected and can be used for a new sample with defined characteristics to predict the possibility of osteoporosis in a person.  


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Thiago Cesar de Oliveira ◽  
Lúcio de Medeiros ◽  
Daniel Henrique Marco Detzel

Purpose Real estate appraisals are becoming an increasingly important means of backing up financial operations based on the values of these kinds of assets. However, in very large databases, there is a reduction in the predictive capacity when traditional methods, such as multiple linear regression (MLR), are used. This paper aims to determine whether in these cases the application of data mining algorithms can achieve superior statistical results. First, real estate appraisal databases from five towns and cities in the State of Paraná, Brazil, were obtained from Caixa Econômica Federal bank. Design/methodology/approach After initial validations, additional databases were generated with both real, transformed and nominal values, in clean and raw data. Each was assisted by the application of a wide range of data mining algorithms (multilayer perceptron, support vector regression, K-star, M5Rules and random forest), either isolated or combined (regression by discretization – logistic, bagging and stacking), with the use of 10-fold cross-validation in Weka software. Findings The results showed more varied incremental statistical results with the use of algorithms than those obtained by MLR, especially when combined algorithms were used. The largest increments were obtained in databases with a large amount of data and in those where minor initial data cleaning was carried out. The paper also conducts a further analysis, including an algorithmic ranking based on the number of significant results obtained. Originality/value The authors did not find similar studies or research studies conducted in Brazil.


Author(s):  
Anne Denton

Time series data is of interest to most science and engineering disciplines and analysis techniques have been developed for hundreds of years. There have, however, in recent years been new developments in data mining techniques, such as frequent pattern mining, that take a different perspective of data. Traditional techniques were not meant for such pattern-oriented approaches. There is, as a result, a significant need for research that extends traditional time-series analysis, in particular clustering, to the requirements of the new data mining algorithms.


2015 ◽  
Vol 813-814 ◽  
pp. 1104-1113 ◽  
Author(s):  
A. Sumesh ◽  
Dinu Thomas Thekkuden ◽  
Binoy B. Nair ◽  
K. Rameshkumar ◽  
K. Mohandas

The quality of weld depends upon welding parameters and exposed environment conditions. Improper selection of welding process parameter is one of the important reasons for the occurrence of weld defect. In this work, arc sound signals are captured during the welding of carbon steel plates. Statistical features of the sound signals are extracted during the welding process. Data mining algorithms such as Naive Bayes, Support Vector Machines and Neural Network were used to classify the weld conditions according to the features of the sound signal. Two weld conditions namely good weld and weld with defects namely lack of fusion, and burn through were considered in this study. Classification efficiencies of machine learning algorithms were compared. Neural network is found to be producing better classification efficiency comparing with other algorithms considered in this study.


Author(s):  
Moloud Abdar ◽  
Sharareh R. Niakan Kalhori ◽  
Tole Sutikno ◽  
Imam Much Ibnu Subroto ◽  
Goli Arji

Heart diseases are among the nation’s leading couse of mortality and moribidity. Data mining teqniques can predict the likelihood of patients getting a heart disease. The purpose of this study is comparison of different data mining algorithm on prediction of heart diseases. This work applied and compared data mining techniques to predict the risk of heart diseases. After feature analysis, models by five algorithms including decision tree (C5.0), neural network, support vector machine (SVM), logistic regression and k-nearest neighborhood (KNN) were developed and validated. C5.0 Decision tree has been able to build a model with greatest accuracy 93.02%, KNN, SVM, Neural network have been 88.37%, 86.05% and 80.23% respectively. Produced results of decision tree can be simply interpretable and applicable; their rules can be understood easily by different clinical practitioner.


Author(s):  
M. Jupri ◽  
Riyanarto Sarno

The achievement of accepting optimal tax need effective and efficient tax supervision can be achieved by classifying taxpayer compliance to tax regulations. Considering this issue, this paper proposes the classification of taxpayer compliance using data mining algorithms; i.e. C4.5, Support Vector Machine, K-Nearest Neighbor, Naive Bayes, and Multilayer Perceptron based on the compliance of taxpayer data. The taxpayer compliance can be classified into four classes, which are (1) formal and material compliant taxpayers, (2) formal compliant taxpayers, (3) material compliant taxpayers, and (4) formal and material non-compliant taxpayers. Furthermore, the results of data mining algorithms are compared by using Fuzzy AHP and TOPSIS to determine the best performance classification based on the criteria of Accuracy, F-Score, and Time required. Selection of the taxpayer's priority for more detailed supervision at each level of taxpayer compliance is ranked using Fuzzy AHP and TOPSIS based on criteria of dataset variables. The results show that C4.5 is the best performance classification and achieves preference value of 0.998; whereas the MLP algorithm results from the lowest preference value of 0.131. Alternative taxpayer A233 is the top priority taxpayer with a preference value of 0.433; whereas alternative taxpayer A051 is the lowest priority taxpayer with a preference value of 0.036.


2019 ◽  
Vol 16 (9) ◽  
pp. 3849-3853
Author(s):  
Dar Masroof Amin ◽  
Atul Garg

The globalisation of Internet is creating enormous amount of data on servers. The data created during last two years is itself equivalent to the data created during all these years. This exponential creation of data is due to the easy access to devices based on Internet of things. This information has become a source of predictive analysis for future happenings. The versatile use of computing devices is creating data of diverse nature and the analysts are predicting the future trend using data of their respective domain. The technology used to analyse the data has become a bottleneck over the time. The main reason behind this is that the rate with which the data is getting created is much more than the technology used to access the same. There are various mining techniques used to explore the useful information. In this research there is detailed analysis of how data is used and perceived by various data mining algorithms. Mining algorithms like Naïve Bayes, Support Vector Machines, Linear Discriminant Analysis Algorithm, Artificial Neural Networks, C4.5, C5.0, K-Nearest Neighbour are analysed. The input data used in these algorithms is big data files. This research mainly focuses on how the existing data algorithms are interacting with big data files. The research has been done on twitter comments.


Water ◽  
2019 ◽  
Vol 11 (11) ◽  
pp. 2292 ◽  
Author(s):  
Vali Vakhshoori ◽  
Hamid Reza Pourghasemi ◽  
Mohammad Zare ◽  
Thomas Blaschke

The aim of this study was to apply data mining algorithms to produce a landslide susceptibility map of the national-scale catchment called Bandar Torkaman in northern Iran. As it was impossible to directly use the advanced data mining methods due to the volume of data at this scale, an intermediate approach, called normalized frequency-ratio unique condition units (NFUC), was devised to reduce the data volume. With the aid of this technique, different data mining algorithms such as fuzzy gamma (FG), binary logistic regression (BLR), backpropagation artificial neural network (BPANN), support vector machine (SVM), and C5 decision tree (C5DT) were employed. The success and prediction rates of the models, which were calculated by receiver operating characteristic curve, were 0.859 and 0.842 for FG, 0.887 and 0.855 for BLR, 0.893 and 0.856 for C5DT, 0.891 and 0.875 for SVM, and 0.896 and 0.872 for BPANN that showed the highest validation rates as compared with the other methods. The proposed approach of NFUC proved highly efficient in data volume reduction, and therefore the application of computationally demanding algorithms for large areas with voluminous data was feasible.


Author(s):  
Geert Wets ◽  
Koen Vanhoof ◽  
Theo Arentze ◽  
Harry Timmermans

The utility-maximizing framework—in particular, the logit model—is the dominantly used framework in transportation demand modeling. Computational process modeling has been introduced as an alternative approach to deal with the complexity of activity-based models of travel demand. Current rule-based systems, however, lack a methodology to derive rules from data. The relevance and performance of data-mining algorithms that potentially can provide the required methodology are explored. In particular, the C4 algorithm is applied to derive a decision tree for transport mode choice in the context of activity scheduling from a large activity diary data set. The algorithm is compared with both an alternative method of inducing decision trees (CHAID) and a logit model on the basis of goodness-of-fit on the same data set. The ratio of correctly predicted cases of a holdout sample is almost identical for the three methods. This suggests that for data sets of comparable complexity, the accuracy of predictions does not provide grounds for either rejecting or choosing the C4 method. However, the method may have advantages related to robustness. Future research is required to determine the ability of decision tree-based models in predicting behavioral change.


2021 ◽  
Author(s):  
Mustafa Yağcı ◽  
Yusuf Ziya Olpak

Abstract This study proposes a new model to analyze the grade point averages (GPAs) in the previous semester using data mining algorithms and to predict the final GPAs that students may receive in the following semesters in three gradually expanding categories (department, faculty, and university). The performances of the Random Forest, Linear Regression, Support Vector Machines, and k-Nearest Neighbors algorithms, which are among the data mining algorithms, were calculated and compared to estimate the GPAs of the students at the end of the semester. This study focused on three parameters. The first was to predict academic performance with a single independent variable. The second was to compare the performance indicators of four algorithms. The third was to compare the predictions made in three different categories. All algorithms applied correctly classified the samples at rates varying between 92% and 94%. The proposed model correctly estimated students’ grade point averages at the end of the semester with an average deviation of 0.28 points over a 4 with a single variable. Students with a high risk of failure can be determined in advance by estimating their final grade point averages.


Sign in / Sign up

Export Citation Format

Share Document