Determination of compound channel apparent shear stress: application of novel data mining models

2019 ◽  
Vol 21 (5) ◽  
pp. 798-811 ◽  
Author(s):  
Zohreh Sheikh Khozani ◽  
Khabat Khosravi ◽  
Binh Thai Pham ◽  
Bjørn Kløve ◽  
Wan Hanna Melini Wan Mohtar ◽  
...  

Abstract Momentum exchange in the mixing region between the floodplain and the main channel is an essential hydraulic process, particularly for the estimation of discharge. The current study investigated various data mining models to estimate apparent shear stress in a symmetric compound channel with smooth and rough floodplains. The applied predictive models include random forest (RF), random tree (RT), reduced error pruning tree (REPT), M5P, and the distinguished hybrid bagging-M5P model. The models are constructed based on several correlated physical channel characteristic variables to predict the apparent shear stress. A sensitivity analysis is applied to select the best function tuning parameters for each model. Results showed that an input set of six variables gave the best predictions for the RF model, while an input set of four variables gave the best performance for the other models. Based on the optimised input variables for each model, the efficiency of the five predictive models was evaluated. The M5P and hybrid bagging-M5P models, with coefficients of determination (R²) of 0.905 and 0.92, respectively, in the testing stage, outperformed the RF, RT and REPT models in estimating apparent shear stress in compound channels.
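The bagging idea behind the hybrid model can be illustrated with a minimal sketch. This is not the authors' implementation: M5P is not available in the Python standard library, so a plain least-squares line stands in as the base learner, and the names `fit_line`, `bagging_fit` and `bagging_predict` are hypothetical. The sketch only shows the general mechanism of fitting one model per bootstrap resample and averaging the predictions.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Least-squares line y = a + b*x (assumed stand-in for an M5P base learner)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # degenerate resample: every x is identical
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def bagging_fit(xs, ys, n_models=25, seed=0):
    """Fit one base learner per bootstrap resample of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw n indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Average the predictions of all base learners."""
    return mean(a + b * x for a, b in models)
```

For noisy data the averaged prediction typically varies less between training sets than any single base learner, which is the usual motivation for bagging.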

2016 ◽  
Vol 3 (1) ◽  
pp. 22-44 ◽  
Author(s):  
Alan Olinsky ◽  
Phyllis Schumacher ◽  
John Quinn

One way to enhance the likelihood that more university students will graduate within the specific major that they begin with is to attract the type of students who have typically (historically) done well in that field of study. This paper expands upon a study that utilizes data mining techniques to analyze the characteristics of students who enroll as actuarial students and then either drop out of the major or graduate as actuarial students. Several predictive models including logistic regression, neural networks and decision trees are obtained using input variables describing academic attributes of the students. The models are then compared and the best fitting model is determined. The regression model turns out to be the best predictor. Since this is a very well understood method, it can easily be explained. The decision tree, although its underpinnings are somewhat difficult to explain, gives a clear and well understood output. In addition, the non-predictive method of cluster analysis is applied in order to group these students into distinct classifications based on the values of the input variables. Finally, a new approach to modeling in SAS®, called Rapid Predictive Modeler (RPM), is described and utilized. The results of the RPM also select the regression model as the best predictor.


2019 ◽  
Vol 1 (2) ◽  
pp. 225-230
Author(s):  
Aswan Supriyadi Sunge

Diabetes is a chronic disease caused by excess sugar in the blood. Various automated algorithms have been used to anticipate and diagnose diabetes. Data mining methods can help diagnose a patient's disease. Predictions can save lives and allow prevention to begin before the disease affects the patient. Choosing an appropriate classifier clearly improves the accuracy of the system as levels continue to increase. Most diabetics know little about the risk factors they face before diagnosis. This study develops five predictive models using 9 input variables and one output variable from the dataset. The purpose of this study was to compare the performance of Naive Bayes, Decision Tree, SVM, K-NN and ANN models in predicting diabetes mellitus.


2018 ◽  
Vol 8 (7) ◽  
pp. 1191 ◽  
Author(s):  
MyungSuk Lee ◽  
MuMoungCho Han ◽  
JuGeon Pak

In 2016, the number of mobile phone subscriptions worldwide had surpassed the total world population; moreover, the number of smartphone addicts is increasing each year. Thus, the objective of this study is to analyze smartphone addiction by considering the differences between smartphone usage patterns as well as cognition. Our proposed method involves automatically collecting and analyzing data through an app instead of using the existing self-reporting method, thereby improving the accuracy of data and ensuring data reliability from respondents. Based on the results of our study, we observed that there is a significant cognitive bias between the self-reports and automatically collected data. As a result of applying data mining, among the six criteria out of the total 24 items of the questionnaire, the higher the “recurrence” item, the higher the addiction; further, “forbidden” item 1 had the largest effect on addiction. In addition, the input variables that have the greatest influence on the high-risk users were the number of times the screen was turned on and real-use time/cognitive-use time. However, the amount of data and time of smartphone usage were not related to addiction. In the future, we will modify the app to obtain more accurate data, based on which, we can analyze the effects of smartphone addiction, such as depression, anxiety, stress, self-esteem, and emotional regulation, among others.


2020 ◽  
Vol 29 (54) ◽  
pp. e10514
Author(s):  
Beatriz García-Castellanos ◽  
Osney Pérez-Ones ◽  
Lourdes Zumalacárregui-de-Cárdenas ◽  
Idania Blanco-Carvajal ◽  
Luis Eduardo López-de-la-Maza

The rum aging process shows volume losses, called wastage. The numerical operation variables (product, boardwalk, horizontal and vertical positions, date, volume, alcoholic degree, temperature, humidity and aging time), recorded in databases, contain valuable information for studying the process. MATLAB 2017 software was used to estimate volume losses. To model the rum aging process, a multilayer perceptron neural network with one and two hidden layers was used, varying the number of neurons in these layers between 4 and 10. The Levenberg-Marquardt (LM) and Bayesian (Bay) training algorithms were compared. Training was stopped when the validation error increased in 6 consecutive iterations or when the maximum of 1,000 training cycles was reached. The input variables to the network were numerical month, volume, temperature, humidity, initial alcoholic degree and aging time, while the output variable was wastage. A total of 546 pairs of input/output data were processed. The Friedman and Wilcoxon statistical tests were performed to select the best neural architecture according to the mean square error (MSE) criterion. The selected topology has a 6-4-4-1 structure, with an MSE of 2.1∙10⁻³ and a correlation factor (R) with experimental data of 0.9898. The neural network obtained was used to simulate thirteen initial aging conditions that were not used for training and validation, yielding a coefficient of determination (R²) of 0.9961.
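The two stopping criteria described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the MATLAB routine the authors used: the function name `stopping_epoch` is hypothetical, and an "increase" is taken here to mean that the validation error is higher than in the previous iteration, which may differ from how MATLAB counts validation failures.

```python
def stopping_epoch(val_errors, patience=6, max_epochs=1000):
    """Return the 1-based epoch at which training would stop.

    Training stops after `patience` consecutive epoch-to-epoch increases
    of the validation error, or at `max_epochs`, whichever comes first.
    """
    consecutive = 0
    previous = None
    for epoch, err in enumerate(val_errors[:max_epochs], start=1):
        if previous is not None and err > previous:
            consecutive += 1
        else:
            consecutive = 0  # any non-increase resets the counter
        if consecutive >= patience:
            return epoch
        previous = err
    # No early stop: training ran to the end of the record or the epoch cap.
    return min(len(val_errors), max_epochs)
```

With a validation error that falls for three epochs and then rises steadily, the sixth consecutive increase occurs at epoch 9, so `stopping_epoch([5, 4, 3, 4, 5, 6, 7, 8, 9, 10])` returns 9.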


2022 ◽  
Author(s):  
Bandita Naik ◽  
Vijay Kaushik ◽  
Munendra Kumar

Abstract The computation of the boundary shear stress distribution in open channel flow is required for a variety of applications, including the flow resistance relationship and the construction of stable channels. During overbank flow conditions, the river breaches the main channel and spills across the floodplain on both sides. Due to the momentum transfer between the main channel and the adjacent floodplains, the flow structure in such compound channels becomes complicated. This has a profound impact on the shear stress distribution in the floodplain and main channel subsections. In addition, agriculture and development activities have occurred in the floodplain parts of river systems. As a consequence, the geometry of the floodplain changes over the length of the flow, resulting in a converging compound channel. Traditional formulas, which rely heavily on empirical approaches, are ineffective in predicting shear force distribution with high precision. As a result, innovative and precise approaches are still in great demand. In this paper, the boundary shear force carried by floodplains is estimated by gene expression programming (GEP). A novel equation in terms of non-dimensional geometric and flow variables is constructed to forecast the boundary shear force distribution. The findings indicate that the percentage shear force carried by floodplains predicted using GEP agrees better with the experimental data than the conventional formulas do (R² = 0.96 and RMSE = 3.395 for the training data and R² = 0.95 and RMSE = 4.022 for the testing data).
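For readers unfamiliar with the two goodness-of-fit measures quoted here, a minimal sketch of how they are commonly computed is given below. The function names are hypothetical, and R² is taken in its usual "one minus residual over total sum of squares" form; the paper may instead use the squared correlation coefficient, which can give a different value.

```python
from math import sqrt


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

RMSE carries the units of the predicted quantity (here, percentage shear force), whereas R² is dimensionless, which is why the two are usually reported together.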


2020 ◽  
Author(s):  
Vahid Farrahi ◽  
Maisa Niemelä ◽  
Mikko Kärmeniemi ◽  
Soile Puhakka ◽  
Maarit Kangas ◽  
...  

Abstract Purpose: A data mining approach was applied to establish a multilevel hierarchy predicting physical activity (PA) behavior, and to methodologically identify the correlates of PA behavior.
Methods: Cross-sectional data from the population-based Northern Finland Birth Cohort 1966 study, collected in the most recent follow-up at age 46, were used to create a hierarchy using the chi-square automatic interaction detection (CHAID) decision tree technique for predicting PA behavior. PA behavior is defined as active or inactive depending on participants’ activity profiles, which were previously created through a multidimensional (clustering) approach on continuous accelerometer-measured activity intensities in one week. The input variables (predictors) used for decision tree fitting consisted of individual, demographical, psychological, behavioral, environmental, and physical factors. Using generalized linear mixed models, we also analyzed how factors emerging from the model were associated with three PA metrics, including daily time (minutes per day) in sedentary (SED), light PA (LPA), and moderate-to-vigorous PA (MVPA), to assure the relative importance of methodologically identified factors.
Results: Of the 4,582 participants with valid accelerometer data at the latest follow-up, 2,701 and 1,881 had active and inactive profiles, respectively. We used a total of 168 factors as input variables to classify these two PA behaviors. Out of these 168 factors, the decision tree selected 36 factors of different domains from which 54 subgroups of participants were formed. The emerging factors from the model explained minutes per day in SED, LPA, and/or MVPA, including body fat percentage (SED: B=26.5, LPA: B=-16.1, and MVPA: B=-11.7), normalized heart rate recovery 60 seconds after exercise (SED: B=-16.1, LPA: B=9.9, and MVPA: B=9.6), average weekday total sitting time (SED: B=34.1, LPA: B=-25.3, and MVPA: B=-5.8), and extravagance score (SED: B=6.3 and LPA: B=-3.7).
Conclusions: Using data mining, we established a data-driven model composed of 36 different factors of relative importance from empirical data. This model may be used to identify subgroups for multilevel intervention allocation and design. Additionally, this study methodologically discovered an extensive set of factors that can be a basis for additional hypothesis testing in PA correlates research.


2010 ◽  
Vol 16 (sup1) ◽  
pp. 1-14 ◽  
Author(s):  
K. K. Khatua ◽  
K. C. Patra ◽  
R. Jha

2021 ◽  
Author(s):  
Ernesto Garcia Rugerio ◽  
Rabindranarth Romero López ◽  
Gerarld Corzo Pérez

The existing methodologies for analyzing scour in cohesive soils have been evaluated using linear or power regressions of laboratory experiment results. However, these procedures do not clearly identify the weight of each variable in explaining the response variable, nor can they regionalize the analyzed data set so that the resulting equations fit better.
Data mining techniques are increasingly useful for analyzing different problems. The present case study evaluates the use of these techniques in analyzing the results of an erosion experiment in cohesive soils carried out by the Federal Highway Administration (FHWA); these results were published in technical report No. FHWA-HRT-15-033 dated May 2015.
The geotechnical and hydraulic variables and the erosion results obtained during the experiments were analyzed using the WEKA software (Waikato Environment for Knowledge Analysis) of the University of Waikato in New Zealand, which uses data mining techniques based on different rules and types of information classification such as decision trees.
Through the application of the tree section, various tests were carried out to determine the most important factors that describe the erosion phenomenon; in addition, a series of classifications and equations describing the phenomenon were obtained through the M5P model. According to the M5P analysis, the variables that best describe the erosion phenomenon are the shear stress, the plasticity index, the unconfined compression stress of the samples and the moisture content. The result is a tree with 6 rules that zones the data and fits a regression in each zone, obtaining a correlation coefficient of 0.9246 with a relative absolute error of 33.5874% and a root relative squared error of 38.0878%. By comparison, the power regression fit obtained by the FHWA yielded a coefficient of determination (R²) of 0.73.
Techniques of this type allow a deeper knowledge of the erosion phenomenon by classifying and regionalizing the explanatory variables and by carrying out regressions within these classifications, explaining the behavior of soils containing cohesive material as a function of its variables. These data mining techniques have more advantages than simple linear or power regressions and are of great help in research and experimentation in geotechnics and river hydraulics.
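The two percentage errors quoted for the M5P tree presumably correspond to the measures WEKA labels "Relative absolute error" and "Root relative squared error". A minimal sketch of the standard definitions follows. The function name `relative_errors` is hypothetical, and the baseline here is the mean of the supplied observed values, which is an assumption; WEKA may take the baseline mean from the training data.

```python
from math import sqrt


def relative_errors(observed, predicted):
    """Return (relative absolute error, root relative squared error) in percent.

    Both measures compare the model's errors with those of a baseline that
    always predicts the mean of the observed values (assumed baseline).
    """
    mean_obs = sum(observed) / len(observed)
    abs_err = sum(abs(p - o) for o, p in zip(observed, predicted))
    abs_base = sum(abs(mean_obs - o) for o in observed)
    sq_err = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    sq_base = sum((mean_obs - o) ** 2 for o in observed)
    rae = 100.0 * abs_err / abs_base
    rrse = 100.0 * sqrt(sq_err / sq_base)
    return rae, rrse
```

Values below 100% mean the model does better than always predicting the mean, so errors in the 30 to 40% range indicate a clear improvement over that baseline.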


Web Services ◽  
2019 ◽  
pp. 618-638
Author(s):  
Goran Klepac ◽  
Kristi L. Berg

This chapter proposes a new analytical approach that consolidates the traditional analytical approach to problems such as churn detection, fraud detection, predictive modeling, and segmentation modeling with data sources and analytical techniques from the big data area. The solutions presented offer a structured approach for integrating different concepts into one, helping analysts and managers use the potential of different areas systematically. By using this concept, companies have the opportunity to introduce big data potential into everyday data mining projects. As the chapter shows, neglecting the potential of big data often leads to incomplete analytical results, which mean incomplete information for business decisions and can lead to bad business decisions. The chapter also provides suggestions on how to recognize useful data sources from the big data area and how to analyze them along with traditional data sources to obtain higher-quality information for business decisions.

