Predicting Design Performance Utilizing Automated Topic Discovery

2020 ◽  
Vol 142 (12) ◽  
Author(s):  
Zachary Ball ◽  
Kemper Lewis

Abstract Increasingly complex engineering design challenges require the diversification of knowledge on design teams. In the context of open innovation, positioning key members within these teams or groups based on their estimated abilities leads to more impactful results since mass collaboration is fundamentally a sociotechnical system. Determining how each individual influences the overall design process requires an understanding of the predicted mapping between their technical competency and performance. This work explores this relationship through the use of predictive models composed of various algorithms. With the support of a dataset composed of documents related to the design performance of students working on their capstone design project, in combination with textual descriptors representing individual technical aptitudes, correlations are explored as a method to predict overall project development performance. Each technical competency and project is represented as a distribution of topic knowledge to produce the performance metrics, which are referred to as topic competencies, since topic representations increase the ability to decompose and identify human-centric performance measures. Three methods of topic identification and five prediction models are compared based on their prediction accuracy. From this analysis, it is found that representing input variables as topic distributions and the resulting performance as a single indicator while using support vector regression provided the most accurate mapping between ability and performance. With these findings, complex open innovation projects will benefit from increased knowledge of individual ability and how that correlates to their predicted performances.

Author(s):  
Zachary Ball ◽  
Kemper Lewis

Abstract Increasing the complexity of engineering design projects expands the diversity of required topic knowledge. Multi-disciplinary design processes need expertise from multiple fields of study. In the context of mass collaboration within engineering design, positioning key members within multi-disciplinary teams is of great importance. Determining how each discipline impacts the overall design process requires an understanding of the mapping between competency and performance. This work explores this mapping through the use of predictive models composed of various regression algorithms. The design performance of students working on their capstone design project is analyzed, and the relationship between individual competencies is compared against their overall project performance. Each competency and project is represented as a distribution of topic knowledge to produce the performance metrics. Following the automated topic extraction of the textual data, the regression algorithms are applied. Three topic models and five prediction models are compared for their prediction accuracy. From this analysis it was found that representing both input and output variables as a distribution of topics while performing a support vector regression provided the most accurate mapping between ability and performance.


2021 ◽  
Vol 10 (4) ◽  
pp. 199
Author(s):  
Francisco M. Bellas Aláez ◽  
Jesus M. Torres Palenzuela ◽  
Evangelos Spyrakos ◽  
Luis González Vilas

This work presents new prediction models based on recent developments in machine learning methods, such as Random Forest (RF) and AdaBoost, and compares them with more classical approaches, i.e., support vector machines (SVMs) and neural networks (NNs). The models predict Pseudo-nitzschia spp. blooms in the Galician Rias Baixas. This work builds on a previous study by the authors (doi.org/10.1016/j.pocean.2014.03.003) but uses an extended database (from 2002 to 2012) and new algorithms. Our results show that RF and AdaBoost provide better prediction results compared to SVMs and NNs, as they show improved performance metrics and a better balance between sensitivity and specificity. Classical machine learning approaches show higher sensitivities, but at a cost of lower specificity and higher percentages of false alarms (lower precision). These results seem to indicate a greater adaptation of new algorithms (RF and AdaBoost) to unbalanced datasets. Our models could be operationally implemented to establish a short-term prediction system.
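The trade-off between sensitivity, specificity, and precision described above can be illustrated with a minimal sketch. The labels below are hypothetical and are not taken from the study; they only show how a classifier that catches every event on an unbalanced dataset can still produce many false alarms.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = bloom, 0 = no bloom)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def rates(y_true, y_pred):
    """Return sensitivity, specificity, and precision as a dict."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # share of blooms detected
        "specificity": tn / (tn + fp) if tn + fp else 0.0,  # share of non-blooms rejected
        "precision": tp / (tp + fp) if tp + fp else 0.0,    # share of alarms that were real
    }


# Hypothetical unbalanced sample: 2 bloom events among 10 observations.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
result = rates(y_true, y_pred)
print(result)
```

In this made-up sample every bloom is detected, yet three of the five alarms are false, which is the pattern of high sensitivity with low precision that the abstract attributes to the classical models.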


2021 ◽  
Author(s):  
Sridevi S ◽  
Jeevaa Katiravan

Abstract Scientific workflows deserve growing attention in sophisticated large-scale scientific problem-solving environments. Even a single task failure in a workflow-based application can drastically reduce the reliability of the overall system because of the dependencies between tasks. Hence, proactive measures are more important than reactive fault-tolerant approaches in scientific workflows. This work explores an Exotic Intelligent Water Drops - Support Vector Regression-based approach for task failure prognostication, which facilitates proactive fault tolerance in scientific workflow applications. The failure prediction models in this study are implemented through SVR-based machine learning approaches, their accuracy is optimized by IWDA, and various performance metrics are evaluated. The experimental results show that the proposed approach performs better than the other existing techniques.


Sensors ◽  
2020 ◽  
Vol 20 (6) ◽  
pp. 1692 ◽  
Author(s):  
Iván Silva ◽  
José Eugenio Naranjo

Identifying driving styles using classification models with in-vehicle data can provide automated feedback to drivers on their driving behavior, particularly on whether they are driving safely. Although several classification models have been developed for this purpose, there is no consensus on which classifier performs better at identifying driving styles. Therefore, more research is needed to evaluate classification models by comparing performance metrics. In this paper, a data-driven machine-learning methodology for classifying driving styles is introduced. This methodology is grounded in well-established machine-learning (ML) methods and literature related to driving-styles research. The methodology is illustrated through a study involving data collected from 50 drivers from two different cities in a naturalistic setting. Five features were extracted from the raw data. Fifteen experts were involved in the data labeling to derive the ground truth of the dataset. The dataset was fed to five different models: Support Vector Machines (SVM), Artificial Neural Networks (ANN), fuzzy logic, k-Nearest Neighbor (kNN), and Random Forests (RF). These models were evaluated in terms of a set of performance metrics and statistical tests. The experimental results from performance metrics showed that SVM outperformed the other four models, achieving an average accuracy of 0.96, F1-Score of 0.9595, Area Under the Curve (AUC) of 0.9730, and Kappa of 0.9375. In addition, Wilcoxon tests indicated that ANN predicts differently from the other four models. These promising results demonstrate that the proposed methodology may support researchers in making informed decisions about which ML model performs better for driving-styles classification.
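Of the metrics reported above, Kappa is the least self-explanatory: it measures agreement between predicted and true labels after discounting the agreement expected by chance. The following is a minimal sketch of Cohen's kappa, assuming that is the variant meant; the class names and labels are hypothetical and not drawn from the study's dataset.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: probability that both label sources pick the same class independently.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: only one class present in both label sets
    return (observed - expected) / (1 - expected)


# Hypothetical driving-style labels for four trips.
y_true = ["calm", "calm", "aggressive", "aggressive"]
y_pred = ["calm", "calm", "aggressive", "calm"]
kappa = cohen_kappa(y_true, y_pred)
print(kappa)
```

Here three of four labels agree (accuracy 0.75), but half of that agreement would be expected by chance, so kappa is noticeably lower than accuracy.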


2020 ◽  
Vol 39 (5) ◽  
pp. 6073-6087
Author(s):  
Meltem Yontar ◽  
Özge Hüsniye Namli ◽  
Seda Yanik

Customer behavior prediction has recently been gaining importance in the banking sector, as in other sectors. This study aims to propose a model to predict whether credit card users will pay their debts or not. Using the proposed model, potential unpaid risks can be predicted and necessary actions can be taken in time. To predict customers’ payment status for the next month, we use Artificial Neural Network (ANN), Support Vector Machine (SVM), Classification and Regression Tree (CART) and C4.5, which are widely used artificial intelligence and decision tree algorithms. Our dataset includes 10,713 customer records obtained from a well-known bank in Taiwan. These records consist of customer information such as the amount of credit, gender, education level, marital status, age, past payment records, invoice amount and amount of credit card payments. We apply cross validation and hold-out methods to divide our dataset into training and test sets. Then we evaluate the algorithms with the proposed performance metrics. We also optimize the parameters of the algorithms to improve prediction performance. The results show that the model built with the CART algorithm, one of the decision tree algorithms, provides high accuracy (about 86%) in predicting the customers’ payment status for the next month. When the algorithm parameters are optimized, classification accuracy and performance increase.


Author(s):  
Dana Bani-Hani ◽  
Pruthak Patel ◽  
Tasneem Alshaikh

Diabetes is a serious, chronic disease whose number of cases and prevalence have been rising over the past few decades. It can lead to serious complications and can increase the overall risk of dying prematurely. Data-oriented prediction models have become effective tools that support medical decision-making and diagnosis, and the use of machine learning in medicine has increased substantially. This research introduces the Recursive General Regression Neural Network Oracle (R-GRNN Oracle) and applies it to the Pima Indians Diabetes dataset for the prediction and diagnosis of diabetes. The R-GRNN Oracle (Bani-Hani, 2017) is an enhancement to the GRNN Oracle developed by Masters et al. in 1998, in which the recursive model is composed of two oracles, one within the other. Several classifiers, along with the R-GRNN Oracle and the GRNN Oracle, are applied to the dataset: Support Vector Machine (SVM), Multilayer Perceptron (MLP), Probabilistic Neural Network (PNN), Gaussian Naïve Bayes (GNB), K-Nearest Neighbor (KNN), and Random Forest (RF). Genetic Algorithm (GA) was used for feature selection as well as the hyperparameter optimization of SVM and MLP, and Grid Search (GS) was used to optimize the hyperparameters of KNN and RF. The performance metrics accuracy, AUC, sensitivity, and specificity were recorded for each classifier.


Extensive research has been carried out on the prediction of diesel engine performance. Machine learning techniques such as support vector regression make performance predictions simpler. Support vector regression is a regression algorithm that tries to fit the best line while tolerating errors that stay within a threshold value and minimizing those that exceed it. In this paper, a detailed study of diesel engine performance using support vector regression is presented, and performance metrics such as brake thermal efficiency and accuracy are explored. Findings indicate that support vector regression is an efficient technique for predicting diesel engine performance and that its predictions match the actual performance with high accuracy. For engine performance, the support vector machine helps reduce the time and cost of testing.
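The "threshold value" mentioned above corresponds, in the standard formulation of support vector regression, to the epsilon-insensitive loss: residuals smaller than epsilon cost nothing, and larger ones are penalized only by the amount they exceed epsilon. The sketch below shows that loss in isolation; the efficiency values and the choice of epsilon are hypothetical, not results from the paper.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Per-sample loss max(0, |y - y_hat| - epsilon) used by support vector regression."""
    return [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]


# Hypothetical brake thermal efficiency values in percent (measured vs. predicted).
measured = [30.0, 32.0, 35.0]
predicted = [30.4, 33.5, 34.9]

# With epsilon = 0.5, only the second prediction falls outside the tolerance tube.
losses = epsilon_insensitive_loss(measured, predicted, epsilon=0.5)
print(losses)
```

A full SVR additionally regularizes the model weights and trades that off against the summed loss; this sketch covers only the tolerance behavior.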


Author(s):  
Yousef O. Sharrab ◽  
Mohammad Alsmirat ◽  
Bilal Hawashin ◽  
Nabil Sarhan

The advancement of prediction models used in a variety of fields is a result of the contribution of machine learning approaches. Utilizing such modeling in feature engineering is essential. In this research, we show how to utilize machine learning to save time in research experiments: we save more than five thousand hours of measuring the energy consumption of encoding recordings. Since the energy consumption must be measured manually and since more than eleven thousand experiments are required to cover all the combinations of video sequences, video bit rates, and video encoding settings, we utilize machine learning to model the energy consumption using linear regression. The VP8 codec has been offered by Google as an open video encoder in an effort to replace the popular MPEG-4 Part 10, known as the H.264/AVC video encoder standard. This research models energy consumption and describes the major differences between the H.264/AVC and VP8 encoders in terms of energy consumption and performance through experiments that are based on machine learning modeling. Twenty-nine raw video sequences are used, offering a wide range of resolutions and contents, with frame sizes ranging from QCIF (176x144) to 2160p (3840x2160). For fairness in the comparison, we use seven settings in the VP8 encoder and fifteen types of tuning in H.264/AVC. The settings cover various video qualities. The performance metrics include video quality, encoding time, and encoding energy consumption.


Author(s):  
Osman Erman Gungor ◽  
Imad L. Al-Qadi

Aviation promotes trade and tourism by connecting regions, people, and countries. Having a functional and efficient airport pavement network is important to improve aviation traffic and to provide safer mobility to almost 800 million passengers travelling in the U.S. per year. The Federal Aviation Administration has initiated and actively participated in many projects to further advance pavement design and performance to meet user requirements. To accomplish that, quantitative data are needed; such data may be collected from the pavement response to gear and environment loading. In this study, responses from four instrumented taxiway concrete slabs at John F. Kennedy International Airport were analyzed. The collected data were used to develop machine-learning (ML) based prediction models to compute the temperature, curling and bending strains within the pavement. The ML models were developed using the support vector machine (SVM) algorithm. The results showed that SVM-based ML models can predict pavement responses with high accuracy and low computation time. Furthermore, if fed more data from various airports, ML models are a promising technique for the pavement analysis engine of future airport pavement design frameworks. This study also produces recommendations for future data collection projects so that well-designed databases are available for developing data-driven models.


2021 ◽  
Vol 2021 ◽  
pp. 1-16
Author(s):  
Sikandar Ali ◽  
Muhammad Adeel ◽  
Sumaira Johar ◽  
Muhammad Zeeshan ◽  
Samad Baseer ◽  
...  

An incident, in the context of information technology, is an event that is not part of a normal process and disrupts operational procedure. This research work particularly focuses on software failure incidents. In any operational environment, software failure can put the quality and performance of services at risk. Many efforts are made to overcome software failure incidents and to restore normal service as soon as possible. The main contribution of this study is the classification and prediction of software failure incidents using machine learning. In this study, an active learning approach is used to selectively label the data considered most informative for building models. Firstly, the sample with the highest randomness (entropy) is selected for labeling. Secondly, a binary classifier assigns each labeled observation to either the failure or the no-failure class. A Support Vector Machine is used as the main classifier. We derived our prediction models from the failure log files collected from the ECLIPSE software repository.
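The entropy-based selection step described above can be sketched as follows. This is a minimal illustration of uncertainty sampling, assuming the classifier exposes per-class probabilities for each unlabeled sample; the sample identifiers and probability values are hypothetical, and the authors' actual selection procedure may differ.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a class-probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def most_uncertain(pool):
    """Return the key of the unlabeled sample whose predicted distribution has the highest entropy."""
    return max(pool, key=lambda sample_id: entropy(pool[sample_id]))


# Hypothetical pool: predicted [P(failure), P(no failure)] per unlabeled log entry.
pool = {
    "log_a": [0.95, 0.05],
    "log_b": [0.55, 0.45],
    "log_c": [0.80, 0.20],
}
chosen = most_uncertain(pool)
print(chosen)
```

The sample closest to an even split between the two classes is chosen for labeling, because a label for it is expected to tell the model the most.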

