Machine Learning Regression Analysis of EDX 2012-13 Data for Identifying the Auditors Use Case

2017 ◽  
Vol 6 (3) ◽  
pp. 01-14
Author(s):  
Mark Mueller ◽  
Greg Weber
Author(s):  
Julien Siebert ◽  
Lisa Joeckel ◽  
Jens Heidrich ◽  
Adam Trendowicz ◽  
Koji Nakamichi ◽  
...  

Abstract Nowadays, systems containing components based on machine learning (ML) methods are becoming more widespread. To ensure the intended behavior of a software system, there are standards (such as ISO/IEC 25010) that define the necessary qualities of the system and its components. Because ML differs in nature from conventional software, we have to reinterpret existing qualities for ML systems or add new ones (such as trustworthiness). We have to be very precise about which quality property is relevant for which entity of interest (such as the completeness of the training data or the correctness of the trained model), and about how to evaluate adherence to quality requirements objectively. In this article, we present how to systematically construct quality models for ML systems based on an industrial use case. Such a quality model enables practitioners to specify and assess the qualities of ML systems objectively. In addition to the overall construction process, the main outcomes include a meta-model for specifying quality models for ML systems; reference elements regarding relevant views, entities, quality properties, and measures for ML systems, based on existing research; an example instantiation of a quality model for a concrete industrial use case; and lessons learned from applying the construction process. We found that it is crucial to follow a systematic process in order to arrive at measurable quality properties that can be evaluated in practice. In the future, we want to learn how the notion of quality differs between types of ML systems and to develop reference quality models for evaluating the qualities of ML systems.


Sensors ◽  
2021 ◽  
Vol 21 (21) ◽  
pp. 7417
Author(s):  
Alex J. Hope ◽  
Utkarsh Vashisth ◽  
Matthew J. Parker ◽  
Andreas B. Ralston ◽  
Joshua M. Roper ◽  
...  

Concussion injuries remain a significant public health challenge. There is a significant unmet clinical need for tools that allow related physiological impairments and longer-term health risks to be identified earlier, quantified better, and monitored more easily over time. We address this challenge by combining a head-mounted, wearable, inertial motion unit (IMU)-based physiological vibration acceleration (“phybrata”) sensor with several candidate machine learning (ML) models. The performance of this solution is assessed for both binary classification of concussion patients and multiclass prediction of specific concussion-related neurophysiological impairments. Results are compared with previously reported approaches to ML-based concussion diagnostics. Using phybrata data from a previously reported concussion study population, four different machine learning models (Support Vector Machine, Random Forest Classifier, Extreme Gradient Boosting, and Convolutional Neural Network) are first investigated for binary classification of the test population as healthy vs. concussion (Use Case 1). Results are compared for two different data preprocessing pipelines: Time-Series Averaging (TSA) and Non-Time-Series Feature Extraction (NTS). Next, the three best-performing NTS models are compared in terms of their multiclass prediction performance for specific concussion-related impairments: vestibular, neurological, or both (Use Case 2). For Use Case 1, the NTS approach outperformed the TSA approach, with the two best algorithms achieving an F1 score of 0.94. For Use Case 2, the NTS Random Forest model achieved the best performance on the testing set, with an F1 score of 0.90, and identified a wider range of relevant phybrata signal features contributing to impairment classification than manual feature inspection and statistical data analysis did.
The overall classification performance achieved in the present work exceeds that of previously reported approaches to ML-based concussion diagnostics using other data sources and ML models. This study also demonstrates the first combination of a wearable IMU-based sensor and an ML model that enables both binary classification of concussion patients and multiclass prediction of specific concussion-related neurophysiological impairments.
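The F1 scores reported in this abstract combine precision and recall into one number (their harmonic mean, equivalently 2·TP / (2·TP + FP + FN)). The sketch below computes a binary F1 for one "positive" class and a macro-averaged F1 for the multiclass case. It is an illustration only: the abstract does not say which averaging scheme (macro, weighted, or micro) was used for Use Case 2, so macro averaging is an assumption here, and the label strings and example predictions are hypothetical, not study data.

```python
"""Binary and macro-averaged F1 score (illustrative sketch, hypothetical labels)."""
from typing import Hashable, Sequence


def f1_for_class(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                 positive: Hashable) -> float:
    """F1 for one class treated as 'positive': 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every label seen in either sequence.

    Macro averaging is an ASSUMPTION; the study may have averaged differently.
    """
    labels = set(y_true) | set(y_pred)
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


if __name__ == "__main__":
    # Hypothetical binary example (Use Case 1 style): healthy vs. concussion.
    truth = ["concussion", "concussion", "healthy", "healthy"]
    guess = ["concussion", "healthy", "healthy", "healthy"]
    print(round(f1_for_class(truth, guess, "concussion"), 3))  # 0.667

    # Hypothetical multiclass example (Use Case 2 style).
    truth_mc = ["vestibular", "neurological", "both", "both"]
    guess_mc = ["vestibular", "both", "both", "neurological"]
    print(macro_f1(truth_mc, guess_mc))  # 0.5
```

In practice, scikit-learn's `f1_score` with its `average` parameter provides the same computation; the hand-written version is shown only to make the formula explicit.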


2021 ◽  
Vol 110 ◽  
pp. 02006
Author(s):  
Ludmila Borisova ◽  
Galina Zhukova ◽  
Anna Kuznetsova ◽  
Julie Martin

The paper analyzes the socio-economic and demographic indicators of life expectancy in the countries of the world using regression analysis and machine learning methods. Statistically significant indicators that affect life expectancy around the world have been identified. When the data were analyzed using machine learning methods, 13 of the 14 analyzed indicators were statistically significant. In addition to those selected in the regression analysis, three indicators were significant: the under-five infant mortality rate (per 1,000 live births), the Net Barter Terms of Trade Index (2000 = 100), and imports of goods and services (% of GDP); of these, only the infant mortality rate was significant in the regression analysis. It should also be noted that, for the EU, CIS, and South-East Asian countries, compared with the threshold set in the study for all countries, there is a significant decrease in the under-five infant mortality rate (4.65 vs. 34.9 per 1,000 live births), a decrease in the birth rate from 2.785 to 1.85, a sharp increase in exports of goods and services from 23.17 to 80.59, a halving of imports of goods and services, and a drop in population growth from 2.105 to 0.85. The statistical analysis performed strongly supports the use of machine learning methods to identify statistically significant relationships between the various indicators that characterize the development of countries when there are gaps in the data.


2021 ◽  
Vol 2107 (1) ◽  
pp. 012058
Author(s):  
Sukhairi Sudin ◽  
Azizi Naim Abdul Aziz ◽  
Fathinul Syahir Ahmad Saad ◽  
Nurul Syahirah Khalid ◽  
Ismail Ishaq Ibrahim

Abstract This project examined the influence of cadence, speed, heart rate, and power on cycling performance using a Garmin Edge 1000. Any change in cadence affects the speed, heart rate, and power of a novice cyclist, and the pattern of changes is observed through mobile devices running the Garmin Connect application. All results are recorded for the next task, which analyzes the collected data with a machine learning algorithm, namely regression analysis. Regression analysis is a statistical method for modelling the relationship between one or more independent variables and a dependent (target) variable, and it is required to answer this type of prediction problem in machine learning. Regression is a supervised learning technique that helps discover correlations between variables and allows a continuous output variable to be predicted from one or more predictor variables. A total of forty days’ worth of events were captured in the dataset. Cadence acts as the dependent variable (y), while speed, heart rate, and power act as the independent variables (x) in the prediction of cycling performance. Linear regression with only one input variable (x) is called simple linear regression; when there are several input variables, it is called multiple linear regression. The research uses a linear regression technique to predict cycling performance based on cadence analysis. The linear regression algorithm models a linear relationship between a dependent variable (y) and one or more independent variables (x), hence the name. Because the relationship is linear, the model shows how the value of the dependent variable changes as the value of an independent variable changes. This analysis uses the Mean Squared Error (MSE) cost function for linear regression, which is the average of the squared errors between the predicted and actual values. The R-squared value was recorded in this project.
A low R-squared value means that the independent variables explain little of the variation in the dependent variable: regardless of variable importance, it indicates that the chosen independent variables, although meaningful, are not responsible for much of the variance of the dependent variable around its mean. Using multiple regression, the R-squared value obtained in this project exceeds 0.7. Since the project is based on human behaviour, where R-squared values rarely exceed 0.3, this value is considered acceptable.
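The regression workflow this abstract describes (a multiple linear regression of cadence on speed, heart rate, and power, evaluated with MSE and R-squared) can be sketched as follows. This is a minimal, dependency-free illustration, not the authors' code: the function names and the sample rows are hypothetical, and fitting by ordinary least squares through the normal equations is an assumed choice made to keep the example self-contained (libraries such as NumPy's `numpy.linalg.lstsq` or scikit-learn's `LinearRegression` would normally be used instead).

```python
"""Minimal multiple linear regression with MSE and R-squared (illustrative sketch)."""
from typing import List, Sequence


def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Fit y ~ b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    Solves the normal equations (A^T A) b = A^T y, where A is X with a leading
    column of ones for the intercept, using Gauss-Jordan elimination with
    partial pivoting. Returns [b0, b1, ..., bk].
    """
    A = [[1.0] + [float(v) for v in row] for row in X]
    k = len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [
        [sum(a[i] * a[j] for a in A) for j in range(k)]
        + [sum(a[i] * yi for a, yi in zip(A, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: bring the row with the largest entry in this column up.
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate this column from every other row.
        for r in range(k):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [m - factor * p for m, p in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]


def predict(coef: Sequence[float], X: Sequence[Sequence[float]]) -> List[float]:
    """Evaluate the fitted linear model on each row of X."""
    return [coef[0] + sum(b * x for b, x in zip(coef[1:], row)) for row in X]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error: average of squared (actual - predicted) differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # HYPOTHETICAL rows of (speed km/h, heart rate bpm, power W); not study data.
    X = [[20, 120, 150], [25, 130, 180], [30, 140, 200],
         [22, 150, 160], [28, 125, 210], [35, 160, 250]]
    cadence = [75, 82, 88, 80, 86, 95]  # hypothetical cadence (rpm), the target y
    coef = fit_ols(X, cadence)
    fitted = predict(coef, X)
    print("coefficients:", [round(c, 3) for c in coef])
    print("MSE:", round(mse(cadence, fitted), 3))
    print("R^2:", round(r_squared(cadence, fitted), 3))
```

Each fitted coefficient estimates how cadence changes per unit change in one predictor while the others are held fixed. An R-squared above 0.7, as the abstract reports, means the predictors account for more than 70% of the variance of cadence around its mean.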

