Practical foundations of machine learning for addiction research. Part I. Methods and techniques

2021 ◽  
Author(s):  
Pablo Cresta Morgado ◽  
Martín Carusso ◽  
Laura Alonso Alemany ◽  
Laura Acion

Machine learning comprises a broad set of methods and techniques for solving a wide range of problems, such as identifying individuals with substance use disorders (SUD), finding patterns in neuroimages, understanding SUD prognostic factors and their associations, or determining the genetic underpinnings of addiction. However, machine learning remains underused in addiction research. This two-part review focuses on machine learning tools and concepts, providing insights into their capabilities to facilitate their understanding and adoption by addiction researchers. In this first part, we present supervised and unsupervised methods and techniques, such as linear models, naive Bayes, support vector machines, artificial neural networks, k-means, and principal component analysis, with examples of how these tools are already in use in addiction research. We also point to open-source programming tools for applying these techniques. Throughout this work, we link machine learning techniques to applied statistics. Machine learning tools and techniques can be applied to many addiction research problems and can improve addiction research.
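As a hedged illustration of the supervised/unsupervised distinction discussed above, the sketch below fits a naive Bayes classifier to labelled synthetic data and runs k-means on the same points without labels, using the open-source scikit-learn library (the data and variable names are invented for illustration, not drawn from any addiction dataset):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two synthetic, well-separated groups standing in for labelled feature vectors.
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Supervised: fit a classifier on labelled examples.
clf = GaussianNB().fit(X, y)
train_acc = clf.score(X, y)

# Unsupervised: recover two groups without using the labels at all.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
n_clusters_found = len(set(km.labels_))
print(train_acc, n_clusters_found)
```

Both paradigms use the same feature matrix; only the supervised model sees the labels.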

2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Helder Sebastião ◽  
Pedro Godinho

Abstract
This study examines the predictability of three major cryptocurrencies—bitcoin, ethereum, and litecoin—and the profitability of trading strategies devised using machine learning techniques (e.g., linear models, random forests, and support vector machines). The models are validated in a period characterized by unprecedented turmoil and tested in a period of bear markets, allowing an assessment of whether the predictions remain good even when the market direction changes between the validation and test periods. The classification and regression methods use attributes from trading and network activity for the period from August 15, 2015 to March 3, 2019, with the test sample beginning on April 13, 2018. For the test period, five out of 18 individual models have success rates of less than 50%. The trading strategies are built on model assembling. The ensemble that requires five models to produce identical signals (Ensemble 5) achieves the best performance for ethereum and litecoin, with annualized Sharpe ratios of 80.17% and 91.35% and annualized returns (after proportional round-trip trading costs of 0.5%) of 9.62% and 5.73%, respectively. These positive results support the claim that machine learning provides robust techniques for exploring the predictability of cryptocurrencies and for devising profitable trading strategies in these markets, even under adverse market conditions.
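The model-assembling rule behind a strategy such as Ensemble 5 can be sketched in a few lines. The function below is a hypothetical reconstruction, not the authors' code: it emits a trade signal only when at least k of the individual model signals agree.

```python
import numpy as np

def ensemble_signal(signals, k):
    """signals: (n_models, n_days) array of +1 (long) / -1 (short) calls.
    Returns +1/-1 per day when at least k models agree, else 0 (stay out)."""
    signals = np.asarray(signals)
    longs = (signals == 1).sum(axis=0)
    shorts = (signals == -1).sum(axis=0)
    out = np.zeros(signals.shape[1], dtype=int)
    out[longs >= k] = 1
    out[shorts >= k] = -1
    return out

# Five invented model signal streams over three days.
sig = ensemble_signal([[1, 1, -1],
                       [1, -1, -1],
                       [1, 1, -1],
                       [1, -1, -1],
                       [1, 1, 1]], k=5)
print(sig)  # [1 0 0]: all five models agree only on day 1
```

Requiring unanimity trades less often but filters out days on which the individual models disagree.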


Author(s):  
Gonzalo Vergara ◽  
Juan J. Carrasco ◽  
Jesus Martínez-Gómez ◽  
Manuel Domínguez ◽  
José A. Gámez ◽  
...  

The study of energy efficiency in buildings is an active field of research. Modeling and predicting energy-related magnitudes enables the analysis of electric power consumption and can yield economic benefits. In this study, classical time series analysis and machine learning techniques, introducing clustering in some models, are applied to predict active power in buildings. The real data acquired correspond to time, environmental, and electrical measurements from 30 buildings belonging to the University of León (Spain). First, we segmented the buildings in terms of their energy consumption using principal component analysis. Afterwards, we applied state-of-the-art machine learning methods and compared them. Finally, we predicted daily electric power consumption profiles and compared them with actual data for different buildings. Our analysis shows that multilayer perceptrons have the lowest error, followed by support vector regression and clustered extreme learning machines. We also analyze daily load profiles on weekdays and weekends for different buildings.
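The building-segmentation step (principal component analysis followed by clustering of consumption profiles) can be sketched roughly as follows, using scikit-learn on invented per-building load profiles rather than the León data:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# 30 synthetic buildings x 24 hourly mean-load values, two usage patterns.
day = np.sin(np.linspace(0, np.pi, 24))
profiles = np.vstack([
    day * rng.uniform(5, 10, (15, 1)),        # daytime-heavy buildings
    (1 - day) * rng.uniform(5, 10, (15, 1)),  # nighttime-heavy buildings
])
# Normalize so segmentation reflects profile shape, not magnitude.
profiles = profiles / np.linalg.norm(profiles, axis=1, keepdims=True)

coords = PCA(n_components=2).fit_transform(profiles)
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(coords)
print(clusters)
```

Per-cluster prediction models (e.g., a multilayer perceptron per segment) would then be trained on each group separately.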


2020 ◽  
Author(s):  
Akshay Kumar ◽  
Farhan Mohammad Khan ◽  
Rajiv Gupta ◽  
Harish Puppala

Abstract
The outbreak of COVID-19 was first identified in China; it later spread to various parts of the globe and was declared a pandemic by the World Health Organization (WHO). The transmissible person-to-person pneumonia caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus responsible for COVID-19, has sparked global alarm. Thermal screening, quarantining, and later lockdowns were methods employed by various nations to contain the spread of the virus. Although exercising every possible plan to contain the spread helps mitigate the effect of COVID-19, projecting the rise in cases and preparing to face the crisis helps minimize its impact. In this scenario, this study uses machine learning tools to forecast the possible rise in the number of cases from data on daily new cases. To capture the uncertainty, three different techniques are used to project the data and capture the possible deviation: (i) the decision tree algorithm, (ii) the support vector machine algorithm, and (iii) Gaussian process regression. The projections of new cases, together with recovered cases, deceased cases, medical facilities, population density, number of tests conducted, and available services, are considered to define a criticality index (CI). The CI is used to classify all districts of the country into high-risk, moderate-risk, and low-risk regions. An online dashboard is created and updated on a daily basis with projections for the next four weeks. The suggestions of this study would aid in planning lockdown or other containment strategies for any country, which can incorporate other parameters to define the CI.
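Of the three techniques, Gaussian process regression is perhaps the least familiar. A minimal sketch of projecting a daily new-cases series with uncertainty estimates might look as follows; the data are synthetic and the kernel settings are invented, not the study's configuration:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

days = np.arange(30, dtype=float).reshape(-1, 1)
cases = 50 * np.exp(0.08 * days.ravel())  # synthetic exponential rise

# Smooth kernel; alpha adds a small noise term for numerical stability.
kernel = ConstantKernel(1.0) * RBF(length_scale=10.0)
gpr = GaussianProcessRegressor(kernel=kernel, alpha=1e-2, normalize_y=True)
gpr.fit(days, cases)

future = np.arange(30, 37, dtype=float).reshape(-1, 1)  # the next week
mean, std = gpr.predict(future, return_std=True)
print(mean.round(1), std.round(1))
```

The predictive standard deviation is what lets this method "capture the possible deviation" alongside the point forecast.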


2019 ◽  
Vol 8 (2) ◽  
pp. 3697-3705 ◽  

Forest fires have become one of the most frequently occurring disasters in recent years. Their effects have a lasting impact on the environment, as they lead to deforestation and contribute to global warming, which in turn is a major cause of their occurrence. Forest fires are typically dealt with by collecting satellite images of forests; if the fires cause an emergency, the authorities are notified so they can mitigate the effects. By the time the authorities learn of a fire, it may already have caused substantial damage. Data mining and machine learning techniques can provide an efficient prevention approach, where data associated with forests are used to predict the eventuality of forest fires. This paper uses a dataset from the UCI machine learning repository consisting of physical factors and climatic conditions of the Montesinho park in Portugal. Various algorithms, including logistic regression, support vector machines, random forests, and k-nearest neighbors, in addition to bagging and boosting predictors, are used, both with and without principal component analysis (PCA). Among the models with PCA, logistic regression gave the highest F1 score of 68.26; among the models without PCA, gradient boosting gave the highest score of 68.36.
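The with-PCA variant can be expressed as a single scikit-learn pipeline. The sketch below uses synthetic data in place of the Montesinho attributes (feature counts and split are invented) and reports the F1 score, mirroring the evaluation above:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for physical and climatic attributes.
X, y = make_classification(n_samples=400, n_features=12, n_informative=6,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Scale, reduce dimensionality with PCA, then classify.
model = make_pipeline(StandardScaler(), PCA(n_components=6),
                      LogisticRegression(max_iter=1000))
model.fit(X_tr, y_tr)
f1 = f1_score(y_te, model.predict(X_te))
print(round(f1, 2))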


2020 ◽  
Author(s):  
Mazin Mohammed ◽  
Karrar Hameed Abdulkareem ◽  
Mashael S. Maashi ◽  
Salama A. Mostafa ◽  
Abdullah Baz ◽  
...  

BACKGROUND In recent times, global concern has been caused by the coronavirus disease (COVID-19), which is considered a global health threat due to its rapid spread across the globe. Machine learning (ML) is a computational method that can be used to automatically learn from experience and improve the accuracy of predictions. OBJECTIVE In this study, machine learning is applied to a coronavirus dataset of 50 X-ray images to enable the development of detection modalities and the identification of risk factors. The dataset contains a range of samples of COVID-19 cases alongside SARS, MERS, and ARDS. The experiment was carried out using a total of 50 X-ray images, of which 25 were positive COVID-19 cases and the other 25 were normal cases. METHODS The Orange data-mining tool was used for data manipulation. To classify patients as coronavirus carriers or non-carriers, this tool was employed to develop and analyse seven types of predictive models: artificial neural network (ANN), support vector machine (SVM) with linear and radial basis function (RBF) kernels, k-nearest neighbour (k-NN), decision tree (DT), and the CN2 rule inducer. Furthermore, the standard InceptionV3 model was used for feature extraction. RESULTS The machine learning techniques were trained on the coronavirus disease 2019 (COVID-19) dataset with tuned parameters. The dataset was divided into two parts, training and testing: the model was trained using 70% of the dataset, while the remaining 30% was used to test it. The results show that the improved SVM achieved an F1 score of 97% and an accuracy of 98%. CONCLUSIONS In this study, seven models have been developed to aid the detection of coronavirus. In such cases, learning performance can be improved through knowledge transfer, whereby time-consuming data-labelling efforts are not required. The evaluations of all the models were done in terms of different parameters. It can be concluded that all the models performed well, but the SVM demonstrated the best result on the accuracy metric. Future work will compare classical approaches with deep learning ones to obtain better results. CLINICALTRIAL None
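A rough sketch of the classification stage, assuming feature vectors have already been extracted from the X-ray images (the study uses InceptionV3 for that step), with the same 70/30 split; the 32-dimensional features below are synthetic stand-ins, not real image features:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# 50 synthetic "feature vectors": 25 normal cases, then 25 positive cases.
X = np.vstack([rng.normal(0, 1, (25, 32)), rng.normal(1.5, 1, (25, 32))])
y = np.array([0] * 25 + [1] * 25)

# 70/30 split, stratified so both classes appear in the test set.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
svm = SVC(kernel="rbf").fit(X_tr, y_tr)
pred = svm.predict(X_te)
acc = accuracy_score(y_te, pred)
print(acc, f1_score(y_te, pred))
```

With only 50 samples, a stratified split matters: an unlucky split could otherwise leave one class nearly absent from the 15-image test set.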


2011 ◽  
Vol 20 (05) ◽  
pp. 969-980 ◽  
Author(s):  
CÁSSIO M. M. PEREIRA ◽  
RODRIGO F. DE MELLO

Recently, there has been increased interest in self-healing systems. These systems are able to cope with failures in the environment in which they execute, working continuously by taking proactive actions to correct problems. The detection of faults plays a prominent role in self-healing systems, as faults are the original causes of failures. Fault detection techniques proposed in the literature have been based on three mainstream approaches: process heartbeats, statistical analysis, and machine learning. However, these approaches present limitations. Heartbeat-based techniques only detect failures, not faults. Statistical approaches generally assume linear models. Most machine learning techniques assume the data are independent and identically distributed. To overcome these limitations, we propose a new approach to fault detection, which also gives insight into how process behavior changes over time in the presence of faults. Experiments show that the proposed approach achieves a twofold increase in F-measure compared to support vector machines (SVM) and the auto-regressive integrated moving average (ARIMA) model.
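F-measure, the comparison metric used above, balances precision and recall. The sketch below computes it for a toy threshold-based fault detector on an invented process metric (e.g., memory that drifts upward under a leak fault); none of this is the paper's detector:

```python
import numpy as np

def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall for binary fault labels."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

# Synthetic process metric: values drift upward while a fault is present.
metric = np.array([10, 11, 10, 12, 30, 35, 40, 11, 10, 38])
faulty = np.array([0, 0, 0, 0, 1, 1, 1, 0, 0, 1])
pred = (metric > 20).astype(int)  # naive fixed-threshold detector
fm = f_measure(faulty, pred)
print(fm)  # 1.0 on this toy series
```

A twofold F-measure increase, as reported, means the proposed detector roughly doubles this combined precision/recall score relative to the SVM and ARIMA baselines.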


2013 ◽  
Vol 11 (9) ◽  
pp. 393
Author(s):  
Mei Zhang

<p>Fraud and error are two underlying sources of misstated financial statements. Modern machine learning techniques provide a potential direction for distinguishing the two factors in such statements. In this paper, a thorough evaluation is conducted of how off-the-shelf machine learning tools perform for fraud/error classification. In particular, the task is treated as a standard binary classification problem; i.e., mapping from an input vector of financial indices to a class label of either error or fraud. With a real dataset of financial restatements, this study empirically evaluates and analyzes five state-of-the-art classifiers: logistic regression, artificial neural networks, support vector machines, decision trees, and bagging. There are several important observations from the experimental results. First, bagging performs the best among these commonly used general-purpose machine learning tools. Second, the results show that the underlying relationship from the statement indices to the fraud/error decision is likely to be non-linear. Third, it is very challenging to distinguish error from fraud, and general machine learning approaches, though they perform better than pure chance, leave much room for improvement. The results suggest that more advanced or task-specific solutions are needed for fraud/error classification.</p>
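The variance-reduction idea behind bagging, the best performer in the study, can be sketched by comparing a single decision tree with a bagged ensemble of trees on a synthetic non-linear task (scikit-learn, invented data in place of the restatement indices):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic non-linear binary problem standing in for fraud vs. error.
X, y = make_moons(n_samples=300, noise=0.3, random_state=0)

tree = DecisionTreeClassifier(random_state=0)
# Bagging: train 50 trees on bootstrap resamples and vote.
bag = BaggingClassifier(tree, n_estimators=50, random_state=0)

tree_acc = cross_val_score(tree, X, y, cv=5).mean()
bag_acc = cross_val_score(bag, X, y, cv=5).mean()
print(round(tree_acc, 2), round(bag_acc, 2))
```

Averaging over bootstrap resamples typically smooths out the high variance of individual trees, which is consistent with bagging's lead over the single-model baselines here.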


Entropy ◽  
2019 ◽  
Vol 21 (6) ◽  
pp. 589 ◽  
Author(s):  
Franco Valencia ◽  
Alfonso Gómez-Espinosa ◽  
Benjamín Valdés-Aguirre

Cryptocurrencies are becoming increasingly relevant in the financial world and can be considered an emerging market. The low barrier to entry and high data availability of the cryptocurrency market make it an excellent subject of study, from which it is possible to derive insights into the behavior of markets through the application of sentiment analysis and machine learning techniques to the challenging task of market prediction. While there have been some previous studies, most have focused exclusively on the behavior of Bitcoin. In this paper, we propose the use of common machine learning tools and available social media data for predicting the price movements of the Bitcoin, Ethereum, Ripple, and Litecoin cryptocurrency markets. We compare neural networks (NN), support vector machines (SVM), and random forests (RF), using elements from Twitter and market data as input features. The results show that it is possible to predict cryptocurrency markets using machine learning and sentiment analysis, that Twitter data by itself can be used to predict certain cryptocurrencies, and that NN outperform the other models.
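A minimal sketch of the feature setup: an invented daily sentiment score is combined with a market feature to predict next-day direction with a random forest, one of the three model families compared above. The data and the relationship between sentiment and price are synthetic, chosen only so the example has learnable structure:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 300
sentiment = rng.normal(0, 1, n)  # stand-in for aggregated tweet polarity
volume = rng.normal(0, 1, n)     # stand-in for a market feature
# Synthetic ground truth: direction loosely follows sentiment.
up = (sentiment + 0.3 * rng.normal(0, 1, n) > 0).astype(int)

X = np.column_stack([sentiment, volume])
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X[:200], up[:200])          # train on the first 200 days
acc = rf.score(X[200:], up[200:])  # evaluate on the last 100
print(round(acc, 2))
```

Note the chronological split: shuffling days before splitting would leak future information into training, which matters for any market-prediction setup.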


Algorithms ◽  
2020 ◽  
Vol 13 (11) ◽  
pp. 300
Author(s):  
Eslam A. Hussein ◽  
Christopher Thron ◽  
Mehrdad Ghaziasgar ◽  
Antoine Bagula ◽  
Mattia Vaccari

Predicting groundwater availability is important to water sustainability and drought mitigation. Machine-learning tools have the potential to improve groundwater prediction, thus enabling resource planners to: (1) anticipate water quality in unsampled areas or depth zones; (2) design targeted monitoring programs; (3) inform groundwater protection strategies; and (4) evaluate the sustainability of groundwater sources of drinking water. This paper proposes a machine-learning approach to groundwater prediction with the following characteristics: (i) the use of a regression-based approach to predict full groundwater images based on sequences of monthly groundwater maps; (ii) strategic automatic feature selection (both local and global features) using extreme gradient boosting; and (iii) the use of a multiplicity of machine-learning techniques (extreme gradient boosting, multivariate linear regression, random forests, multilayer perceptron, and support vector regression). Of these techniques, support vector regression consistently performed best in terms of minimizing root mean square error and mean absolute error. Furthermore, including a global feature obtained from a Gaussian mixture model produced models with lower error than the best that could be obtained with local geographical features alone.
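The global-feature idea can be sketched as follows: a Gaussian mixture model is fitted to the local features, and its per-sample log-likelihood is appended as an extra input to support vector regression. The data are synthetic and this specific feature construction is an assumption for illustration, not the paper's exact recipe:

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X_local = rng.normal(0, 1, (200, 3))  # invented local geographic features
y = X_local @ np.array([0.5, -0.2, 0.3]) + rng.normal(0, 0.05, 200)

# Global feature: per-sample log-likelihood under a fitted GMM.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X_local)
X_full = np.column_stack([X_local, gmm.score_samples(X_local)])

# Support vector regression on local + global features; RMSE as the metric.
svr = SVR(kernel="rbf", C=10.0).fit(X_full[:150], y[:150])
rmse = mean_squared_error(y[150:], svr.predict(X_full[150:])) ** 0.5
print(round(rmse, 3))
```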


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Filippos Sofos ◽  
Theodoros E. Karakasidis

Abstract
This work employs machine learning (ML) techniques, such as multivariate regression, the multi-layer perceptron, and random forests, to predict the slip length at the nanoscale. Data points are collected both from our simulations and from the literature, comprising Molecular Dynamics simulations of simple monoatomic, polar, and molecular liquids. Training and test points cover a wide range of input parameters that have been found to affect the slip length value, concerning dynamical and geometrical characteristics of the model, along with parameters that constitute the simulation conditions. The aim of this work is to suggest an accurate and efficient procedure capable of reproducing physical properties, such as the slip length, acting in parallel with simulation methods. Non-linear models, based on neural networks and decision trees, achieve better performance than linear regression methods. After the model is trained on representative simulation data, it accurately predicts slip length values in regions between or in close proximity to the input data range, at the nanoscale. Results also reveal that, as channel dimensions increase, the slip length becomes a size-independent material property, affected mainly by wall roughness and wettability.
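A rough sketch of the random-forest branch of such an approach: a regressor is trained on simulation descriptors and queried within the training range, where the text notes predictions are accurate. The inputs and the non-linear target are invented stand-ins, not Molecular Dynamics data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Invented inputs: e.g., channel width, wall roughness, wettability.
X = rng.uniform(0, 1, (300, 3))
# Invented smooth non-linear target standing in for the slip length.
slip = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + 0.5 * X[:, 2]

rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X[:250], slip[:250])
r2 = rf.score(X[250:], slip[250:])  # held-out points inside the input range
print(round(r2, 2))
```

Tree ensembles interpolate well inside the sampled parameter range but, unlike a physical model, cannot be trusted to extrapolate far outside it, which matches the in-range accuracy claim above.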

