Response to Comment on “Predicting reaction performance in C–N cross-coupling using machine learning”

Science ◽  
2018 ◽  
Vol 362 (6416) ◽  
pp. eaat8763 ◽  
Author(s):  
Jesús G. Estrada ◽  
Derek T. Ahneman ◽  
Robert P. Sheridan ◽  
Spencer D. Dreher ◽  
Abigail G. Doyle

We demonstrate that the chemical-feature model described in our original paper is distinguishable from the nongeneralizable models introduced by Chuang and Keiser. Furthermore, the chemical-feature model significantly outperforms these models in out-of-sample predictions, justifying the use of chemical featurization from which machine learning models can extract meaningful patterns in the dataset, as originally described.

Science ◽  
2018 ◽  
Vol 362 (6416) ◽  
pp. eaat8603 ◽  
Author(s):  
Kangway V. Chuang ◽  
Michael J. Keiser

Ahneman et al. (Reports, 13 April 2018) applied machine learning models to predict C–N cross-coupling reaction yields. The models use atomic, electronic, and vibrational descriptors as input features. However, the experimental design is insufficient to distinguish models trained on chemical features from those trained solely on random-valued features in retrospective and prospective test scenarios, thus failing classical controls in machine learning.
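The control described above can be illustrated with a minimal sketch: assign each reagent a fixed random descriptor vector carrying no chemical information, and note that a simple model can still fit reagent-level structure, because the random vector acts as a reagent identifier rather than a chemical feature. All reagent names and yields below are hypothetical, and the 1-nearest-neighbour model is a stand-in for the actual models compared.

```python
import random

random.seed(0)

# Hypothetical reagents with (unknown) true mean yields
true_yield = {"additive_A": 80.0, "additive_B": 45.0, "additive_C": 10.0}

# Random-feature control: each reagent gets a fixed random descriptor vector
# that carries no chemical information -- it merely identifies the reagent.
features = {r: [random.random() for _ in range(3)] for r in true_yield}

# Training set: five noisy yield measurements per reagent
train = [(features[r], true_yield[r] + random.uniform(-2, 2))
         for r in true_yield for _ in range(5)]

def predict_1nn(x):
    """Predict with the nearest training example (squared distance)."""
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(train, key=lambda t: dist(t[0], x))[1]

# The random features still "predict" reagent-level yields, because they act
# as reagent identifiers -- the classical control the comment describes.
for r in true_yield:
    print(r, round(predict_1nn(features[r]), 1))
```

Because each reagent shares one fixed random vector, the nearest neighbour of any reagent's features is always another measurement of the same reagent, so the model recovers per-reagent yields without any chemistry entering the features.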


2021 ◽  
Vol 14 (3) ◽  
pp. 119
Author(s):  
Fabian Waldow ◽  
Matthias Schnaubelt ◽  
Christopher Krauss ◽  
Thomas Günter Fischer

In this paper, we demonstrate how a well-established machine learning-based statistical arbitrage strategy can be successfully transferred from equity to futures markets. First, we preprocess futures time series composed of front months to render them suitable for our returns-based trading framework and compile a data set of 60 futures covering nearly 10 trading years. Next, we train several machine learning models to predict whether the h-day-ahead return of each future out- or underperforms the corresponding cross-sectional median return. We then enter long/short positions for the top/flop-k futures for a duration of h days and assess the financial performance of the resulting portfolio in an out-of-sample testing period. We find that the machine learning models yield statistically significant out-of-sample break-even transaction costs of 6.3 bp, a clear challenge to the semi-strong form of market efficiency. Finally, we discuss sources of profitability and the robustness of our findings.
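The labeling and portfolio-formation steps the abstract describes can be sketched as follows. The futures symbols and h-day-ahead returns are made up for illustration, and realized returns stand in for the models' predicted scores when ranking:

```python
# Hypothetical h-day-ahead returns for six futures contracts
returns = {"ES": 0.012, "CL": -0.004, "GC": 0.006, "ZB": -0.010,
           "NG": 0.020, "HG": 0.001}

# Binary label: does each future out- (1) or underperform (0) the
# cross-sectional median return?
sorted_r = sorted(returns.values())
n = len(sorted_r)
median = (sorted_r[n // 2 - 1] + sorted_r[n // 2]) / 2 if n % 2 == 0 \
    else sorted_r[n // 2]
labels = {sym: int(r > median) for sym, r in returns.items()}

# Long/short portfolio: long the top-k and short the flop-k futures,
# ranked here by realized returns as a stand-in for model scores.
k = 2
ranked = sorted(returns, key=returns.get, reverse=True)
longs, shorts = ranked[:k], ranked[-k:]
print(labels, longs, shorts)
```

In the actual framework the ranking would come from each model's predicted probability of outperforming the median, and positions would be held for h days before rebalancing.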


2020 ◽  
Vol 13 (7) ◽  
pp. 155
Author(s):  
Zhenlong Jiang ◽  
Ran Ji ◽  
Kuo-Chu Chang

We propose a portfolio rebalance framework that integrates machine learning models into mean-risk portfolios in multi-period settings with risk-aversion adjustment. In each period, the risk-aversion coefficient is adjusted automatically according to market trend movements predicted by machine learning models. We employ Gini’s Mean Difference (GMD) to specify the risk of a portfolio and use a set of technical indicators generated from a market index (e.g., the S&P 500 index) as inputs to the machine learning models that predict market movements. Using a rolling-horizon approach, we conduct a series of computational tests with real financial data to evaluate the performance of the machine learning-integrated portfolio rebalance framework. The empirical results show that the XGBoost model provides the best prediction of market movement, while the proposed portfolio rebalance strategy generates portfolios with superior out-of-sample performance in terms of average returns, time-series cumulative returns, and annualized returns compared to the benchmarks.
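Gini's Mean Difference, the risk measure named above, is the mean absolute difference over all pairs of observations; a common form is GMD = 2/(n(n−1)) · Σ_{i<j} |x_i − x_j|. A minimal sketch (example returns are made up):

```python
from itertools import combinations

def gini_mean_difference(x):
    """Gini's Mean Difference: mean absolute difference over all pairs,
    GMD = 2 / (n * (n - 1)) * sum_{i<j} |x_i - x_j|."""
    n = len(x)
    return 2.0 * sum(abs(a - b) for a, b in combinations(x, 2)) / (n * (n - 1))

# Example on a small made-up return series (in %)
print(gini_mean_difference([1.0, 2.0, 3.0]))  # 4/3 ~= 1.333
```

Unlike variance, GMD penalizes absolute rather than squared deviations, which is one reason it is used as a dispersion-based risk measure in mean-risk portfolio models.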


2018 ◽  
Vol 124 (5) ◽  
pp. 1284-1293 ◽  
Author(s):  
Alexander H. K. Montoye ◽  
Bradford S. Westgate ◽  
Morgan R. Fonley ◽  
Karin A. Pfeiffer

Wrist-worn accelerometers are gaining popularity for measurement of physical activity. However, few methods for predicting physical activity intensity from wrist-worn accelerometer data have been tested on data not used to create the methods (out-of-sample data). This study utilized two previously collected data sets [Ball State University (BSU) and Michigan State University (MSU)] in which participants wore a GENEActiv accelerometer on the left wrist while performing sedentary, lifestyle, ambulatory, and exercise activities in simulated free-living settings. Activity intensity was determined via direct observation. Four machine learning models (plus two combination methods) and six feature sets were used to predict activity intensity (30-s intervals) from the accelerometer data. Leave-one-out cross-validation and out-of-sample testing were performed to evaluate accuracy in activity intensity prediction, and classification accuracies were used to determine differences among feature sets and machine learning models. In out-of-sample testing, the random forest model (77.3–78.5%) had higher accuracy than other machine learning models (70.9–76.4%) and accuracy similar to combination methods (77.0–77.9%). Feature sets utilizing frequency-domain features had improved accuracy over other feature sets in leave-one-out cross-validation (92.6–92.8% vs. 87.8–91.9% in MSU data set; 79.3–80.2% vs. 76.7–78.4% in BSU data set) but similar or worse accuracy in out-of-sample testing (74.0–77.4% vs. 74.1–79.1% in MSU data set; 76.1–77.0% vs. 75.5–77.3% in BSU data set). All machine learning models outperformed the Euclidean norm minus one/GGIR method in out-of-sample testing (69.5–78.5% vs. 53.6–70.6%). From these results, we recommend out-of-sample testing to confirm generalizability of machine learning models.
Additionally, random forest models and feature sets with only time-domain features provided the best accuracy for activity intensity prediction from a wrist-worn accelerometer. NEW & NOTEWORTHY This study includes in-sample and out-of-sample cross-validation of an alternate method for deriving meaningful physical activity outcomes from accelerometer data collected with a wrist-worn accelerometer. This method uses machine learning to directly predict activity intensity. By so doing, this study provides a classification model that may avoid high errors present with energy expenditure prediction while still allowing researchers to assess adherence to physical activity guidelines.
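The leave-one-out protocol used in the study can be sketched generically: hold out each sample in turn, train on the remainder, and score the held-out prediction. The feature/intensity pairs below are made up, and a 1-nearest-neighbour classifier stands in for the random forest and other models actually compared:

```python
# Made-up (feature vector, intensity label) pairs
data = [([0.1, 0.2], "sedentary"), ([0.2, 0.1], "sedentary"),
        ([0.9, 1.1], "vigorous"),  ([1.0, 0.9], "vigorous"),
        ([0.5, 0.6], "moderate"),  ([0.6, 0.5], "moderate")]

def nn_predict(train, x):
    """Classify x by its nearest training example (squared distance)."""
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(train, key=lambda t: dist(t[0], x))[1]

# Leave-one-out cross-validation: each sample is held out exactly once
correct = 0
for i, (x, y) in enumerate(data):
    train = data[:i] + data[i + 1:]          # leave sample i out
    correct += nn_predict(train, x) == y
accuracy = correct / len(data)
print(f"LOOCV accuracy: {accuracy:.2f}")
```

Out-of-sample testing, as the study recommends, goes one step further: the held-out data come from an entirely different data set (here, BSU vs. MSU) rather than from folds of the same collection.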


2020 ◽  
Vol 2 (1) ◽  
pp. 3-6
Author(s):  
Eric Holloway

Imagination Sampling is the use of a person as an oracle for generating or improving machine learning models. Previous work demonstrated a general system for using Imagination Sampling to obtain multibox models. Here, the possibility of importing such models as the starting point for further automatic enhancement is explored.


2021 ◽  
Author(s):  
Norberto Sánchez-Cruz ◽  
Jose L. Medina-Franco

Epigenetic targets are a significant focus for drug discovery research, as demonstrated by the eight approved epigenetic drugs for the treatment of cancer and the increasing availability of chemogenomic data related to epigenetics. These data represent a large amount of structure–activity relationships that have not been exploited thus far for the development of predictive models to support medicinal chemistry efforts. Herein, we report the first large-scale study of 26,318 compounds with a quantitative measure of biological activity for 55 protein targets with epigenetic activity. Through a systematic comparison of machine learning models trained on molecular fingerprints of different design, we built predictive models with high accuracy for the epigenetic target profiling of small molecules. The models were thoroughly validated, showing mean precisions up to 0.952 for the epigenetic target prediction task. Our results indicate that the models reported herein have considerable potential to identify small molecules with epigenetic activity. Therefore, our results were implemented as a freely accessible and easy-to-use web application.
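The headline metric, mean precision across targets, can be sketched as follows. The target names and prediction/label vectors below are hypothetical, chosen only to illustrate how a per-target precision is averaged:

```python
def precision(y_true, y_pred):
    """Precision = TP / (TP + FP) for binary labels; 0.0 if no positives predicted."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    return tp / (tp + fp) if tp + fp else 0.0

# Hypothetical (true labels, predicted labels) per epigenetic target
per_target = {
    "HDAC1": ([1, 1, 0, 0], [1, 1, 1, 0]),   # precision 2/3
    "DNMT1": ([1, 0, 1, 0], [1, 0, 1, 0]),   # precision 1.0
}
mean_precision = sum(precision(t, p) for t, p in per_target.values()) / len(per_target)
print(round(mean_precision, 3))  # 0.833
```

In the reported study this average would be taken over the 55 epigenetic targets, with predictions coming from the fingerprint-based models.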

