Augmenting Basin-Hopping With Techniques From Unsupervised Machine Learning: Applications in Spectroscopy and Ion Mobility

Inspired by natural language processing techniques, we here introduce Mol2vec, an unsupervised machine learning approach to learn vector representations of molecular substructures. Similarly to Word2vec models, where vectors of closely related words are in close proximity in the vector space, Mol2vec learns vector representations of molecular substructures that point in similar directions for chemically related substructures. Compounds can finally be encoded as vectors by summing the vectors of the individual substructures and, for instance, fed into supervised machine learning approaches to predict compound properties. The underlying substructure vector embeddings are obtained by training an unsupervised machine learning approach on a so-called corpus of compounds that consists of all available chemical matter. The resulting Mol2vec model is pre-trained once, yields dense vector representations, and overcomes drawbacks of common compound feature representations such as sparseness and bit collisions. The prediction capabilities are demonstrated on several compound property and bioactivity data sets and compared with results obtained for Morgan fingerprints as a reference compound representation. Mol2vec can be easily combined with ProtVec, which employs the same Word2vec concept on protein sequences, resulting in a proteochemometric approach that is alignment independent and can thus also be easily used for proteins with low sequence similarities.
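The compound-encoding step described in the abstract (summing the vectors of the individual substructures) can be illustrated with a minimal sketch. It is not the Mol2vec implementation: the embedding table, the substructure identifiers (`sub_a`, `sub_b`, `sub_c`), the zero vector for unknown substructures, and the helper names are all hypothetical, and the vector values are made up rather than learned from a corpus.

```python
import numpy as np

# Hypothetical toy embedding table: substructure identifier -> vector.
# In a trained model these vectors would be learned from a compound corpus;
# the identifiers and values below are invented for illustration only.
EMBEDDINGS = {
    "sub_a": np.array([1.0, 0.0, 0.5]),
    "sub_b": np.array([0.0, 2.0, 0.5]),
    "sub_c": np.array([0.5, 0.5, 0.0]),
}
DIM = 3


def embed_compound(substructures):
    """Encode a compound as the sum of its substructure vectors.

    Unknown substructures contribute a zero vector (an assumption of this sketch).
    """
    vec = np.zeros(DIM)
    for sub in substructures:
        vec += EMBEDDINGS.get(sub, np.zeros(DIM))
    return vec


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


compound_1 = embed_compound(["sub_a", "sub_b"])
compound_2 = embed_compound(["sub_a", "sub_b", "sub_c"])
similarity = cosine_similarity(compound_1, compound_2)
```

The resulting dense vectors could then serve as input features for a supervised model, which is the use the abstract mentions.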


Analysis of the Bath Motion in the MM-SQC Dynamics Using Unsupervised Machine Learning Dimensionality Reduction Approaches: Principal Component Analysis

10.26434/chemrxiv.13332530 ◽

2020 ◽

Author(s):

Jiawei Peng ◽

Yu Xie ◽

Deping Hu ◽

Zhenggang Lan

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Collective Motion ◽

Principal Component ◽

Component Analysis ◽

Nonadiabatic Dynamics ◽

Trajectory Data ◽

Unsupervised Machine Learning ◽

Physical Knowledge ◽

Vibronic Couplings

The system-plus-bath model is an important tool for understanding nonadiabatic dynamics in large molecular systems. Understanding the collective motion of a huge number of bath modes is essential to reveal their key roles in the overall dynamics. We apply principal component analysis (PCA) to investigate the bath motion based on the massive data generated from the MM-SQC (symmetrical quasi-classical dynamics method based on the Meyer-Miller mapping Hamiltonian) nonadiabatic dynamics of the excited-state energy transfer of a Frenkel-exciton model. The PCA method clearly shows that two types of bath modes, those that display strong vibronic couplings and those with frequencies close to the electronic transition, are very important to the nonadiabatic dynamics. These observations are fully consistent with physical insight. This conclusion is obtained purely from the PCA of the trajectory data, without heavy reliance on pre-defined physical knowledge. The results show that the PCA approach, one of the simplest unsupervised machine learning methods, is a powerful tool for analyzing complicated nonadiabatic dynamics in the condensed phase involving many degrees of freedom.
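As a rough illustration of how PCA can pick out the bath modes that dominate collective motion, the following sketch applies PCA (via a singular value decomposition) to synthetic data. It does not reproduce the MM-SQC analysis: the six "bath modes", the shared oscillation, the amplitudes, and the noise level are all assumptions chosen so that two modes carry most of the variance.

```python
import numpy as np


def pca(data, n_components):
    """Return the leading principal axes and their explained-variance ratios.

    data has shape (n_samples, n_features); rows are time steps here.
    """
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    variances = singular_values**2 / (data.shape[0] - 1)
    ratios = variances / variances.sum()
    return vt[:n_components], ratios[:n_components]


# Synthetic stand-in for trajectory data: 500 time steps, 6 bath coordinates.
rng = np.random.default_rng(0)
time = np.linspace(0.0, 10.0, 500)
collective = np.sin(2.0 * time)

data = 0.05 * rng.standard_normal((500, 6))  # small uncorrelated noise
data[:, 0] += 1.0 * collective               # mode 0 follows the collective motion
data[:, 1] += 0.8 * collective               # mode 1 follows it more weakly

components, ratios = pca(data, n_components=2)

# Modes with the largest absolute loadings on the first principal component.
dominant_modes = {int(i) for i in np.argsort(np.abs(components[0]))[-2:]}
```

In this toy setup the first principal component is dominated by modes 0 and 1, mirroring in a simplified way how loadings can flag the bath modes that matter most.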


Exploring the Applications of Machine Learning in Healthcare

International Journal of Sensors Wireless Communications and Control ◽

10.2174/2210327910666191220103417 ◽

2020 ◽

Vol 10 (4) ◽

pp. 458-472

Author(s):

Tausifa Jan Saleem ◽

Mohammad Ahsan Chishti

Keyword(s):

Machine Learning ◽

Disease Risk ◽

Disease Diagnosis ◽

Machine Intelligence ◽

Healthcare Applications ◽

Comprehensive Overview ◽

Machine Learning Applications ◽

Remote Healthcare ◽

Healthcare Monitoring ◽

Applications Of Machine Learning

The rapid progress in domains like machine learning and big data has created plenty of opportunities in data-driven applications, particularly healthcare. Incorporating machine intelligence in healthcare can result in breakthroughs like precise disease diagnosis, novel methods of treatment, remote healthcare monitoring, drug discovery, and reduced healthcare costs. Running machine intelligence algorithms on massive healthcare datasets is computationally expensive. However, substantial progress in computational power in recent years has facilitated the deployment of machine intelligence algorithms in healthcare applications. Motivated to explore these applications, this paper presents a review of research works dedicated to the implementation of machine learning on healthcare datasets. The studies have been categorized into the following groups: (a) disease diagnosis and detection, (b) disease risk prediction, (c) health monitoring, (d) healthcare-related discoveries, and (e) epidemic outbreak prediction. The objective of the research is to give researchers in this field a comprehensive overview of machine learning applications in healthcare. Apart from revealing the potential of machine learning in healthcare, this paper will serve as a motivation to foster advanced research in the domain of machine intelligence-driven healthcare.


Learning and control

10.1093/oso/9780199674923.003.0026 ◽

2018 ◽

Author(s):

Ivan Herreros

Keyword(s):

Machine Learning ◽

Reinforcement Learning ◽

Brain Function ◽

Control Strategies ◽

Learning Problems ◽

Animal Learning ◽

Feed Forward Control ◽

Machine Learning Applications ◽

And Control

This chapter discusses basic concepts from control theory and machine learning to facilitate a formal understanding of animal learning and motor control. It first distinguishes between feedback and feed-forward control strategies, and later introduces the classification of machine learning applications into supervised, unsupervised, and reinforcement learning problems. Next, it links these concepts with their counterparts in the domain of the psychology of animal learning, highlighting the analogies between supervised learning and classical conditioning, reinforcement learning and operant conditioning, and between unsupervised and perceptual learning. Additionally, it interprets innate and acquired actions from the standpoint of feedback vs anticipatory and adaptive control. Finally, it argues how this framework of translating knowledge between formal and biological disciplines can serve us to not only structure and advance our understanding of brain function but also enrich engineering solutions at the level of robot learning and control with insights coming from biology.


Towards CRISP-ML(Q): A Machine Learning Process Model with Quality Assurance Methodology

Machine Learning and Knowledge Extraction ◽

10.3390/make3020020 ◽

2021 ◽

Vol 3 (2) ◽

pp. 392-413

Author(s):

Stefan Studer ◽

Thanh Binh Bui ◽

Christian Drescher ◽

Alexander Hanuschkin ◽

Ludwig Winkler ◽

...

Keyword(s):

Machine Learning ◽

Quality Assurance ◽

Process Model ◽

Practical Experience ◽

Special Focus ◽

Close Monitoring ◽

Machine Learning Applications ◽

Project Organizations ◽

Considerable Impact ◽

Learning Development

Machine learning is an established and frequently used technique in industry and academia, but a standard process model to improve the success and efficiency of machine learning applications is still missing. Project organizations and machine learning practitioners face manifold challenges and risks when developing machine learning applications and need guidance to meet business expectations. This paper therefore proposes a process model for the development of machine learning applications, covering six phases from defining the scope to maintaining the deployed machine learning application. Business and data understanding are executed simultaneously in the first phase, as both have considerable impact on the feasibility of the project. The next phases comprise data preparation, modeling, evaluation, and deployment. Special focus is applied to the last phase, as a model running in changing real-time environments requires close monitoring and maintenance to reduce the risk of performance degradation over time. For each task of the process, this work proposes a quality assurance methodology suited to addressing challenges in machine learning development that are identified in the form of risks. The methodology is drawn from practical experience and scientific literature, and has proven to be general and stable. The process model expands on CRISP-DM, a data mining process model that enjoys strong industry support but fails to address machine learning specific tasks. The presented work proposes an industry- and application-neutral process model tailored for machine learning applications with a focus on technical tasks for quality assurance.


How Do Machines Learn? Artificial Intelligence as a New Era in Medicine

Journal of Personalized Medicine ◽

10.3390/jpm11010032 ◽

2021 ◽

Vol 11 (1) ◽

pp. 32

Author(s):

Oliwia Koteluk ◽

Adrian Wartecki ◽

Sylwia Mazurek ◽

Iga Kołodziejczak ◽

Andrzej Mackiewicz

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Health Care ◽

Medical Data ◽

General Process ◽

New Era ◽

Automated Evaluation ◽

Machine Learning Applications ◽

And Training ◽

Current Standards

With an increasing amount of medical data generated every day, there is a strong need for reliable, automated evaluation tools. Machine learning, which carries high hopes and expectations, has the potential to revolutionize many fields of medicine, helping to make faster and more accurate decisions and improving current standards of treatment. Today, machines can analyze, learn, communicate, and understand processed data and are increasingly used in health care. This review explains different models and the general process of machine learning and of training the algorithms. Furthermore, it summarizes the most useful machine learning applications and tools in different branches of medicine and health care (radiology, pathology, pharmacology, infectious diseases, personalized decision making, and many others). The review also addresses the future prospects and threats of applying artificial intelligence as an advanced, automated medicine tool.


An unsupervised machine-learning checkpoint-restart algorithm using Gaussian mixtures for particle-in-cell simulations

Journal of Computational Physics ◽

10.1016/j.jcp.2021.110185 ◽

2021 ◽

Vol 436 ◽

pp. 110185

Author(s):

G. Chen ◽

L. Chacón ◽

T.B. Nguyen

Keyword(s):

Machine Learning ◽

Gaussian Mixtures ◽

Unsupervised Machine Learning ◽

Particle In Cell
