Machine Learning Boosted Docking (HASTEN): An Open‐source Tool To Accelerate Structure‐based Virtual Screening Campaigns

2021 ◽  
Author(s):  
Tuomo Kalliokoski

The software macHine leArning booSTed dockiNg (HASTEN) was developed to accelerate structure-based virtual screening using machine learning models. It was validated on datasets from both the literature (12 datasets, each containing three million molecules docked with FRED) and in-house sources (one dataset of four million compounds docked with Glide). HASTEN showed reasonable performance, with a mean recall of 0.78 for the top one percent of scoring molecules after docking 10% of the dataset on the literature data, while an excellent recall of 0.95 was achieved on the in-house data. The program can be used with any docking and machine learning methodology and is freely available from https://github.com/TuomoKalliokoski/HASTEN.
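The recall metric reported above is concrete enough to sketch: after docking only a fraction of the library, recall measures how many of the true top-1% scorers were among the molecules actually docked. The molecule IDs and scores below are hypothetical placeholders, not HASTEN data:

```python
def top_fraction(scores, fraction):
    """Return the set of molecule IDs in the best-scoring fraction.

    Docking scores are energies, so lower is better.
    """
    n = max(1, int(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get)  # ascending: best first
    return set(ranked[:n])

def recall_of_top_hits(true_scores, docked_ids, top=0.01):
    """Fraction of the true top-`top` molecules recovered among those docked."""
    hits = top_fraction(true_scores, top)
    return len(hits & docked_ids) / len(hits)

# Hypothetical example: 1000 molecules with known docking scores (mol0 best).
true_scores = {f"mol{i}": float(i) for i in range(1000)}
# Suppose an ML-guided campaign docked 10% of the library, and that subset
# happened to contain 8 of the 10 true top-1% molecules.
docked = {f"mol{i}" for i in range(8)} | {f"mol{i}" for i in range(500, 592)}
print(recall_of_top_hits(true_scores, docked))  # 0.8
```

A recall of 0.78 after docking 10% of the library, as reported for the literature sets, means roughly this situation held on average across the 12 benchmarks.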



10.29173/iq18 ◽  
2017 ◽  
Vol 42 (1) ◽  
pp. 14
Author(s):  
Vicky Steeves ◽  
Rémi Rampin ◽  
Fernando Chirigati

Achieving research reproducibility is challenging in many ways: there are social and cultural obstacles as well as a constantly changing technical landscape that makes replicating and reproducing research difficult. Users face challenges in reproducing research across different operating systems, in using different versions of software across long projects and among collaborations, and in using publicly available work. The dependencies required to reproduce the computational environments in which research happens can be exceptionally hard to track; in many cases, these dependencies are hidden or nested too deeply to discover, and thus impossible to install on a new machine, which keeps adoption low. In this paper, we present ReproZip, an open source tool to help overcome the technical difficulties involved in preserving and replicating research, applications, databases, software, and more. We examine the current use cases of ReproZip, ranging from digital humanities to machine learning. We also explore potential library use cases for ReproZip, particularly in digital libraries and archives, liaison librarianship, and other library services. We believe that libraries and archives can leverage ReproZip to deliver more robust reproducibility services and repository services, as well as enhanced discoverability and preservation of research materials, applications, software, and computational environments.


PLoS ONE ◽  
2018 ◽  
Vol 13 (7) ◽  
pp. e0199589 ◽  
Author(s):  
Michael S. Smirnov ◽  
Tavita R. Garrett ◽  
Ryohei Yasuda



2020 ◽  
Vol 20 (14) ◽  
pp. 1375-1388 ◽  
Author(s):  
Patnala Ganga Raju Achary

Scientists and researchers around the globe generate a tremendous amount of information every day; for instance, so far more than 74 million molecules are registered in Chemical Abstract Services. According to a recent study, there are around 10^60 molecules that can be classified as new drug-like molecules. The library of such molecules is now considered 'dark chemical space' or 'dark chemistry.' To explore these hidden molecules scientifically, a good number of live and updated databases (protein, cell, tissue, structure, drug, etc.) are available today. The synchronization of three different sciences, genomics, proteomics, and in-silico simulation, will revolutionize the process of drug discovery. The screening of a sizable number of drug-like molecules is a challenge and must be treated in an efficient manner. Virtual screening (VS) is an important computational tool in the drug discovery process; however, experimental verification of the drugs is equally important for the drug development process. Quantitative structure-activity relationship (QSAR) analysis is one of the machine learning techniques extensively used in VS, and it is well known for high and fast throughput screening with a satisfactory hit rate. QSAR model building involves (i) chemo-genomics data collection from databases or the literature, (ii) calculation of suitable descriptors from the molecular representation, (iii) establishing a relationship (model) between biological activity and the selected descriptors, and (iv) application of the QSAR model to predict the biological property of new molecules. All hits obtained by the VS technique need to be experimentally verified. The present mini-review highlights web-based machine learning tools, the role of QSAR in VS techniques, successful applications of QSAR-based VS leading to drug discovery, and the advantages and challenges of QSAR.
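The four QSAR steps listed in the abstract can be sketched end to end. The training data, the single descriptor, and the candidate molecules below are all hypothetical placeholders; a real campaign would draw activities from a bioactivity database and compute descriptors with a cheminformatics toolkit:

```python
# (i) Hypothetical training data: one molecular descriptor value per compound
# paired with a measured activity (e.g. pIC50). Entirely made up.
training = [
    (1.0, 4.1), (2.0, 5.0), (3.0, 5.9), (4.0, 7.1), (5.0, 7.9),
]

# (ii) In a real workflow the descriptors are calculated from molecular
# structures; here they are given directly as the x values above.

# (iii) Establish the structure-activity relationship: ordinary least squares
# for y = a * x + b, solved in closed form.
def fit_line(points):
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

a, b = fit_line(training)

# (iv) Apply the model to score untested (virtual) molecules and rank them;
# top-ranked hits would then go on to experimental verification.
virtual_library = {"cand_A": 2.5, "cand_B": 4.5, "cand_C": 0.5}
predictions = {name: a * x + b for name, x in virtual_library.items()}
ranked = sorted(predictions, key=predictions.get, reverse=True)
print(ranked[0])  # cand_B: highest predicted activity
```

A production QSAR model would use many descriptors and a regularized or nonlinear learner, but the workflow, collect, describe, fit, predict, is the same.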


2018 ◽  
Vol 15 (1) ◽  
pp. 6-28 ◽  
Author(s):  
Javier Pérez-Sianes ◽  
Horacio Pérez-Sánchez ◽  
Fernando Díaz

Background: Automated compound testing is currently the de facto standard method for drug screening, but it has not brought the great increase in the number of new drugs that was expected. Computer-aided compound search, known as Virtual Screening, has shown benefits to this field as a complement or even an alternative to robotic drug discovery. There are different methods and approaches to address this problem, and most of them fall under one of the main screening strategies. Machine learning, however, has established itself as a virtual screening methodology in its own right, and it may grow in popularity with the new trends in artificial intelligence. Objective: This paper attempts to provide a comprehensive and structured review that collects the most important proposals made so far in this area of research. Particular attention is given to recent developments in the machine learning field: the deep learning approach, which is pointed out as a future key player in the virtual screening landscape.


2016 ◽  
Vol 11 (4) ◽  
pp. 408-420 ◽  
Author(s):  
Cândida G. Silva ◽  
Carlos J.V. Simoes ◽  
Pedro Carreiras ◽  
Rui M.M. Brito
