Oversampling Free Energy Perturbation Simulation in Determination of the Ligand‐Binding Free Energy

2019 ◽  
Vol 41 (7) ◽  
pp. 611-618 ◽  
Author(s):  
Son Tung Ngo ◽  
Trung Hai Nguyen ◽  
Nguyen Thanh Tung ◽  
Pham Cam Nam ◽  
Khanh B. Vu ◽  
...  


2019 ◽  
Author(s):  
Qingyi Yang ◽  
Woodrow W. Burchett ◽  
Gregory S. Steeno ◽  
David L. Mobley ◽  
Xinjun Hou

Predicting the binding free energy of ligand-protein complexes has been a grand challenge in computational chemistry since the early days of molecular modeling. Multiple computational methodologies exist to predict ligand binding affinities. Pathway-based Free Energy Perturbation (FEP) and Thermodynamic Integration (TI), as well as Linear Interaction Energy (LIE) and Molecular Mechanics Poisson-Boltzmann/Generalized Born Surface Area (MM-PBSA/GBSA), have been applied to a variety of biologically relevant problems and have achieved different levels of predictive accuracy. Recent advances in computer hardware, in molecular dynamics and Monte Carlo sampling algorithms, and in general force field parameters have made FEP a principal approach for calculating free energy differences, especially host-guest binding affinity differences upon chemical modification.

Because the FEP-calculated binding free energy difference, denoted ΔΔG_FEP, characterizes only the difference in free energy between pairs of ligands or complexes, and not the absolute binding free energy of each individual host-guest system, denoted ΔG, we examine two rarely asked questions in FEP applications:

1) Which values are more appropriate as predictions for assessing ligands prospectively: the calculated pairwise free energy differences, ΔΔG_FEP, or the estimated absolute binding free energies, ΔĜ, derived from ΔΔG_FEP?
2) When only a limited number of ligand pairs can be calculated by FEP, can the perturbation pairs be selected optimally with respect to the reference ligand(s) to maximize prediction precision?

These two questions probe the viability of an often-neglected assumption in pairwise comparisons: that the pairwise value is sufficient for a quantitative and reliable characterization of an individual ligand's properties or activities. This implicit assumption would hold if each pairwise calculation were error-free. Recently, pair designs such as multiple pathways or cycle closure analyses have provided estimates of calculation error but have not addressed the statistical impact of the two questions above. The impact of error is minimized by an exhaustive study that computes all C(N,2) = N(N−1)/2 pairs for a set of N molecules, or more if there is directionality (ΔG(i,j) ≠ ΔG(j,i)). Such a study design is clearly impractical and unnecessary. We therefore aim to collect the right amount of data: data that is 1) feasibly attainable, 2) topologically sufficient, and 3) mathematically synthesizable, so that we can mitigate inherent calculation errors and have higher confidence in our conclusions.

The significance of these questions is illustrated by a motivating example (Figure 1 and Table 1) that considers two different perturbation graph designs for 20 ligands, each with the same number of FEP perturbation pairs (19) and the same reference, Ligand 1. Because of errors inherent in the FEP-derived estimates, the two designs reach different conclusions when rank-ordering ligand potencies. Based on design A, ligands 5, 7, 14, and 15 would be selected as the best four (20%) picks, since their ΔĜ estimates are the most favorable; design B would yield ligands 5, 12, 18, and 19 for the same reason. Without knowing the true values, ΔG_True, of the other 19 ligands, we lack a prospective metric for assessing which design is more precise, even though, retrospectively, we know that both designs agree reasonably well with the true values as measured by correlation and error metrics. However, the top picks from neither design match the true top four ligands, which are ligands 7, 10, 12, and 18. Yet if all C(20,2) = 190 pairs had been calculated, as listed in the last column of Table 1, the best four ligands would have been correctly identified, and the other metrics in Table 1 would also have improved significantly. As noted above, however, calculating all possible pairs, or even a significant fraction of them, is unlikely in practice, especially when the number of molecules is large. Given this restriction, is it possible to determine objectively whether design A or B will give more precise predictions?

In this report, we first investigated the performance of the calculated ΔΔG_FEP values compared with the pairwise differences in least-squares-derived ΔĜ estimates, both analytically and through simulations. Based on our findings, we recommend using weighted least squares to transform ΔΔG_FEP values into ΔĜ estimates. Second, we investigated the factors that contribute to the precision of the ΔĜ estimates, such as the total number of computed pairs, the selection of computed pairs, and the uncertainty in the computed ΔΔG_FEP values. The mean squared error (MSE) and Spearman's rank correlation are used as performance metrics.

To illustrate, we demonstrated how structural similarity can be incorporated into the design and its potential impact on prediction precision. As in the majority of reported FEP studies on binding affinity prediction, the ΔΔG_FEP pairs were selected based on chemical structure similarity, since pairs with small chemical differences are assumed to be more likely to have smaller errors in the ΔΔG_FEP calculation. Using both the constructed mathematical system and literature examples, we demonstrate that some pair-selection schemes (designs) are better than others. To minimize prediction uncertainty, we recommend choosing the design optimality criterion to suit the practical application.
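The weighted least-squares step recommended in this abstract, turning pairwise ΔΔG_FEP values into absolute ΔĜ estimates, can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not the authors' implementation: each edge (i, j) is assumed to estimate ΔG_j − ΔG_i with weight 1/σ², the reference ligand's value is held fixed (for example at its experimental ΔG), and all function names (`wls_absolute`, `spearman`, `mse`) and the example edge values are hypothetical. The two performance metrics named in the abstract, MSE and Spearman's rank correlation, are included for comparing estimates with reference values.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: the perturbation graph may be disconnected")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def wls_absolute(n_ligands, edges, ref=0, ref_value=0.0):
    """Hypothetical helper: absolute dG estimates from pairwise ddG by weighted least squares.

    edges: (i, j, ddg, sigma) tuples, assuming ddg estimates dG[j] - dG[i]; weight = 1/sigma**2.
    Minimizes sum_e w_e * (dG[j] - dG[i] - ddg_e)**2 with dG[ref] fixed at ref_value.
    """
    free = [k for k in range(n_ligands) if k != ref]
    idx = {k: p for p, k in enumerate(free)}
    a = [[0.0] * len(free) for _ in free]
    b = [0.0] * len(free)
    for i, j, ddg, sigma in edges:
        w = 1.0 / sigma ** 2
        for node, sign in ((i, -1.0), (j, 1.0)):
            if node == ref:
                continue
            p = idx[node]
            b[p] += sign * w * ddg  # normal-equation right-hand side
            for other, osign in ((i, -1.0), (j, 1.0)):
                if other == ref:
                    b[p] -= sign * osign * w * ref_value  # known reference term moves to the RHS
                else:
                    a[p][idx[other]] += sign * osign * w
    x = solve_linear(a, b)
    est = [0.0] * n_ligands
    est[ref] = ref_value
    for k in free:
        est[k] = x[idx[k]]
    return est

def ranks(values):
    """1-based ranks, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    r = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for q in range(pos, end + 1):
            r[order[q]] = (pos + end) / 2.0 + 1.0
        pos = end + 1
    return r

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rank correlation as the Pearson correlation of ranks."""
    return pearson(ranks(x), ranks(y))

def mse(est, true):
    return sum((e - t) ** 2 for e, t in zip(est, true)) / len(true)

# Hypothetical 3-ligand cycle (kcal/mol) with a 0.3 kcal/mol closure error; ligand 0 is the reference.
edges = [(0, 1, 1.0, 1.0), (1, 2, 1.0, 1.0), (0, 2, 2.3, 0.5)]
estimates = wls_absolute(3, edges, ref=0, ref_value=0.0)
print([round(g, 3) for g in estimates])
```

Here the estimates for ligands 1 and 2 come out at about 1.13 and 2.27 kcal/mol; with equal weights on all three edges they would be 1.1 and 2.2, so the smaller σ on the 0→2 edge pulls the solution toward that edge, which is the purpose of weighting. Dense elimination is used only to stay within the standard library; a numerical linear algebra package would be the natural choice for large perturbation graphs.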


2020 ◽  
Vol 60 (11) ◽  
pp. 5563-5579 ◽  
Author(s):  
Francesca Deflorian ◽  
Laura Perez-Benito ◽  
Eelke B Lenselink ◽  
Miles Congreve ◽  
Herman W. T. van Vlijmen ◽  
...  

2021 ◽  
Author(s):  
Alexander Wade ◽  
Agastya Bhati ◽  
Shunzhou Wan ◽  
Peter Coveney

The binding free energy between a ligand and its target protein is an essential quantity to know at all stages of the drug discovery pipeline. Assessing this value computationally can offer insight into where efforts should be focused in the pursuit of effective therapeutics for myriad diseases. In this work we examine the computation of alchemical relative binding free energies with an eye to assessing reproducibility across popular molecular dynamics packages and free energy estimators. The focus of this work is 54 ligand transformations from a diverse set of protein targets: MCL1, PTP1B, TYK2, CDK2, and thrombin. These targets are studied with three popular molecular dynamics packages: OpenMM, NAMD2, and NAMD3. Trajectories collected with these packages are used to compare relative binding free energies calculated with thermodynamic integration and free energy perturbation methods. The resulting binding free energies show good agreement between molecular dynamics packages, with an average mean unsigned error between packages of 0.5 kcal/mol. The correlation between packages is very good, with the lowest Spearman's, Pearson's, and Kendall's tau correlation coefficients between any two packages being 0.91, 0.89, and 0.74, respectively. Agreement between thermodynamic integration and free energy perturbation is shown to be very good when ensemble averaging is used.
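The abstract compares thermodynamic integration (TI) and free energy perturbation (FEP) estimators applied to the same λ-window trajectories. The sketch below shows the two textbook estimators, trapezoidal TI over ⟨∂U/∂λ⟩ and forward Zwanzig exponential averaging summed over adjacent windows, on an assumed toy system: a one-dimensional harmonic potential U = ½k(λ)x² with k(λ) linear in λ, sampled exactly, in units of kT. It is not the protein-ligand setup or the analysis code of the study (the abstract does not specify estimator details beyond TI and FEP); the toy is chosen because its exact answer, ΔF = ln(k1/k0)/(2β), is known. The function names are hypothetical.

```python
import math
import random

def sample_window(k, n, rng, beta=1.0):
    """Draw n positions from exp(-beta * 0.5 * k * x**2): a Gaussian with variance 1/(beta*k)."""
    sd = 1.0 / math.sqrt(beta * k)
    return [rng.gauss(0.0, sd) for _ in range(n)]

def toy_ti_and_fep(k0=1.0, k1=4.0, n_windows=11, n_samples=20000, seed=1, beta=1.0):
    """Hypothetical toy: TI and FEP estimates of dF for U = 0.5*k(lambda)*x**2."""
    rng = random.Random(seed)
    lambdas = [w / (n_windows - 1) for w in range(n_windows)]
    ks = [k0 + lam * (k1 - k0) for lam in lambdas]
    samples = [sample_window(k, n_samples, rng, beta) for k in ks]

    # TI: dU/dlambda = 0.5*(k1 - k0)*x**2; integrate its window averages with the trapezoidal rule.
    means = [sum(0.5 * (k1 - k0) * x * x for x in s) / n_samples for s in samples]
    h = lambdas[1] - lambdas[0]
    dg_ti = sum(0.5 * h * (means[w] + means[w + 1]) for w in range(n_windows - 1))

    # FEP (Zwanzig), forward direction: dF_w = -(1/beta) * ln < exp(-beta*(U_{w+1} - U_w)) >_w
    dg_fep = 0.0
    for w in range(n_windows - 1):
        avg = sum(math.exp(-beta * 0.5 * (ks[w + 1] - ks[w]) * x * x) for x in samples[w]) / n_samples
        dg_fep += -math.log(avg) / beta

    exact = math.log(k1 / k0) / (2.0 * beta)  # analytical result for the harmonic toy
    return dg_ti, dg_fep, exact

ti, fep, exact = toy_ti_and_fep()
print(f"TI = {ti:.3f} kT, FEP = {fep:.3f} kT, exact = {exact:.3f} kT")
```

With only 11 windows the trapezoidal TI result carries a small discretization bias, while the exponential average behaves well here because adjacent windows overlap strongly; neither property should be assumed for real alchemical transformations.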


2020 ◽  
Author(s):  
Son Tung Ngo ◽  
Nguyen Minh Tam ◽  
Pham Minh Quan ◽  
Trung Hai Nguyen

The COVID-19 pandemic has killed millions of people worldwide since its outbreak in December 2019. The pandemic is caused by the SARS-CoV-2 virus, whose main protease (Mpro) is a promising drug target because it plays a key role in viral proliferation and replication. Designing an effective therapy is currently an urgent task, and it requires accurately estimating ligand-binding free energies to SARS-CoV-2 Mpro. However, the accuracy of a free energy method likely depends on the protein target: an approach that is highly accurate for some targets may fail to produce a reasonable correlation with experiment when a novel enzyme is considered as a drug target. In this context, the ligand-binding affinity to SARS-CoV-2 Mpro was calculated via various approaches. The AutoDock Vina (Vina) and AutoDock4 (AD4) packages were used for a preliminary investigation of the ligand-binding affinity and pose to SARS-CoV-2 Mpro. The binding free energy was then refined using the fast pulling of ligand (FPL), linear interaction energy (LIE), molecular mechanics Poisson-Boltzmann surface area (MM-PBSA), and free energy perturbation (FEP) methods. The benchmark results indicated that, for docking calculations, Vina is more accurate than AD4, and for free energy methods, FEP is the most accurate, followed by LIE, FPL, and MM-PBSA (FEP > LIE > FPL > MM-PBSA). Moreover, the binding mechanism was revealed by atomistic simulations: van der Waals (vdW) interactions are the dominant factor, and the residues Thr25, Thr26, His41, Ser46, Asn142, Gly143, Cys145, Glu166, and Gln189 are essential elements affecting the binding process. Furthermore, Ser46 and related residues are probably important in the enlargement and shrinkage of the SARS-CoV-2 Mpro binding cleft. The benchmark may guide further investigations using computational approaches.
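Of the free energy methods benchmarked in this abstract, linear interaction energy (LIE) is the simplest to write down: it scales the differences in average ligand-surroundings van der Waals and electrostatic energies between the bound and free (solvated) states, ΔG ≈ α(⟨V_vdW⟩_bound − ⟨V_vdW⟩_free) + β(⟨V_el⟩_bound − ⟨V_el⟩_free) + γ. The sketch below is a minimal illustration under stated assumptions: α = 0.18 and β = 0.5 are values often quoted in the LIE literature and are used here only as illustrative defaults, γ defaults to zero, the coefficients used by Ngo et al. are not given in the abstract, and the function name and the energy lists are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def lie_binding_free_energy(vdw_bound, vdw_free, el_bound, el_free,
                            alpha=0.18, beta=0.5, gamma=0.0):
    """Hypothetical LIE helper: alpha*d<V_vdW> + beta*d<V_el> + gamma (kcal/mol).

    Each argument is a list of ligand-surroundings interaction energies (kcal/mol)
    sampled from MD of the ligand bound to the protein or free in water.
    alpha, beta, gamma are illustrative defaults, not the study's fitted values.
    """
    d_vdw = mean(vdw_bound) - mean(vdw_free)
    d_el = mean(el_bound) - mean(el_free)
    return alpha * d_vdw + beta * d_el + gamma

# Hypothetical sampled energies: <V_vdW> = -40 (bound) vs -30 (free); <V_el> = -20 vs -15.
dg = lie_binding_free_energy([-41.0, -39.0], [-31.0, -29.0], [-21.0, -19.0], [-16.0, -14.0])
print(f"LIE estimate: {dg:.2f} kcal/mol")
```

Because LIE needs only endpoint simulations of the bound and free states, it is typically much cheaper than FEP, which samples intermediate alchemical states; in practice its coefficients are often refitted to experimental data for a given target.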


2020 ◽  
Vol 117 (44) ◽  
pp. 27381-27387 ◽  
Author(s):  
Zhe Li ◽  
Xin Li ◽  
Yi-You Huang ◽  
Yaoxing Wu ◽  
Runduo Liu ◽  
...  

The COVID-19 pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has become a global crisis. There is no therapeutic treatment specific for COVID-19. It is highly desirable to identify potential antiviral agents against SARS-CoV-2 among existing drugs available for other diseases and thus repurpose them for the treatment of COVID-19. In general, a drug repurposing effort for a new disease such as COVID-19 starts with a virtual screening of existing drugs, followed by experimental validation, but the actual hit rate is generally rather low with traditional computational methods. Here we report a virtual screening approach with accelerated free energy perturbation-based absolute binding free energy (FEP-ABFE) predictions and its use in identifying drugs targeting the SARS-CoV-2 main protease (Mpro). The accurate FEP-ABFE predictions were based on the use of a restraint energy distribution (RED) function, which made practical FEP-ABFE-based virtual screening of the existing drug library possible. As a result, of the 25 drugs predicted, 15 were confirmed as potent inhibitors of SARS-CoV-2 Mpro. The most potent is dipyridamole (inhibitory constant Ki = 0.04 µM), which has shown promising therapeutic effects in subsequently conducted clinical studies for the treatment of patients with COVID-19. Additionally, hydroxychloroquine (Ki = 0.36 µM) and chloroquine (Ki = 0.56 µM) were also found to potently inhibit SARS-CoV-2 Mpro. We anticipate that the FEP-ABFE prediction-based virtual screening approach will be useful in many other drug repurposing or discovery efforts.
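To relate the reported inhibition constants to the binding free energies that FEP-ABFE predicts, a Ki can be converted with ΔG = RT ln(Ki/c°). The sketch below assumes that Ki behaves as a dissociation constant (for example, competitive inhibition), a 1 M standard state, and a temperature of 298.15 K; the abstract states neither the assay temperature nor the inhibition mode, and the function name is hypothetical. Only the three Ki values come from the abstract.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)
C_STANDARD = 1.0      # assumed 1 M standard-state concentration

def ki_to_dg(ki_molar, temperature=298.15):
    """Hypothetical helper: binding free energy (kcal/mol) from Ki, dG = RT ln(Ki / c0).

    Assumes Ki acts as a dissociation constant; the temperature is an assumption.
    """
    return R_KCAL * temperature * math.log(ki_molar / C_STANDARD)

reported_ki_uM = {"dipyridamole": 0.04, "hydroxychloroquine": 0.36, "chloroquine": 0.56}
dg = {name: ki_to_dg(ki * 1e-6) for name, ki in reported_ki_uM.items()}  # µM -> M
for name, value in dg.items():
    print(f"{name}: {value:.2f} kcal/mol")
```

Under these assumptions the three values come out at roughly −10.1, −8.8, and −8.5 kcal/mol, so the roughly 14-fold spread in Ki between dipyridamole and chloroquine corresponds to about 1.6 kcal/mol in binding free energy.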

