An overview of data-driven HADDOCK strategies in CAPRI rounds 38-45


10.1101/718122 ◽

2019 ◽

Author(s):

P.I. Koukos ◽

J. Roel-Touris ◽

F. Ambrosetti ◽

C. Geng ◽

J. Schaarschmidt ◽

...

Keyword(s):

Scoring Function ◽

Data Driven ◽

Coarse Grained ◽

Conformational Sampling ◽

Model Quality ◽

Sustained Performance ◽

Docking Approach ◽

Binding Pockets ◽

Integrate Data ◽

Modelling Process

Abstract: Our information-driven docking approach HADDOCK has demonstrated sustained performance since it began participating in CAPRI. This is due, in part, to its ability to integrate data into the modelling process, and to the robustness of its scoring function. We participated in CAPRI both as server and as manual predictors. In CAPRI rounds 38-45, we used various strategies depending on the information at hand. These ranged from imposing restraints on a few residues identified from the literature as important for the interaction, to binding pockets identified from homologous complexes, to template-based refinement / CA-CA restraint-guided docking from identified templates. When relevant, symmetry restraints were used to limit the conformational sampling. For a large decamer target, we also tested a new implementation of the MARTINI coarse-grained force field in HADDOCK. Overall in the current rounds, we obtained acceptable or better predictions for 13 server and 11 manual submissions out of the 22 interfaces. Our server performance (59% acceptable models) was better than our manual performance (50%), in which we typically experiment with various combinations of protocols and data sources. Again, our simple scoring function, based on a linear combination of intermolecular van der Waals and electrostatic energies and an empirical desolvation term, performed well in the scoring experiment, with a 63% success rate across all 22 interfaces. An analysis of model quality indicates that, while we consistently generate acceptable models, there is room for improvement in generating and identifying higher-quality models.
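The scoring function described in this abstract is a weighted sum of three energy terms. The following is a minimal sketch of that idea, not the HADDOCK implementation: the weights, the class and function names, and the energy values are all illustrative assumptions, since the abstract states none of them. The last two lines recompute the reported success rates from the counts given in the abstract.

```python
from dataclasses import dataclass


@dataclass
class ModelEnergies:
    """Energy terms for one docked model (values below are made up)."""
    vdw: float     # intermolecular van der Waals energy
    elec: float    # intermolecular electrostatic energy
    desolv: float  # empirical desolvation term


def linear_score(e, w_vdw=1.0, w_elec=0.2, w_desolv=1.0):
    """Weighted sum of the three terms; the default weights are assumptions."""
    return w_vdw * e.vdw + w_elec * e.elec + w_desolv * e.desolv


def success_rate(n_success, n_total):
    """Percentage of interfaces with at least an acceptable model."""
    return 100.0 * n_success / n_total


# Hypothetical models; the lowest (most negative) score ranks first.
models = {
    "model_a": ModelEnergies(vdw=-40.0, elec=-150.0, desolv=5.0),
    "model_b": ModelEnergies(vdw=-55.0, elec=-90.0, desolv=-3.0),
}
ranked = sorted(models, key=lambda name: linear_score(models[name]))

server_rate = success_rate(13, 22)  # counts taken from the abstract
manual_rate = success_rate(11, 22)
```

With these counts, the server rate rounds to 59% and the manual rate is exactly 50%, matching the figures quoted above.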


Data-driven multiscale modeling reveals the role of metabolic coupling for the spatio-temporal growth dynamics of yeast colonies

10.1101/344226 ◽

2018 ◽

Author(s):

Jukka Intosalmi ◽

Adrian C. Scott ◽

Michelle Hays ◽

Nicholas Flann ◽

Olli Yli-Harja ◽

...

Keyword(s):

Experimental Data ◽

Time Course ◽

Model Development ◽

Growth Dynamics ◽

Colony Growth ◽

Data Driven ◽

Coarse Grained ◽

Intercellular Signaling ◽

Growth Data ◽

Metabolic Coupling

Abstract Motivation: Multicellular entities, such as mammalian tissues or microbial biofilms, typically exhibit complex spatial arrangements that are adapted to their specific functions or environments. These structures result from intercellular signaling as well as from the interaction with the environment, which allow cells of the same genotype to differentiate into well-organized communities of diversified cells. Despite its importance, our understanding of how cell–cell and metabolic coupling produce functionally optimized structures is still limited. Results: Here, we present a data-driven spatial framework to computationally investigate the development of one multicellular structure, yeast colonies. Using experimental growth data from homogeneous liquid media conditions, we develop and parameterize a dynamic cell state and growth model. We then use the resulting model in a coarse-grained spatial model, which we calibrate using experimental time-course data of colony growth. Throughout the model development process, we use state-of-the-art statistical techniques to handle the uncertainty of model structure and parameterization. Further, we validate the model predictions against independent experimental data and illustrate how metabolic coupling plays a central role in colony formation. Availability: Experimental data and a computational implementation to reproduce the results are available at http://research.cs.aalto.fi/csb/software/multiscale/.


Data-driven multiscale modeling reveals the role of metabolic coupling for the spatio-temporal growth dynamics of yeast colonies

BMC Molecular and Cell Biology ◽

10.1186/s12860-019-0234-z ◽

2019 ◽

Vol 20 (1) ◽

Author(s):

Jukka Intosalmi ◽

Adrian C. Scott ◽

Michelle Hays ◽

Nicholas Flann ◽

Olli Yli-Harja ◽

...

Keyword(s):

Experimental Data ◽

Time Course ◽

Growth Dynamics ◽

Colony Growth ◽

Data Driven ◽

Coarse Grained ◽

Colony Development ◽

Intercellular Signaling ◽

Metabolic Coupling ◽

The Impact

Abstract Background: Multicellular entities like mammalian tissues or microbial biofilms typically exhibit complex spatial arrangements that are adapted to their specific functions or environments. These structures result from intercellular signaling as well as from the interaction with the environment, which allow cells of the same genotype to differentiate into well-organized communities of diversified cells. Despite its importance, our understanding of how this cell–cell and metabolic coupling leads to functionally optimized structures is still limited. Results: Here, we present a data-driven spatial framework to computationally investigate the development of yeast colonies as one such multicellular structure, depending on metabolic capacity. For this purpose, we first developed and parameterized a dynamic cell state and growth model for yeast based on experimental data from homogeneous liquid media conditions. The inferred model is subsequently used in a spatially coarse-grained model of colony development to investigate the effect of metabolic coupling. Its spatial parameters are calibrated from experimental time-course data of colony growth using state-of-the-art statistical techniques for model uncertainty and parameter estimation. The model is finally validated against independent experimental data from an alternative yeast strain with distinct metabolic characteristics, and it illustrates the impact of metabolic coupling on structure formation. Conclusions: We introduce a novel model for yeast colony formation, present a statistical methodology for data-driven model calibration, and demonstrate how the established model can be used to generate predictions across scales by validation against independent measurements of genetically distinct yeast strains.


Data-driven coarse-grained modeling of polymers in solution with structural and dynamic properties conserved

Soft Matter ◽

10.1039/d0sm01019g ◽

2020 ◽

Vol 16 (36) ◽

pp. 8330-8344

Author(s):

Shu Wang ◽

Zhan Ma ◽

Wenxiao Pan

Keyword(s):

Structural Properties ◽

Dynamic Properties ◽

Data Driven ◽

Coarse Grained ◽

System P

We present data-driven coarse-grained (CG) modeling for polymers in solution, which conserves the dynamic as well as structural properties of the underlying atomistic system.


Feature Selection for Data Driven Prediction of Protein Model Quality

The 2006 IEEE International Joint Conference on Neural Network Proceedings ◽

10.1109/ijcnn.2006.1716587 ◽

2006 ◽

Author(s):

A. Montuori ◽

L. Pugliese ◽

G. Raimondo ◽

E. Pasero

Keyword(s):

Feature Selection ◽

Data Driven ◽

Model Quality ◽

Protein Model ◽

Selection For


QMEAN: A comprehensive scoring function for model quality assessment

Proteins Structure Function and Bioinformatics ◽

10.1002/prot.21715 ◽

2008 ◽

Vol 71 (1) ◽

pp. 261-277 ◽

Cited By ~ 596

Author(s):

Pascal Benkert ◽

Silvio C. E. Tosatto ◽

Dietmar Schomburg

Keyword(s):

Quality Assessment ◽

Scoring Function ◽

Model Quality ◽

Model Quality Assessment


Pushing the accuracy limit of shape complementarity for protein-protein docking

BMC Bioinformatics ◽

10.1186/s12859-019-3270-y ◽

2019 ◽

Vol 20 (S25) ◽

Cited By ~ 8

Author(s):

Yumeng Yan ◽

Sheng-You Huang

Keyword(s):

Success Rate ◽

Protein Interactions ◽

Shape Representation ◽

Scoring Function ◽

Protein Docking ◽

Protein Protein Interactions ◽

Second Best ◽

Shape Complementarity ◽

Docking Program ◽

Docking Approach

Abstract Background: Protein-protein docking is a valuable computational approach for investigating protein-protein interactions. Shape complementarity is the most basic component of a scoring function and plays an important role in protein-protein docking. Despite significant progress, shape representation remains an open question in the development of protein-protein docking algorithms, especially for grid-based docking approaches. Results: We have proposed a new pairwise shape-based scoring function (LSC) for protein-protein docking which adopts an exponential form to take into account long-range interactions between protein atoms. The LSC scoring function was incorporated into our FFT-based docking program and evaluated for both bound and unbound docking on the protein docking benchmark 4.0. LSC achieved significantly better performance than four other similar docking methods, ZDOCK 2.1, MolFit/G, GRAMM, and FTDock/G, in both success rate and number of hits. When considering the top 10 predictions, LSC obtained success rates of 51.71% and 6.82% for bound and unbound docking, respectively, compared to 42.61% and 4.55% for the second-best program, ZDOCK 2.1. LSC also yielded an average of 8.38 and 3.94 hits per complex in the top 1000 predictions for bound and unbound docking, respectively, followed by 6.38 and 2.96 hits for the second-best ZDOCK 2.1. Conclusions: The present LSC method will not only provide an initial-stage docking approach for post-docking processes but also offer a general implementation for accurate representation of other energy terms on grids in protein-protein docking. The software has been implemented in our HDOCK web server at http://hdock.phys.hust.edu.cn/.
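To illustrate what a pairwise score with an exponential distance dependence can look like, here is a toy sketch. It is not the LSC function, whose exact form the abstract does not give: the function name, the decay length, the clash distance, and the clash penalty are all hypothetical choices made for this example. It also works directly on atom coordinates rather than on grids.

```python
import math


def pairwise_exponential_score(receptor, ligand, decay=2.0,
                               clash_dist=3.0, clash_penalty=10.0):
    """Sum exp(-d / decay) over all receptor-ligand atom pairs.

    Pairs closer than clash_dist are penalised instead of rewarded.
    All parameter values are illustrative assumptions.
    """
    score = 0.0
    for r_atom in receptor:
        for l_atom in ligand:
            d = math.dist(r_atom, l_atom)
            if d < clash_dist:
                score -= clash_penalty      # overlapping atoms
            else:
                score += math.exp(-d / decay)  # decays smoothly with distance
    return score


# Toy coordinates (arbitrary units): one receptor, three ligand poses.
receptor = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
near_pose = [(0.0, 4.0, 0.0), (4.0, 4.0, 0.0)]
far_pose = [(0.0, 12.0, 0.0), (4.0, 12.0, 0.0)]
clash_pose = [(0.0, 1.0, 0.0), (4.0, 1.0, 0.0)]

near = pairwise_exponential_score(receptor, near_pose)
far = pairwise_exponential_score(receptor, far_pose)
clash = pairwise_exponential_score(receptor, clash_pose)
```

Because the exponential never reaches zero, the distant pose still receives a small positive score, which is one way a score can register long-range contributions; the clashing pose scores negative.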


Multiscale simulation of small peptides: Consistent conformational sampling in atomistic and coarse-grained models

Journal of Computational Chemistry ◽

10.1002/jcc.22915 ◽

2012 ◽

Vol 33 (9) ◽

pp. 937-949 ◽

Cited By ~ 20

Author(s):

Olga Bezkorovaynaya ◽

Alexander Lukyanov ◽

Kurt Kremer ◽

Christine Peter

Keyword(s):

Multiscale Simulation ◽

Coarse Grained ◽

Conformational Sampling ◽

Small Peptides


CSCORE: A SIMPLE YET EFFECTIVE SCORING FUNCTION FOR PROTEIN–LIGAND BINDING AFFINITY PREDICTION USING MODIFIED CMAC LEARNING ARCHITECTURE

Journal of Bioinformatics and Computational Biology ◽

10.1142/s021972001100577x ◽

2011 ◽

Vol 09 (supp01) ◽

pp. 1-14 ◽

Cited By ~ 20

Author(s):

XUCHANG OUYANG ◽

STEPHANUS DANIEL HANDOKO ◽

CHEE KEONG KWOH

Keyword(s):

Binding Affinity ◽

Scoring Function ◽

Binding Mode ◽

Computational Method ◽

Data Driven ◽

Machine Learning Techniques ◽

Ligand Docking ◽

Scoring Functions ◽

Binding Affinity Prediction ◽

Affinity Prediction

Protein–ligand docking is a computational method to identify the binding mode of a ligand and a target protein, and to predict the corresponding binding affinity using a scoring function. This method has great value in drug design. After decades of development, scoring functions can now typically identify the true binding mode, but the prediction of binding affinity remains a major problem. Here we present CScore, a data-driven scoring function using a modified Cerebellar Model Articulation Controller (CMAC) learning architecture, for accurate binding affinity prediction. The performance of CScore, measured as the correlation between predicted and experimental binding affinities, is benchmarked under different validation approaches. CScore achieves R = 0.7668 and RMSE = 1.4540 when tested on an independent dataset. To the best of our knowledge, this result outperforms other scoring functions tested on the same dataset. The performance of CScore varies across clusters under the leave-cluster-out validation approach, but it still achieves a competitive result. Lastly, the target-specific CScore, trained on a much smaller but more relevant dataset for each target, achieves an even better result with R = 0.8237 and RMSE = 1.0872. The large body of structural information on protein–ligand complexes and advances in machine learning techniques enable the data-driven approach to binding affinity prediction. CScore is capable of accurate binding affinity prediction. It is also shown that CScore performs better if sufficient and relevant data are presented. As publicly available structural data grow, further improvement of this scoring scheme can be expected.
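The two metrics quoted in this abstract, the Pearson correlation coefficient R and the root-mean-square error (RMSE), can be computed as in the following sketch. The affinity values are made up for illustration and are not data from the paper; the function names are likewise chosen for this example.

```python
import math


def pearson_r(predicted, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


def rmse(predicted, observed):
    """Root-mean-square error between predictions and observations."""
    n = len(predicted)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


# Made-up binding affinities in arbitrary units; not data from the paper.
observed = [4.0, 5.5, 6.0, 7.5, 9.0]
predicted = [4.5, 5.0, 6.5, 7.0, 8.5]

r = pearson_r(predicted, observed)
error = rmse(predicted, observed)
```

R measures how well the predictions track the ranking and spread of the experimental values, while RMSE measures the typical size of the error in the same units as the affinities, so the two are usually reported together.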


AutoMoG 3D: Automated Data-Driven Model Generation of Multi-Energy Systems Using Hinging Hyperplanes

Frontiers in Energy Research ◽

10.3389/fenrg.2021.719658 ◽

2021 ◽

Vol 9 ◽

Author(s):

Andreas Kämper ◽

Alexander Holtwerth ◽

Ludger Leenders ◽

André Bardow

Keyword(s):

Measurement Data ◽

Energy Systems ◽

Optimal Operation ◽

Data Driven ◽

Mixed Integer ◽

Model Generation ◽

Global Optimality ◽

Computationally Efficient ◽

Model Quality ◽

Pump System

The optimal operation of multi-energy systems requires optimization models that are accurate and computationally efficient. In practice, models are mostly generated manually. However, manual model generation is time-consuming, and model quality depends on the expertise of the modeler. Thus, reliable and automated model generation is highly desirable. Automated data-driven model generation seems promising due to the increasing availability of measurement data from cheap sensors and data storage. Here, we propose the method AutoMoG 3D (Automated Model Generation) to decrease the effort for data-driven generation of computationally efficient models while retaining high model quality. AutoMoG 3D automatically yields Mixed-Integer Linear Programming models of multi-energy systems enabling efficient operational optimization to global optimality using established solvers. For each component, AutoMoG 3D performs a piecewise-affine regression using hinging-hyperplane trees. Thereby, components can be modeled with an arbitrary number of independent variables. AutoMoG 3D iteratively increases the number of affine regions. Thereby, AutoMoG 3D balances the errors caused by each component in the overall model of the multi-energy system. AutoMoG 3D is applied to model a real-world pump system. Here, AutoMoG 3D drastically decreases the effort for data-driven model generation and provides an accurate and computationally efficient optimization model.
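A hinging hyperplane is the maximum (or minimum) of two affine functions, which yields a continuous piecewise-affine surface with two regions. The sketch below only evaluates such a function for two independent variables; it does not reproduce AutoMoG 3D's regression or tree construction, and the coefficients and function names are hypothetical.

```python
def affine(coeffs, intercept, x):
    """Evaluate coeffs . x + intercept for a point x."""
    return sum(c * xi for c, xi in zip(coeffs, x)) + intercept


def hinge(plane_a, plane_b, x):
    """Hinging hyperplane: the maximum of two affine functions of x."""
    return max(affine(*plane_a, x), affine(*plane_b, x))


def active_region(plane_a, plane_b, x):
    """Return 0 if plane_a defines the value at x, otherwise 1."""
    return 0 if affine(*plane_a, x) >= affine(*plane_b, x) else 1


# Hypothetical planes, each given as (coefficients, intercept).
plane_a = ((1.0, 0.0), 0.0)
plane_b = ((3.0, 0.5), -4.0)

low = hinge(plane_a, plane_b, (1.0, 0.0))     # plane_a is active
at_hinge = hinge(plane_a, plane_b, (2.0, 0.0))  # both planes agree here
high = hinge(plane_a, plane_b, (3.0, 0.0))    # plane_b is active
```

Each affine region can be expressed with linear constraints and a binary variable selecting the active region, which is why piecewise-affine component models fit naturally into a Mixed-Integer Linear Programming formulation.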


A general data-driven algorithm for façade structure modeling using ground based laser data

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprsarchives-xl-3-381-2014 ◽

2014 ◽

Vol XL-3 ◽

pp. 381-386 ◽

Cited By ~ 1

Author(s):

M. Yousefzadeh ◽

F. H. M. Leurink ◽

M. Beheshti jou

Keyword(s):

Point Cloud ◽

Density Variation ◽

Data Driven ◽

Automatic Data ◽

Expert User ◽

Laser Data ◽

Modification Method ◽

General Data ◽

Fully Automatic ◽

Modelling Process

Façade reconstruction from laser point clouds has been a subject of interest in the photogrammetric community for the last two decades. However, due to the variety of architecture types and the nature of laser data, proposing a fully automatic modelling algorithm is still a challenge. Irregular architecture, density variation, occlusion and noise level are the main factors hindering a general model for façade reconstruction. This paper describes the sequence of steps in an automatic data-driven method which starts from raw laser data and ends with object extraction. Statistical analysis was used frequently in segmentation, splitting line detection and object characterization. A rule-based modification method was employed to model the complexity of the façade layout. The developed interface enables non-expert users to interact with the modelling process by setting a few parameters. The method was tested on a couple of datasets.
