Evaluating molecular modeling tools for thermal stability using an independently generated dataset
ABSTRACTEngineering proteins to enhance thermal stability is a widely utilized approach for creating industrially relevant biocatalysts. Computational tools that guide these engineering efforts remain an active area of research with new data sets and develop algorithms. To aid in these efforts, we are reporting an expansion of our previously published data set of mutants for a β-glucosidase to include both measures of TM and ΔΔG, to complement the previously reported measures of T50 and kinetic constants (kcat and KM). For a set of 51 mutants, we found that T50 and TM are moderately correlated with a Pearson correlation coefficient (PCC) of 0.58, indicated the two methods capture different physical features. The performance of predicted stability using five computational tools are also evaluated on the 51 mutants dataset, none of which are found to be strong predictors of the observed changes in T50, TM, or ΔΔG. Furthermore, the ability of the five algorithms to predict the production of isolatable soluble protein is examined, which revealed that Rosetta ΔΔG, ELASPIC, and DeepDDG are capable of predicting if a mutant could be produced and isolated as a soluble protein. These results further highlight the need for new algorithms for predicting modest, yet important, changes in thermal stability as well as a new utility for current algorithms for prescreening designs for the production of soluble mutants.