End-to-end multitask learning, from protein language to protein features without alignments

2019 ◽  
Author(s):  
Ahmed Elnaggar ◽  
Michael Heinzinger ◽  
Christian Dallago ◽  
Burkhard Rost

Abstract: Correctly predicting features of protein structure and function from amino acid sequence alone remains a supreme challenge for computational biology. For almost three decades, state-of-the-art approaches have combined machine learning with evolutionary information from multiple sequence alignments. Exponentially growing sequence databases make it infeasible to gather evolutionary information for entire microbiomes or meta-proteomics. Moreover, for many important proteins (e.g. the dark proteome and intrinsically disordered proteins), evolutionary information remains limited. Here, we introduced a novel approach combining recent advances in Language Models (LMs) with multi-task learning to predict aspects of protein structure (secondary structure) and function (cellular component or subcellular localization) without using any evolutionary information from alignments. Our approach fused self-supervised pre-training of LMs on a large unlabeled dataset (UniRef50, corresponding to 9.6 billion words) with supervised training on labelled high-quality data in a single end-to-end network. We provided a proof-of-principle for the novel concept through the semi-successful per-residue prediction of protein secondary structure and through per-protein predictions of localization (Q10=69%) and of the distinction between integral membrane and water-soluble proteins (Q2=89%). Although these results did not reach the levels obtained by the best available methods using evolutionary information from alignments, these less accurate multi-task predictions have the advantage of speed: they are 300-3000 times faster (where HHblits needs 30-300 seconds on average, our method needed 0.045 seconds). These new results push the boundaries of predictability towards grayer and darker areas of the protein space, enabling reliable predictions for proteins that were not accessible to previous methods. In addition, our method remains scalable because it removes the need to search sequence databases for evolutionarily related proteins.
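The abstract reports per-protein performance as Q10 and Q2. Assuming these follow the common convention (the percentage of proteins assigned to the correct one of k classes), the measure can be sketched as below. The function name `q_accuracy` and the toy labels are hypothetical illustrations, not taken from the paper.

```python
from typing import Sequence


def q_accuracy(predicted: Sequence[str], observed: Sequence[str]) -> float:
    """Return the percentage of items whose predicted class matches the observed class.

    Assumption: this mirrors the usual k-state accuracy (Q2, Q10, ...);
    the paper's exact evaluation protocol is not given in the abstract.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if len(observed) == 0:
        raise ValueError("at least one label is required")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Hypothetical two-class example (membrane vs. water-soluble), five proteins.
observed = ["membrane", "soluble", "soluble", "membrane", "soluble"]
predicted = ["membrane", "soluble", "membrane", "membrane", "soluble"]
q2 = q_accuracy(predicted, observed)  # 4 of 5 correct
```

The same function applies unchanged to a ten-class localization task; only the label vocabulary differs.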

2020 ◽  
Author(s):  
Khondker Rufaka Hossain ◽  
Daniel Clayton ◽  
Sophia C Goodchild ◽  
Alison Rodger ◽  
Richard James Payne ◽  
...  

Membrane protein structure and function are modulated via interactions with their lipid environment. This is particularly true for the integral membrane pumps, the P-type ATPases. These ATPases play vital roles...


2017 ◽  
Vol 6 (1) ◽  
pp. 75-92 ◽  
Author(s):  
Elka R. Georgieva

Abstract: Cellular membranes and associated proteins play critical physiological roles in organisms from all kingdoms of life. In many cases, malfunction of biological membranes, triggered by changes in lipid bilayer properties or by functional abnormalities of membrane proteins, leads to severe diseases. To understand in detail the processes that govern the life of cells and to control diseases, one of the major tasks in the biological sciences is to learn how membrane proteins function. To this end, a variety of biochemical and biophysical approaches have been used in molecular studies of membrane protein structure and function on the nanoscale. This review focuses on electron paramagnetic resonance with site-directed nitroxide spin-labeling (SDSL EPR), a rapidly expanding and powerful technique that reports on local protein/spin-label dynamics and on large, functionally important structural rearrangements. In addition, membrane mimetics suitable for nanoscale studies have been developed and used in conjunction with SDSL EPR. These mimetics primarily include various liposomes, bicelles, and nanodiscs. This review provides a basic description of the EPR methods, continuous-wave and pulse, applied to spin-labeled proteins, and highlights several representative applications of EPR to liposome-, bicelle-, or nanodisc-reconstituted membrane proteins.


2021 ◽  
Vol 28 (1) ◽  
Author(s):  
Kavita Sharma ◽  
Kanipakam Hema ◽  
Naveen Kumar Bhatraju ◽  
Ritushree Kukreti ◽  
Rajat Subhra Das ◽  
...  

2007 ◽  
Vol 157 (2) ◽  
pp. 329-338 ◽  
Author(s):  
Jane F. Povey ◽  
C. Mark Smales ◽  
Stuart J. Hassard ◽  
Mark J. Howard
