Somatic immunoglobulin sequence divergence and its implications for studies of evolutionary divergence

G. Brian Golding

doi:10.1139/g88-059

FIND: Identifying Functionally and Structurally Important Features in Protein Sequences with Deep Neural Networks

10.1101/592808 ◽

2019 ◽

Author(s):

Ranjani Murali ◽

James Hemp ◽

Victoria Orphan ◽

Yonatan Bisk

Keyword(s):

Neural Networks ◽

Amino Acid ◽

Hidden Markov Models ◽

Markov Models ◽

Genomic Sequence ◽

Hidden Markov ◽

Amino Acid Sequences ◽

Homologous Proteins ◽

Biological Studies ◽

Insight Into

Abstract The ability to correctly predict the functional role of proteins from their amino acid sequences would significantly advance biological studies at the molecular level by improving our ability to understand the biochemical capability of biological organisms from their genomic sequence. Existing methods that are geared towards protein function prediction or annotation mostly use alignment-based approaches and probabilistic models such as Hidden Markov Models. In this work we introduce a deep learning architecture (Function Identification with Neural Descriptions, or FIND) which performs protein annotation from primary sequence. The accuracy of our methods matches state-of-the-art techniques, such as protein classifiers based on Hidden Markov Models. Further, our approach allows for model introspection via a neural attention mechanism, which weights parts of the amino acid sequence proportionally to their relevance for functional assignment. In this way, the attention weights automatically uncover structurally and functionally relevant features of the classified protein and find novel functional motifs in previously uncharacterized proteins. While this model is applicable to any database of proteins, we chose to apply this model to superfamilies of homologous proteins, with the aim of extracting features inherent to divergent protein families within a larger superfamily. This provided insight into the functional diversification of an enzyme superfamily and its adaptation to different physiological contexts. We tested our approach on three families (nitrogenases, cytochrome bd-type oxygen reductases and heme-copper oxygen reductases) and present a detailed analysis of the sequence characteristics identified in previously characterized proteins in the heme-copper oxygen reductase (HCO) superfamily. These are correlated with their catalytic relevance and evolutionary history. FIND was then applied to discover features in previously uncharacterized members of the HCO superfamily, providing insight into their unique sequence features. This modeling approach demonstrates the power of neural networks to recognize patterns in large datasets and can be utilized to discover biochemically and structurally important features in proteins from their amino acid sequences.
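As a rough illustration of how attention weights can rank residues by relevance, the sketch below applies a softmax to per-residue scores and lists the highest-weighted positions. It is not the FIND architecture: the function names, the toy peptide, and the scores are all hypothetical, and in a real model the scores would be produced by the trained network.

```python
import math

def attention_weights(scores):
    """Softmax over per-residue relevance scores (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_residues(sequence, scores, k=3):
    """Return the k (position, residue, weight) triples with the largest weights."""
    weights = attention_weights(scores)
    ranked = sorted(range(len(sequence)), key=lambda i: weights[i], reverse=True)
    # Positions are reported 1-based, as is usual for protein residues.
    return [(i + 1, sequence[i], round(weights[i], 3)) for i in ranked[:k]]

# Hypothetical toy input: a short peptide with made-up scores.
sequence = "MHGWAH"
scores = [0.1, 2.0, 0.2, 0.3, 0.1, 1.5]
print(top_residues(sequence, scores, k=2))
```

The weights sum to one, so each value can be read as the share of attention assigned to that residue under these assumed scores.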

Three Drosophila beta-tubulin sequences: a developmentally regulated isoform (beta 3), the testis-specific isoform (beta 2), and an assembly-defective mutation of the testis-specific isoform (B2t8) reveal both an ancient divergence in metazoan isotypes and structural constraints for beta-tubulin function.

Molecular and Cellular Biology ◽

10.1128/mcb.7.6.2231 ◽

1987 ◽

Vol 7 (6) ◽

pp. 2231-2242 ◽

Cited By ~ 72

Author(s):

J E Rudolph ◽

M Kimble ◽

H D Hoyle ◽

M A Subler ◽

E C Raff

Keyword(s):

Amino Acid ◽

Sequence Divergence ◽

Amino Acid Sequences ◽

Single Amino Acid ◽

Structural Constraints ◽

Wild Type ◽

Beta Tubulin ◽

Developmentally Regulated ◽

Single Amino Acid Residue ◽

Beta 2

The genomic DNA sequence and deduced amino acid sequence are presented for three Drosophila melanogaster beta-tubulins: a developmentally regulated isoform beta 3-tubulin, the wild-type testis-specific isoform beta 2-tubulin, and an ethyl methanesulfonate-induced assembly-defective mutation of the testis isoform, B2t8. The testis-specific beta 2-tubulin is highly homologous to the major vertebrate beta-tubulins, but beta 3-tubulin is considerably diverged. Comparison of the amino acid sequences of the two Drosophila isoforms to those of other beta-tubulins indicates that these two proteins are representative of an ancient sequence divergence event which at least preceded the split between lines leading to vertebrates and invertebrates. The intron/exon structures of the genes for beta 2- and beta 3-tubulin are not the same. The structure of the gene for the variant beta 3-tubulin isoform, but not that of the testis-specific beta 2-tubulin gene, is similar to that of vertebrate beta-tubulins. The mutation B2t8 in the gene for the testis-specific beta 2-tubulin defines a single amino acid residue required for normal assembly function of beta-tubulin. The sequence of the B2t8 gene is identical to that of the wild-type gene except for a single nucleotide change resulting in the substitution of lysine for glutamic acid at residue 288. This position falls at the junction between two major structural domains of the beta-tubulin molecule. Although this hinge region is relatively variable in sequence among different beta-tubulins, the residue corresponding to glu 288 of Drosophila beta 2-tubulin is highly conserved as an acidic amino acid not only in all other beta-tubulins but in alpha-tubulins as well.

Insights into the mysterious genetic variation profile of tprK in Treponema pallidum under the development of natural human syphilis infection

10.1101/536573 ◽

2019 ◽

Author(s):

Dan Liu ◽

Man-Li Tong ◽

Yong Lin ◽

Li-Li Liu ◽

Li-Rong Lin ◽

...

Keyword(s):

Amino Acid ◽

Immune Evasion ◽

High Frequency ◽

Critical Role ◽

Treponema Pallidum ◽

Secondary Syphilis ◽

Human Infection ◽

Amino Acid Sequences ◽

Potential Vaccine

Abstract Although the variations of the tprK gene in Treponema pallidum were considered to play a critical role in the pathogenesis of syphilis, how the actual variable characteristics of tprK in the course of natural human infection enable the pathogen’s survival has thus far remained unclear. Here, we performed NGS to investigate tprK of T. pallidum directly from primary and secondary syphilis samples. Compared with diversity in tprK of the strains from primary syphilis samples, there were more mixture variants found within seven V regions of the tprK gene among the strains from secondary syphilis samples, and the frequencies of predominant sequences within V regions of tprK were generally decreased (less than 80%) with the proportion of minor variants in 10-60% increasing. Notably, the variations within V regions of tprK always obeyed a strict 3 bp changing pattern. tprK in the strains from the two-stage samples also kept some stable amino acid sequences within V regions. In particular, the amino acid sequences IASDGGAIKH and IASEDGSAGNLKH in V1 not only presented a high proportion of inter-population sharing, but also presented a relatively high frequency (above 80%) in the populations. In addition, tprK always demonstrated remarkable variability in V6 at both the intra- and inter-strain levels regardless of the course. These findings unveiled the different profiles of tprK in T. pallidum directly from primary and secondary syphilis samples, indicating that throughout the development of syphilis T. pallidum constantly varies its domain tprK gene to obtain the best adaptation to the host, while this changing was always subject to a strict gene conversion mechanism to keep an abnormal TprK. The highly stable peptides found in V1 would probably be promising potential vaccine components, and the highly heterogenetic regions (e.g. V6) could provide insight into the mysterious role of tprK in immune evasion.

Author summary Although the variations of the tprK gene in Treponema pallidum were considered to play a critical role in the pathogenesis of syphilis, how the actual variable characteristics of tprK in the course of natural human infection enable the pathogen’s survival has thus far remained unclear. Here, we performed next-generation sequencing, a more sensitive and reliable approach, to investigate tprK of Treponema pallidum directly from primary and secondary syphilis patients, revealing that the profile of tprK in T. pallidum from the two-stage samples was different. Within the strains from secondary syphilis patients, more mixture variants within seven V regions of tprK were found, and the frequencies of their predominant sequences were generally decreased while the proportion of minor variants in 10-60% was increased. The variations within V regions of tprK always obeyed a strict 3 bp changing pattern. Notably, the amino acid sequences IASDGGAIKH and IASEDGSAGNLKH in V1 presented a high proportion of inter-population sharing and a relatively high frequency in the populations. The V6 region always demonstrated remarkable variability at intra- and inter-patient levels regardless of the course. These findings provide insights into the mysterious role of TprK in immune evasion and support further exploration of potential vaccine components.
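The two quantities the abstract reports for each V region, the frequency of the predominant sequence and the 3 bp changing pattern, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: it assumes that "3 bp changing pattern" means variant lengths differ by multiples of three nucleotides, and the reads and function names are hypothetical.

```python
from collections import Counter

def variant_frequencies(reads):
    """Map each distinct sequence to its share of the reads, most common first."""
    counts = Counter(reads)
    total = len(reads)
    return {seq: n / total for seq, n in counts.most_common()}

def obeys_3bp_pattern(variants):
    """True if every variant length differs from the first by a multiple of 3."""
    reference_length = len(variants[0])
    return all((len(v) - reference_length) % 3 == 0 for v in variants)

# Hypothetical reads covering one V region (made-up sequences).
reads = ["ATCGCCAGC"] * 7 + ["ATCGCCAGCGGT"] * 2 + ["ATCAGC"]
frequencies = variant_frequencies(reads)
predominant = next(iter(frequencies))

print(predominant, frequencies[predominant])  # predominant sequence and its share
print(obeys_3bp_pattern(list(frequencies)))   # in-frame length changes only?
```

With these toy reads the predominant sequence falls below the 80% level mentioned in the abstract, which is the situation described for secondary syphilis samples.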

A coarse-graining, ultrametric approach to resolve the phylogeny of prokaryotic strains with frequent homologous recombination

10.21203/rs.2.18054/v2 ◽

2020 ◽

Author(s):

Tin Yau Pang

Keyword(s):

Amino Acid ◽

Homologous Recombination ◽

Empirical Distribution ◽

Local Density ◽

Phylogenetic Reconstruction ◽

Sequence Divergence ◽

Coarse Graining ◽

Amino Acid Sequences ◽

Phylogenetic Distance ◽

Frequent Event

Abstract Background A frequent event in the evolution of prokaryotic genomes is homologous recombination, where a foreign DNA stretch replaces a genomic region similar in sequence. Recombination can affect the relative position of two genomes in a phylogenetic reconstruction in two different ways: (i) one genome can recombine with a DNA stretch that is similar to the other genome, thereby reducing their pairwise sequence divergence; (ii) one genome can recombine with a DNA stretch from an outgroup genome, increasing the pairwise divergence. While several recombination-aware phylogenetic algorithms exist, many of these cannot account for both types of recombination; some algorithms can, but do so inefficiently. Moreover, many of them reconstruct the ancestral recombination graph (ARG) to help infer the genome tree, and require that a substantial portion of each genome has not been affected by recombination, a sometimes unrealistic assumption. Results Here, we propose a coarse-graining approach for phylogenetic reconstruction (CGP), which is recombination-aware but forgoes ARG reconstruction. It accounts for the tendency of a higher effective recombination rate between genomes with a lower phylogenetic distance. It is applicable even if all genomic regions have experienced substantial amounts of recombination, and can be used on both nucleotide and amino acid sequences. CGP considers the local density of substitutions along pairwise genome alignments, fitting a model to the empirical distribution of substitution density to infer the pairwise coalescent time. Given all pairwise coalescent times, CGP reconstructs an ultrametric tree representing vertical inheritance. Based on simulations, we show that the proposed approach can reconstruct ultrametric trees with accurate topology, branch lengths, and root positioning. Applied to a set of E. coli strains, the reconstructed trees are most consistent with gene distributions when inferred from amino acid sequences, a data type that cannot be utilized by many alternative approaches. Conclusions The CGP algorithm is more accurate than alternative recombination-aware methods for ultrametric phylogenetic reconstructions.
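The first step the abstract describes, measuring the local density of substitutions along a pairwise alignment and forming its empirical distribution, might look like the sketch below. It covers only that step: the model fitting and coalescent-time inference of CGP are not reproduced, and the window size, function names, and toy sequences are assumptions made for illustration.

```python
from collections import Counter

def substitution_density(seq_a, seq_b, window=10):
    """Count mismatches in consecutive non-overlapping windows of an aligned pair."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    densities = []
    for start in range(0, len(seq_a) - window + 1, window):
        pairs = zip(seq_a[start:start + window], seq_b[start:start + window])
        densities.append(sum(1 for a, b in pairs if a != b))
    return densities

def empirical_distribution(densities):
    """Fraction of windows observed at each substitution count."""
    counts = Counter(densities)
    return {k: counts[k] / len(densities) for k in sorted(counts)}

# Hypothetical aligned pair: one clean window, one lightly diverged, one heavily diverged.
seq_a = "AAAAAAAAAA" * 3
seq_b = "AAAAAAAAAA" + "AACAAGAAAA" + "TTTTTAAAAA"

densities = substitution_density(seq_a, seq_b, window=10)
print(densities)
print(empirical_distribution(densities))
```

A window with an unusually high count, like the third one here, is the kind of signal that could indicate a recombined stretch from a more distant genome.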

Calculating site-specific evolutionary rates at the amino-acid or codon level yields similar rate estimates

PeerJ ◽

10.7717/peerj.3391 ◽

2017 ◽

Vol 5 ◽

pp. e3391 ◽

Cited By ~ 6

Author(s):

Dariya K. Sydykova ◽

Claus O. Wilke

Keyword(s):

Amino Acid ◽

Conservation Score ◽

Amino Acid Level ◽

Sequence Divergence ◽

Similar Rate ◽

Amino Acid Sequences ◽

Evolutionary Rates ◽

Site Specific ◽

Relative Conservation ◽

The Relationship

Site-specific evolutionary rates can be estimated from codon sequences or from amino-acid sequences. For codon sequences, the most popular methods use some variation of the dN∕dS ratio. For amino-acid sequences, one widely-used method is called Rate4Site, and it assigns a relative conservation score to each site in an alignment. How site-wise dN∕dS values relate to Rate4Site scores is not known. Here we elucidate the relationship between these two rate measurements. We simulate sequences with known dN∕dS, using either dN∕dS models or mutation–selection models for simulation. We then infer Rate4Site scores on the simulated alignments, and we compare those scores to either true or inferred dN∕dS values on the same alignments. We find that Rate4Site scores generally correlate well with true dN∕dS, and the correlation strengths increase in alignments with greater sequence divergence and more taxa. Moreover, Rate4Site scores correlate very well with inferred (as opposed to true) dN∕dS values, even for small alignments with little divergence. Finally, we verify this relationship between Rate4Site and dN∕dS in a variety of empirical datasets. We conclude that codon-level and amino-acid-level analysis frameworks are directly comparable and yield very similar inferences.
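Comparing two site-wise rate measures comes down to correlating one score per alignment column against another. The sketch below uses a Spearman rank correlation as one plausible choice; the abstract does not say which correlation statistic was used, and the per-site values and function names here are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical per-site scores for five alignment columns.
rate4site_scores = [0.2, 0.5, 1.1, 1.8, 2.4]
dnds_values = [0.05, 0.10, 0.30, 0.25, 0.90]
print(round(spearman(rate4site_scores, dnds_values), 3))
```

A rank-based statistic is used here because the two measures live on different scales, so only the ordering of sites is compared.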

Three Drosophila beta-tubulin sequences: a developmentally regulated isoform (beta 3), the testis-specific isoform (beta 2), and an assembly-defective mutation of the testis-specific isoform (B2t8) reveal both an ancient divergence in metazoan isotypes and structural constraints for beta-tubulin function

Molecular and Cellular Biology ◽

10.1128/mcb.7.6.2231-2242.1987 ◽

1987 ◽

Vol 7 (6) ◽

pp. 2231-2242

Author(s):

J E Rudolph ◽

M Kimble ◽

H D Hoyle ◽

M A Subler ◽

E C Raff

Keyword(s):

Amino Acid ◽

Sequence Divergence ◽

Amino Acid Sequences ◽

Single Amino Acid ◽

Structural Constraints ◽

Wild Type ◽

Beta Tubulin ◽

Developmentally Regulated ◽

Single Amino Acid Residue ◽

Beta 2

The genomic DNA sequence and deduced amino acid sequence are presented for three Drosophila melanogaster beta-tubulins: a developmentally regulated isoform beta 3-tubulin, the wild-type testis-specific isoform beta 2-tubulin, and an ethyl methanesulfonate-induced assembly-defective mutation of the testis isoform, B2t8. The testis-specific beta 2-tubulin is highly homologous to the major vertebrate beta-tubulins, but beta 3-tubulin is considerably diverged. Comparison of the amino acid sequences of the two Drosophila isoforms to those of other beta-tubulins indicates that these two proteins are representative of an ancient sequence divergence event which at least preceded the split between lines leading to vertebrates and invertebrates. The intron/exon structures of the genes for beta 2- and beta 3-tubulin are not the same. The structure of the gene for the variant beta 3-tubulin isoform, but not that of the testis-specific beta 2-tubulin gene, is similar to that of vertebrate beta-tubulins. The mutation B2t8 in the gene for the testis-specific beta 2-tubulin defines a single amino acid residue required for normal assembly function of beta-tubulin. The sequence of the B2t8 gene is identical to that of the wild-type gene except for a single nucleotide change resulting in the substitution of lysine for glutamic acid at residue 288. This position falls at the junction between two major structural domains of the beta-tubulin molecule. Although this hinge region is relatively variable in sequence among different beta-tubulins, the residue corresponding to glu 288 of Drosophila beta 2-tubulin is highly conserved as an acidic amino acid not only in all other beta-tubulins but in alpha-tubulins as well.

An Eimeria vaccine candidate appears to be lactate dehydrogenase; characterization and comparative analysis

Parasitology ◽

10.1017/s0031182004005104 ◽

2004 ◽

Vol 128 (6) ◽

pp. 603-616 ◽

Cited By ~ 17

Author(s):

D. SCHAAP ◽

G. ARTS ◽

J. KROEZE ◽

R. NIESSEN ◽

S. V. ROOSMALEN-VOS ◽

...

Keyword(s):

Amino Acid ◽

Lactate Dehydrogenase ◽

3D Model ◽

Amino Acid Sequences ◽

Evolutionary Divergence ◽

Model Structure ◽

Partial Protection ◽

Primary Amino ◽

The Many ◽

Intracellular Stage

An Eimeria acervulina protein fraction was identified which conferred partial protection against an E. acervulina challenge infection. From this fraction a 37 kDa protein was purified and its corresponding cDNA was cloned and shown to encode a lactate dehydrogenase (LDH). Full length cDNAs encoding LDH from two related species, E. tenella and E. maxima, were also cloned. The homology between the primary amino acid sequences of these three Eimeria LDH enzymes was rather low (66–80%), demonstrating an evolutionary divergence. The Plasmodium LDH crystal structure was used to generate a 3D-model structure of E. tenella LDH, which demonstrated that the many variations in the primary amino acid sequences (P. falciparum LDH and E. tenella LDH show only 47% identity) had not resulted in altered 3D-structures. Only a single LDH gene was identified in Eimeria, which was active as a homotetramer. The protein was present at similar levels throughout different parasitic stages (oocysts, sporozoites, schizonts and merozoites), but its corresponding RNA was only observed in the schizont stage, suggesting that its synthesis is restricted to the intracellular stage.
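Identity figures such as the 47% and 66–80% quoted above are computed from aligned sequences. The sketch below shows one common convention, counting matches over columns where neither sequence has a gap; other conventions use a different denominator, and the abstract does not state which was applied. The function name and the toy alignment are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns with no gap in either sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# Hypothetical aligned fragments (not real LDH sequences).
print(round(percent_identity("MKV-LAT", "MRVALST"), 1))
```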

Progressive immunoglobulin gene mutations in chronic lymphocytic leukemia: evidence for antigen-driven intraclonal diversification

Blood ◽

10.1182/blood-2006-05-020644 ◽

2006 ◽

Vol 109 (4) ◽

pp. 1559-1567 ◽

Cited By ~ 26

Author(s):

Alicia D. Volkheimer ◽

J. Brice Weinberg ◽

Bethany E. Beasley ◽

John F. Whitesides ◽

Jon P. Gockerman ◽

...

Keyword(s):

Amino Acid ◽

Chronic Lymphocytic Leukemia ◽

B Cell ◽

Somatic Mutations ◽

Single Cell Analysis ◽

Gene Mutations ◽

Lymphocytic Leukemia ◽

Light Chains ◽

Immunoglobulin Gene ◽

Immunoglobulin Genes

Abstract Somatic mutations of immunoglobulin genes characterize mature memory B cells, and intraclonal B-cell diversification is typically associated with expansion of B-cell clones with greater affinity for antigen (antigen drive). Evidence for a role of antigen in progression of intraclonal chronic lymphocytic leukemia (CLL) cell diversification in patients with mutated immunoglobulin genes has not been previously presented. We performed a single-cell analysis of immunoglobulin heavy and light chains in 6 patients with somatically mutated CLL-cell immunoglobulin genes and identified 2 patients with multiple related (oligoclonal) subgroups of CLL cells. We constructed genealogic trees of these oligoclonal CLL-cell subgroups and assessed the effects of immunoglobulin somatic mutations on the ratios of replacement and silent amino acid changes in the framework and antigen-binding regions (CDRs) of the immunoglobulin heavy and light chains from each oligoclonal CLL-cell population. In one subject, the amino acid changes were consistent with an antigen-driven progression of clonally related CLL-cell populations. In the other subject, intraclonal diversification was associated with immunoglobulin amino acid changes that would have likely lessened antigen affinity. Taken together, these studies support the hypothesis that in some CLL cases intraclonal diversification is dependent on antigen interactions with immunoglobulin receptors.
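Classifying a mutation as replacement or silent means translating the germline and mutated codons and checking whether the amino acid changes. The sketch below does this per codon with the standard genetic code. It is a simplified illustration, not the authors' analysis: it counts changed codons rather than individual nucleotide changes, does not separate framework from CDR positions, and uses made-up sequences and a hypothetical function name.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def count_replacement_silent(germline, mutated):
    """Return (replacement, silent) counts of changed codons between two in-frame sequences."""
    if len(germline) != len(mutated) or len(germline) % 3 != 0:
        raise ValueError("sequences must be in-frame and of equal length")
    replacement = 0
    silent = 0
    for pos in range(0, len(germline), 3):
        before, after = germline[pos:pos + 3], mutated[pos:pos + 3]
        if before == after:
            continue  # codon unchanged
        if CODON_TABLE[before] == CODON_TABLE[after]:
            silent += 1       # same amino acid
        else:
            replacement += 1  # amino acid changed
    return replacement, silent

# Hypothetical sequences: GAA->AAA changes Glu to Lys, GCT->GCC keeps Ala.
print(count_replacement_silent("GAAGCTTGT", "AAAGCCTGT"))
```

Applying the same count separately to framework and CDR codons would give the region-specific ratios the abstract refers to.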

Carnivora: The Amino Acid Sequence of the Adult European Mink (Mustela lutreola, Mustelidae) Hemoglobins

Zeitschrift für Naturforschung C ◽

10.1515/znc-1990-3-413 ◽

1990 ◽

Vol 45 (3-4) ◽

pp. 223-228 ◽

Cited By ~ 2

Author(s):

Aftab Ahmed ◽

Meeno Jahan ◽

Gerhard Braunitzer

Keyword(s):

Amino Acid ◽

Gas Phase ◽

Ion Exchange Chromatography ◽

Exchange Chromatography ◽

Amino Acid Sequences ◽

Mustela Lutreola ◽

European Mink ◽

Globin Chains ◽

The Family ◽

Insight Into

Abstract The complete amino acid sequences of the hemoglobins from the adult European mink (Mustela lutreola) are presented. The erythrocytes contain two hemoglobin components and three globin chains. The globin chains were isolated by ion-exchange chromatography on a column of CM-cellulose in 8 M urea buffer. The primary structures of the globin chains and of the tryptic peptides were determined in liquid- and gas-phase sequenators. The alignment of the α- and β-chains with reported sequences from other Carnivora species belonging to the family Mustelidae may give an insight into the evolution of this molecule.

A coarse-graining, ultrametric approach to resolve the phylogeny of prokaryotic strains with frequent homologous recombination

10.21203/rs.2.18054/v3 ◽

2020 ◽

Author(s):

Tin Yau Pang

Keyword(s):

Amino Acid ◽

Homologous Recombination ◽

Empirical Distribution ◽

Local Density ◽

Phylogenetic Reconstruction ◽

Sequence Divergence ◽

Coarse Graining ◽

Amino Acid Sequences ◽

Phylogenetic Distance ◽

Frequent Event

Abstract Background A frequent event in the evolution of prokaryotic genomes is homologous recombination, where a foreign DNA stretch replaces a genomic region similar in sequence. Recombination can affect the relative position of two genomes in a phylogenetic reconstruction in two different ways: (i) one genome can recombine with a DNA stretch that is similar to the other genome, thereby reducing their pairwise sequence divergence; (ii) one genome can recombine with a DNA stretch from an outgroup genome, increasing the pairwise divergence. While several recombination-aware phylogenetic algorithms exist, many of these cannot account for both types of recombination; some algorithms can, but do so inefficiently. Moreover, many of them reconstruct the ancestral recombination graph (ARG) to help infer the genome tree, and require that a substantial portion of each genome has not been affected by recombination, a sometimes unrealistic assumption. Methods Here, we propose a coarse-graining approach for phylogenetic reconstruction (CGP), which is recombination-aware but forgoes ARG reconstruction. It accounts for the tendency of a higher effective recombination rate between genomes with a lower phylogenetic distance. It is applicable even if all genomic regions have experienced substantial amounts of recombination, and can be used on both nucleotide and amino acid sequences. CGP considers the local density of substitutions along pairwise genome alignments, fitting a model to the empirical distribution of substitution density to infer the pairwise coalescent time. Given all pairwise coalescent times, CGP reconstructs an ultrametric tree representing vertical inheritance. Results Based on simulations, we show that the proposed approach can reconstruct ultrametric trees with accurate topology, branch lengths, and root positioning. Applied to a set of E. coli strains, the reconstructed trees are most consistent with gene distributions when inferred from amino acid sequences, a data type that cannot be utilized by many alternative approaches. Conclusions The CGP algorithm is more accurate than alternative recombination-aware methods for ultrametric phylogenetic reconstructions.
