Hidden Markov Models Lead to Higher Resolution Maps of Mutation Signature Activity in Cancer

Mapping Intimacies ◽

10.1101/392639 ◽

2018 ◽

Author(s):

Xiaoqing Huang ◽

Itay Sason ◽

Damian Wojtowicz ◽

Yoo-Ah Kim ◽

Mark D.M. Leiserson ◽

...

Keyword(s):

Markov Models ◽

Hidden Markov ◽

Biological Knowledge ◽

Sequential Dependencies ◽

Replication Time ◽

Tumor Level ◽

Genomic Regions ◽

Mutational Processes ◽

Insight Into ◽

Mutational Process

Abstract

Knowing the activity of the mutational processes shaping a cancer genome may provide insight into tumorigenesis and personalized therapy. It is thus important to uncover the characteristic signatures of active mutational processes in patients from their patterns of single base substitutions. However, mutational processes do not act uniformly on the genome and are biased by factors such as the genome's chromatin structure or replication origins. These factors may lead to statistical dependencies among neighboring mutations, calling for modeling approaches that can account for such dependencies to better estimate mutational process activities.

Here we develop the first sequence-dependent models for mutation signatures. We apply these models to characterize genomic and other factors that influence the activity of previously validated mutation signatures in breast cancer. We find that our tool, SigMa, can accurately assign genomic mutations to mutation signatures, yielding assignments that are of higher likelihood than those obtained with models that assume independence between signatures and align better with current biological knowledge. Our analysis resolves a controversy related to the dependency of APOBEC signatures on replication time and links Signatures 18 and 30 to oxidative damage.

Modeling the sequential dependencies of mutation signatures leads to improved estimates of mutation signature activity both at the tumor level and within specific genomic regions, yielding higher resolution maps of mutation signature activity in cancer.
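The idea of scoring a mutation sequence with and without sequential dependencies can be sketched in a few lines of Python. This is a minimal illustration under loudly stated assumptions, not SigMa's model or code: two hypothetical signatures emit over three mutation categories (real catalogs typically use 96), every probability is made up, and the "sticky" transition matrix is chosen by hand. The forward algorithm scores the sequence under the HMM; the baseline draws each mutation independently from a fixed signature mixture.

```python
import math


def forward_loglik(obs, init, trans, emit):
    """Log-likelihood of obs under an HMM, via the scaled forward algorithm.

    obs:   mutation-category indices, in genomic order
    init:  init[k] = P(first mutation comes from signature k)
    trans: trans[j][k] = P(next signature is k | current signature is j)
    emit:  emit[k][c] = P(category c | signature k)
    """
    n = len(init)
    # alpha[k] is proportional to P(x_1..x_t, z_t = k); rescaled each step to avoid underflow.
    alpha = [init[k] * emit[k][obs[0]] for k in range(n)]
    scale = sum(alpha)
    loglik = math.log(scale)
    alpha = [a / scale for a in alpha]
    for x in obs[1:]:
        alpha = [
            sum(alpha[j] * trans[j][k] for j in range(n)) * emit[k][x]
            for k in range(n)
        ]
        scale = sum(alpha)  # scale = P(x_t | x_1..x_{t-1})
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik


def independent_loglik(obs, weights, emit):
    """Log-likelihood when every mutation is drawn independently from a signature mixture."""
    return sum(math.log(sum(w * e[x] for w, e in zip(weights, emit))) for x in obs)


# Hypothetical toy parameters (not estimated from any data).
EMIT = [[0.8, 0.1, 0.1],   # signature A favors category 0
        [0.1, 0.1, 0.8]]   # signature B favors category 2
WEIGHTS = [0.5, 0.5]
STICKY = [[0.9, 0.1],
          [0.1, 0.9]]
clustered = [0, 0, 0, 0, 2, 2, 2, 2]

hmm_ll = forward_loglik(clustered, WEIGHTS, STICKY, EMIT)
ind_ll = independent_loglik(clustered, WEIGHTS, EMIT)
print(f"HMM: {hmm_ll:.3f}  independent: {ind_ll:.3f}")
```

On this clustered toy sequence the HMM assigns the higher log-likelihood, because consecutive mutations tend to share a signature. If every row of the transition matrix equals the mixture weights, the HMM collapses to the independent model and both functions return the same value.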

FIND: Identifying Functionally and Structurally Important Features in Protein Sequences with Deep Neural Networks

10.1101/592808 ◽

2019 ◽

Author(s):

Ranjani Murali ◽

James Hemp ◽

Victoria Orphan ◽

Yonatan Bisk

Keyword(s):

Neural Networks ◽

Amino Acid ◽

Hidden Markov Models ◽

Markov Models ◽

Genomic Sequence ◽

Hidden Markov ◽

Amino Acid Sequences ◽

Homologous Proteins ◽

Biological Studies ◽

Insight Into

Abstract

The ability to correctly predict the functional role of proteins from their amino acid sequences would significantly advance biological studies at the molecular level by improving our ability to understand the biochemical capability of biological organisms from their genomic sequence. Existing methods that are geared towards protein function prediction or annotation mostly use alignment-based approaches and probabilistic models such as hidden Markov models. In this work we introduce a deep learning architecture (Function Identification with Neural Descriptions, or FIND) which performs protein annotation from primary sequence. The accuracy of our method matches that of state-of-the-art techniques, such as protein classifiers based on hidden Markov models. Further, our approach allows for model introspection via a neural attention mechanism, which weights parts of the amino acid sequence proportionally to their relevance for functional assignment. In this way, the attention weights automatically uncover structurally and functionally relevant features of the classified protein and find novel functional motifs in previously uncharacterized proteins. While this model is applicable to any database of proteins, we chose to apply it to superfamilies of homologous proteins, with the aim of extracting features inherent to divergent protein families within a larger superfamily. This provided insight into the functional diversification of an enzyme superfamily and its adaptation to different physiological contexts. We tested our approach on three families (nitrogenases, cytochrome bd-type oxygen reductases and heme-copper oxygen reductases) and present a detailed analysis of the sequence characteristics identified in previously characterized proteins in the heme-copper oxygen reductase (HCO) superfamily. These are correlated with their catalytic relevance and evolutionary history.

FIND was then applied to discover features in previously uncharacterized members of the HCO superfamily, providing insight into their unique sequence features. This modeling approach demonstrates the power of neural networks to recognize patterns in large datasets and can be utilized to discover biochemically and structurally important features in proteins from their amino acid sequences.
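An attention mechanism that weights sequence positions by relevance is commonly implemented as a softmax over per-position scores followed by a weighted sum. The sketch below assumes plain dot-product scoring against a single query vector, with hand-picked two-dimensional residue features standing in for learned embeddings; FIND's actual layers and dimensions are not described here, so treat every name and number as hypothetical.

```python
import math


def attention_pool(residue_vectors, query):
    """Softmax attention pooling over sequence positions.

    residue_vectors: one feature vector per amino-acid position (hypothetical features)
    query: scoring vector; learned in a real model, fixed by hand here
    Returns (weights, pooled): weights sum to 1, pooled is the weighted average vector.
    """
    scores = [sum(q * h for q, h in zip(query, vec)) for vec in residue_vectors]
    top = max(scores)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(residue_vectors[0])
    pooled = [sum(w * vec[d] for w, vec in zip(weights, residue_vectors)) for d in range(dim)]
    return weights, pooled


# Three positions; the middle one is constructed to align with the query.
residues = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2]]
weights, pooled = attention_pool(residues, query=[1.0, 1.0])
for pos, w in enumerate(weights):
    print(f"position {pos}: weight {w:.3f}")
```

Reading the weights back out is what makes this kind of model inspectable: positions with large weights are candidates for functionally or structurally important residues.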

Mutation signatures reveal biological processes in human cancer

10.1101/036541 ◽

2016 ◽

Cited By ~ 6

Author(s):

Kyle Covington ◽

Eve Shinbrot ◽

David A Wheeler

Keyword(s):

Sparse Matrix ◽

Human Cancer ◽

Replication Errors ◽

Repair Processes ◽

Relative Contribution ◽

History Of ◽

Mutational Processes ◽

Insight Into ◽

Mutational Process ◽

Non Negative Matrix Factorization

Replication errors in the genome accumulate from a variety of mutational processes, which leave a history of mutations on the affected genome. The relative contribution of each mutational process has been characterized by non-negative matrix factorization (NMF), which has led to deeper insight into both the mutational and repair processes contributing to cancer. However, current implementations of NMF have left unresolved some specific patterns that should be present in the mutation data, and they have not generated signatures designed for classification. Here, we use a variant of NMF, termed non-smooth NMF (nsNMF), to generate sparse matrix factorizations of the somatic mutation profiles of 7129 tumors. nsNMF factorization revealed 21 mutational signatures. We found three APOBEC mutational processes that clearly segregate with the published APOBEC enzymology and trans-lesion repair processes. We also discovered that several signatures differed between geographic locations, even between closely related tissues.
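Non-smooth NMF factors the mutation-count matrix V as V ≈ W S H, where W holds the signatures, H holds their exposures, and the smoothing matrix S = (1 - θ)I + (θ/q)·11ᵀ (with q signatures and 0 ≤ θ ≤ 1) averages the factors, so that W and H tend to become sparser to compensate. The following is a minimal pure-Python sketch assuming least-squares multiplicative updates, a made-up 4 × 3 count matrix, θ = 0.5 and q = 2. It does not reproduce the paper's settings or its 7129-tumor data, and the published nsNMF algorithm may use a different objective.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(row) for row in zip(*A)]


def smoothing_matrix(q, theta):
    """S = (1 - theta) * I + (theta / q) * ones(q, q); each row sums to 1."""
    return [[(1 - theta) * (i == j) + theta / q for j in range(q)] for i in range(q)]


def reconstruction_error(V, W, S, H):
    """Squared Frobenius norm of V - W S H."""
    R = matmul(matmul(W, S), H)
    return sum((v - r) ** 2 for v_row, r_row in zip(V, R) for v, r in zip(v_row, r_row))


def ns_nmf(V, q, theta=0.5, iters=200, seed=0, eps=1e-12):
    """Factor nonnegative V (categories x tumors) as V ~ W S H.

    Uses least-squares multiplicative updates, which keep W and H nonnegative.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(q)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(q)]
    S = smoothing_matrix(q, theta)
    for _ in range(iters):
        A = matmul(W, S)  # smoothed signatures, held fixed while H is updated
        At = transpose(A)
        num, den = matmul(At, V), matmul(matmul(At, A), H)
        H = [[h * nu / (d + eps) for h, nu, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        B = matmul(S, H)  # smoothed exposures, held fixed while W is updated
        Bt = transpose(B)
        num, den = matmul(V, Bt), matmul(W, matmul(B, Bt))
        W = [[w * nu / (d + eps) for w, nu, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, S, H


# Made-up counts: 4 mutation categories (rows) x 3 tumors (columns).
V = [[5, 0, 1],
     [4, 1, 0],
     [0, 3, 4],
     [1, 4, 5]]
start = reconstruction_error(V, *ns_nmf(V, q=2, iters=0))
W, S, H = ns_nmf(V, q=2)
end = reconstruction_error(V, W, S, H)
print(f"error before: {start:.3f}  after: {end:.3f}")
```

Setting θ = 0 makes S the identity and recovers ordinary NMF; larger θ trades reconstruction accuracy for sparser factors.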

The Purity Measure for Genomic Regions Leads to Horizontally Transferred Genes

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720013430026 ◽

2013 ◽

Vol 11 (06) ◽

pp. 1343002 ◽

Cited By ~ 2

Author(s):

Yuta Taniguchi ◽

Yasuhiro Yamada ◽

Osamu Maruyama ◽

Satoru Kuhara ◽

Daisuke Ikeda

Keyword(s):

Sequence Analysis ◽

High Purity ◽

Domain Knowledge ◽

Markov Models ◽

Hidden Markov ◽

Bacterial Genome ◽

Genome Sequences ◽

Sequence Alignments ◽

A Genome ◽

Genomic Regions

Sequence analysis is important for understanding a genome, and a number of approaches, such as sequence alignments and hidden Markov models, have been employed. In the field of text mining, the purity measure was developed to detect unusual regions of a string without any domain knowledge. That work reported that only RNAs and transposons have high purity values. In this work, the purity values of regions of various bacterial genome sequences are computed, and those regions are analyzed extensively. Mobile elements and phages, as well as RNAs and transposons, are found to have high purity values. Interestingly, all of these are classified as horizontally transferred genes, which suggests that the purity measure is useful for predicting horizontally transferred genes.

Estimating Personality Impression from Speech Record Using Hidden Markov Models

IEEJ Transactions on Electronics Information and Systems ◽

10.1541/ieejeiss.135.1517 ◽

2015 ◽

Vol 135 (12) ◽

pp. 1517-1523 ◽

Cited By ~ 1

Author(s):

Yicheng Jin ◽

Takuto Sakuma ◽

Shohei Kato ◽

Tsutomu Kunitachi

Keyword(s):

Hidden Markov Models ◽

Markov Models ◽

Hidden Markov

Hidden Markov Processes

10.23943/princeton/9780691133157.001.0001 ◽

2014 ◽

Cited By ~ 2

Author(s):

M. Vidyasagar

Keyword(s):

Hidden Markov Models ◽

Markov Processes ◽

Viterbi Algorithm ◽

Markov Models ◽

Hidden Markov ◽

Local Alignment ◽

Biological Applications ◽

Standard Material ◽

Hidden Markov Processes ◽

Genomics And Proteomics

This book explores important aspects of Markov and hidden Markov processes and the applications of these ideas to various problems in computational biology. It starts from first principles, so that no previous knowledge of probability is necessary. However, the work is rigorous and mathematical, making it useful to engineers and mathematicians, even those not interested in biological applications. A range of exercises is provided, including drills to familiarize the reader with concepts and more advanced problems that require deep thinking about the theory. Biological applications are taken from post-genomic biology, especially genomics and proteomics. The topics examined include standard material such as the Perron–Frobenius theorem, transient and recurrent states, hitting probabilities and hitting times, maximum likelihood estimation, the Viterbi algorithm, and the Baum–Welch algorithm. The book contains discussions of extremely useful topics not usually seen at the basic level, such as ergodicity of Markov processes, Markov Chain Monte Carlo (MCMC), information theory, and large deviation theory for both i.i.d. and Markov processes. It also presents state-of-the-art realization theory for hidden Markov models. Among biological applications, it offers an in-depth look at the BLAST (Basic Local Alignment Search Tool) algorithm, including a comprehensive explanation of the underlying theory. Other applications, such as profile hidden Markov models, are also explored.
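The Viterbi algorithm listed among the book's topics finds the single most probable hidden-state path by dynamic programming. The sketch below is a generic log-space implementation, not code from the book; the two-state model, its transition and emission probabilities, and the function names are all hypothetical.

```python
import math


def safe_log(p):
    """log(p), with log(0) mapped to -inf so impossible transitions are never chosen."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, init, trans, emit):
    """Most probable hidden-state path for obs, computed in log space.

    init[k], trans[j][k] and emit[k][x] are initial, transition and emission probabilities.
    Returns (path, log_probability_of_path).
    """
    n = len(init)
    delta = [safe_log(init[k]) + safe_log(emit[k][obs[0]]) for k in range(n)]
    backpointers = []
    for x in obs[1:]:
        pointers, scores = [], []
        for k in range(n):
            # Best predecessor j for landing in state k at this step.
            best_j = max(range(n), key=lambda j: delta[j] + safe_log(trans[j][k]))
            pointers.append(best_j)
            scores.append(delta[best_j] + safe_log(trans[best_j][k]) + safe_log(emit[k][x]))
        backpointers.append(pointers)
        delta = scores
    # Trace back from the best final state.
    state = max(range(n), key=lambda k: delta[k])
    best_logp = delta[state]
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, best_logp


# Hypothetical two-state model with "sticky" transitions and three symbols.
INIT = [0.5, 0.5]
TRANS = [[0.9, 0.1], [0.1, 0.9]]
EMIT = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
print(viterbi([0, 0, 0, 2, 2, 2], INIT, TRANS, EMIT)[0])  # [0, 0, 0, 1, 1, 1]
print(viterbi([0, 0, 2, 0, 0], INIT, TRANS, EMIT)[0])     # [0, 0, 0, 0, 0]
```

In the second call the lone symbol 2 is absorbed into state 0: with these made-up parameters, one unlikely emission (0.1) costs less than paying for two state switches (0.1 × 0.1).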