Private haplotype barcoding facilitates inexpensive high-resolution genotyping of multiparent crosses

2017 ◽  
Author(s):  
Daniel A. Skelly ◽  
John H. McCusker ◽  
Eric A. Stone ◽  
Paul M. Magwene

Inexpensive, high-throughput sequencing has led to the generation of large numbers of sequenced genomes representing diverse lineages in both model and non-model organisms. Such resources are well suited for the creation of new multiparent populations to identify quantitative trait loci that contribute to variation in phenotypes of interest. However, despite significant drops in per-base sequencing costs, the costs of sample handling and library preparation remain high, particularly when many samples are sequenced. We describe a novel method for pooled genotyping of offspring from multiple genetic crosses, such as those that make up multiparent populations. Our approach, which we call "private haplotype barcoding" (PHB), uses private haplotypes to deconvolve patterns of inheritance in individual offspring from mixed pools composed of multiple offspring. We demonstrate the efficacy of this approach by applying the PHB method to whole-genome sequencing of 96 segregants from 12 yeast crosses, achieving a reduction of over 90% in sample preparation costs relative to non-pooled sequencing. In addition, we implement a hidden Markov model to calculate genotype probabilities for a generic PHB run, and a specialized hidden Markov model for the yeast crosses that improves genotyping accuracy by making use of tetrad information. Private haplotype barcoding holds particular promise for facilitating inexpensive genotyping of large pools of offspring in diverse non-model systems.
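
As a rough illustration of how a hidden Markov model can turn noisy marker calls into genotype probabilities, the sketch below runs the forward-backward algorithm over two parental states in a haploid segregant. It is not the authors' PHB implementation: the function name, the two-state layout, and the `switch` and `err` rates are assumptions chosen for the example.

```python
def posterior_genotypes(obs, switch=0.01, err=0.05):
    """Forward-backward posteriors for a hypothetical two-parent haploid cross.

    obs    : list of allele calls per marker (0 or 1, None for missing).
    switch : assumed recombination probability between adjacent markers.
    err    : assumed genotyping error rate.
    Returns one [P(parent 0), P(parent 1)] pair per marker.
    """
    states = (0, 1)
    trans = [[1 - switch, switch], [switch, 1 - switch]]

    def emit(state, call):
        if call is None:          # missing data carries no information
            return 1.0
        return 1.0 - err if call == state else err

    # Forward pass, rescaled at every marker to avoid underflow.
    fwd, prev = [], None
    for call in obs:
        if prev is None:
            cur = [0.5 * emit(s, call) for s in states]
        else:
            cur = [emit(s, call) * sum(prev[r] * trans[r][s] for r in states)
                   for s in states]
        total = sum(cur)
        prev = [c / total for c in cur]
        fwd.append(prev)

    # Backward pass, also rescaled.
    bwd = [[1.0, 1.0] for _ in obs]
    for t in range(len(obs) - 2, -1, -1):
        nxt = [sum(trans[s][r] * emit(r, obs[t + 1]) * bwd[t + 1][r]
                   for r in states) for s in states]
        total = sum(nxt)
        bwd[t] = [x / total for x in nxt]

    # Combine and normalize to get per-marker posteriors.
    post = []
    for f, b in zip(fwd, bwd):
        raw = [f[s] * b[s] for s in states]
        total = sum(raw)
        post.append([x / total for x in raw])
    return post
```

Under these assumed rates, an isolated discordant call is more cheaply explained as a genotyping error than as two nearby crossovers, which is the kind of smoothing that makes HMM-based calls more robust than marker-by-marker calls.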

Author(s):  
Hai Yang ◽  
Daming Zhu

Copy number variation (CNV) is a prevalent kind of genetic structural variation that leads to an abnormal number of copies of large genomic regions, such as the gain or loss of DNA segments larger than 1 kb. CNV occurs not only in the human genome but also in plant genomes. Recent research has shown that CNV is associated with many complex diseases. In this paper, guanine-cytosine (GC) bias, mappability and their effect on read depth signals in sequencing data are discussed first. Subsequently, a new correction method for GC bias and an improved combinatorial detection algorithm for CNV using high-throughput sequencing reads based on a hidden Markov model (CNV-HMM) are proposed. The corrected read depth signals have lower correlation with GC content, mappability of reads and the width of the analysis window. We then create a hidden Markov model that maps the reads onto the reference genome and records the unmapped reads. The unmapped reads are counted and normalized. CNV-HMM detects abnormal read count signals and obtains candidate CNVs using the expectation maximization (EM) algorithm. Finally, we filter the candidate CNVs using split reads to improve the performance of our algorithm. Experimental results indicate that the CNV-HMM algorithm has higher accuracy and sensitivity for CNV detection than most current detection algorithms.
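
To make the idea of GC-bias correction concrete, here is a minimal sketch of a widely used median-ratio scheme: each window's read depth is rescaled by the ratio of the global median depth to the median depth of windows with similar GC content. This is a generic baseline, not the new correction method the abstract proposes; the function name and the `bin_width` parameter are assumptions for illustration.

```python
from statistics import median


def gc_correct(read_depth, gc_content, bin_width=0.05):
    """Median-ratio GC correction (generic baseline, not the paper's method).

    read_depth : read depth per analysis window.
    gc_content : GC fraction (0..1) per window, same length as read_depth.
    bin_width  : assumed width of the GC bins used to group windows.
    """
    if len(read_depth) != len(gc_content):
        raise ValueError("read_depth and gc_content must have equal length")

    global_median = median(read_depth)

    # Group windows by GC bin and take the median depth within each bin.
    bins = {}
    for depth, gc in zip(read_depth, gc_content):
        bins.setdefault(int(gc / bin_width), []).append(depth)
    bin_median = {key: median(values) for key, values in bins.items()}

    corrected = []
    for depth, gc in zip(read_depth, gc_content):
        m = bin_median[int(gc / bin_width)]
        # Windows in an all-zero bin stay at zero instead of dividing by zero.
        corrected.append(depth * global_median / m if m > 0 else 0.0)
    return corrected
```

After this rescaling, windows that differ only in GC content end up with comparable depths, so the remaining deviations are better candidates for true copy number changes.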


1999 ◽  
Vol 36 (04) ◽  
pp. 987-998 ◽  
Author(s):  
Ulrich Herkenrath

We consider a sequence of observations which is generated by a so-called hidden Markov model. An exponential smoothing procedure applied to such an observation sequence generates an inhomogeneous Markov process as a sequence of smoothed values. If the state sequence of the underlying hidden Markov model is moreover ergodic, then for two classes of smoothing functions the strong ergodicity of the sequence of smoothed values is proved. As a consequence a central limit theorem and a law of large numbers hold true for the smoothed values. The proof uses general results for so-called convergent inhomogeneous Markov processes. The procedure proposed by the author can be applied to some time series discussed in the literature.
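
The setting can be illustrated with a small simulation: observations are drawn from a two-state hidden Markov model and then passed through a smoothing recursion. The sketch assumes the classical linear form s_n = (1 - alpha) * s_{n-1} + alpha * x_n, which may be only one instance of the smoothing functions the paper treats; the Gaussian emissions and all parameter values are likewise assumptions.

```python
import random


def simulate_hmm(n, stay=0.9, means=(0.0, 5.0), sd=1.0, seed=0):
    """Draw n observations from a hypothetical two-state Gaussian HMM."""
    rng = random.Random(seed)
    state = rng.randrange(2)
    observations = []
    for _ in range(n):
        observations.append(rng.gauss(means[state], sd))
        if rng.random() > stay:   # leave the current state with prob. 1 - stay
            state = 1 - state
    return observations


def exp_smooth(xs, alpha=0.2, s0=0.0):
    """Linear exponential smoothing: s_n = (1 - alpha) * s_{n-1} + alpha * x_n."""
    s = s0
    smoothed = []
    for x in xs:
        s = (1 - alpha) * s + alpha * x
        smoothed.append(s)
    return smoothed


observations = simulate_hmm(1000)
smoothed = exp_smooth(observations)
```

Each smoothed value depends on the previous smoothed value and the new observation, which is why the smoothed sequence can be studied as a Markov process in its own right.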


Author(s):  
Natsuki Iwano ◽  
Tatsuo Adachi ◽  
Kazuteru Aoki ◽  
Yoshikazu Nakamura ◽  
Michiaki Hamada

Nucleic acid aptamers are generated by an in vitro molecular evolution method known as systematic evolution of ligands by exponential enrichment (SELEX). The variety of candidates is limited by the sequencing data actually obtained from an experiment. Here, we developed RaptGen, a variational autoencoder for in silico aptamer generation. RaptGen exploits a profile hidden Markov model decoder to represent motif sequences effectively. We showed that RaptGen embedded simulated sequence data into a low-dimensional latent space in a manner dependent on motif information. We also performed sequence embedding using two independent SELEX datasets. RaptGen successfully generated aptamers from the latent space even though they were not included in the high-throughput sequencing data. RaptGen could also generate a truncated aptamer with a short learning model. We demonstrated that RaptGen could be applied to activity-guided aptamer generation using Bayesian optimization. We concluded that generation by RaptGen and its latent representation are useful for aptamer discovery. Code is available at https://github.com/hmdlab/raptgen.


2017 ◽  
Vol 113 (1/2) ◽  
Author(s):  
Febe de Wet ◽  
Neil Kleynhans ◽  
Dirk van Compernolle ◽  
Reza Sahraeian ◽  
◽  
...  

For automated speech recognition in under-resourced environments, techniques for sharing acoustic data between closely related or similar languages become important. Donor languages with abundant resources can potentially be used to increase the recognition accuracy of speech systems developed in the resource-poor target language. The assumption is that adding more data will increase the robustness of the statistical estimations captured by the acoustic models. In this study we investigated data sharing between Afrikaans and Flemish, an under-resourced and a well-resourced language, respectively. Our approach focused on exploring model adaptation and refinement techniques associated with hidden Markov model-based speech recognition systems to improve the benefit of sharing data. Specifically, we focused on the use of currently available techniques, some possible combinations and the exact utilisation of the techniques during the acoustic model development process. Our findings show that simply using normal approaches to adaptation and refinement does not result in any benefits when adding Flemish data to the Afrikaans training pool. The only observed improvement was achieved when developing acoustic models on all available data but estimating model refinements and adaptations on the target data only.


Author(s):  
Jürgen Claesen ◽  
Tomasz Burzykowski

The analysis of polygenic, phenotypic characteristics such as quantitative traits or inheritable diseases requires reliable scoring of many genetic markers covering the entire genome. The advent of high-throughput sequencing technologies provides a new way to evaluate large numbers of single nucleotide polymorphisms (SNPs) as genetic markers. Combining these technologies with pooling of segregants, as performed in bulk segregant analysis, should, in principle, allow the simultaneous mapping of multiple genetic loci present throughout the genome. We propose a hidden Markov model to analyze the marker data obtained by bulk segregant next-generation sequencing. The model includes several states, each associated with a different probability of observing the same/different nucleotide in an offspring as compared to the parent. The transitions between the molecular markers imply transitions between the states of the model. After estimating the transition probabilities and state-related probabilities of nucleotide (dis)similarity, the most probable state for each SNP is selected. The most probable states can then be used to indicate which genomic regions are likely to contain trait-related genes. The application of the model is illustrated on data from a study of ethanol tolerance in yeast. Software is written in R. R-functions, R-scripts and documentation are available on
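
The state-selection step can be sketched as follows, assuming that "most probable state" is read as the Viterbi path and that each SNP is summarized by how many pooled reads carry the parental nucleotide. The authors' software is written in R; this Python sketch, its function name, the binomial-style emission, and the values of `p_same` and `stay` are all illustrative assumptions rather than their model.

```python
import math


def viterbi_states(counts, p_same=(0.5, 0.95), stay=0.95):
    """Most probable state path for SNP read counts (illustrative sketch).

    counts : list of (k, n) pairs, k of n reads matching the parent at a SNP.
    p_same : assumed per-state probability that a read matches the parent
             (needs at least two states).
    stay   : assumed probability of remaining in the same state between SNPs.
    """
    if not counts:
        return []
    n_states = len(p_same)
    log_stay = math.log(stay)
    log_move = math.log((1 - stay) / (n_states - 1))

    def log_emit(state, k, n):
        # Binomial log-likelihood without the constant binomial coefficient.
        p = p_same[state]
        return k * math.log(p) + (n - k) * math.log(1 - p)

    def log_trans(src, dst):
        return log_stay if src == dst else log_move

    score = [math.log(1 / n_states) + log_emit(s, *counts[0])
             for s in range(n_states)]
    backpointers = []
    for k, n in counts[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            best = max(range(n_states),
                       key=lambda r: score[r] + log_trans(r, s))
            new_score.append(score[best] + log_trans(best, s) + log_emit(s, k, n))
            pointers.append(best)
        backpointers.append(pointers)
        score = new_score

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

With the default parameters, state 1 marks SNPs where nearly all pooled reads match the parent, so a run of state 1 flags a region that may be linked to the selected trait.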


2012 ◽  
Vol 132 (10) ◽  
pp. 1589-1594 ◽  
Author(s):  
Hayato Waki ◽  
Yutaka Suzuki ◽  
Osamu Sakata ◽  
Mizuya Fukasawa ◽  
Hatsuhiro Kato
