A sorghum Practical Haplotype Graph facilitates genome-wide imputation and cost-effective genomic prediction

Mapping Intimacies ◽

10.1101/775221 ◽

2019 ◽

Author(s):

Sarah E. Jensen ◽

Jean Rigaud Charles ◽

Kebede Muleta ◽

Peter Bradbury ◽

Terry Casstevens ◽

...

Keyword(s):

Genomic Selection ◽

Genomic Prediction ◽

Sequence Data ◽

Input Sequence ◽

Genotyping By Sequencing ◽

Cost Effective ◽

Genome Wide ◽

Variant Information ◽

Sequencing Platforms ◽

Low Coverage

Abstract: Successful management and utilization of increasingly large genomic datasets is essential for breeding programs to increase genetic gain and accelerate cultivar development. To help with data management and storage, we developed a sorghum Practical Haplotype Graph (PHG) pangenome database that stores all identified haplotypes and variant information for a given set of individuals. We developed two PHGs in sorghum, one with 24 individuals and another with 398 individuals, that reflect the diversity across genic regions of the sorghum genome. Twenty-four founders of the Chibas sorghum breeding program were sequenced at low coverage (0.01x) and processed through the PHG to identify genome-wide variants. The PHG called SNPs with only 5.9% error at 0.01x coverage, an accuracy only 3% lower than when calling SNPs from 8x coverage sequence. Additionally, 207 progeny from the Chibas genomic selection (GS) training population were sequenced and processed through the PHG. Missing genotypes in the progeny were imputed from the parental haplotypes available in the PHG and used for genomic prediction. Mean prediction accuracies with PHG SNP calls range from 0.57 to 0.73 for different traits and are similar to prediction accuracies obtained with genotyping-by-sequencing (GBS) or markers from sequencing targeted amplicons (rhAmpSeq). This study provides a proof of concept for using a sorghum PHG to call and impute SNPs from low-coverage sequence data and also shows that the PHG can unify genotype calls from different sequencing platforms. By reducing the amount of input sequence needed, the PHG has the potential to decrease the cost of genotyping for genomic selection, making GS more feasible and facilitating larger breeding populations that can capture maximum recombination. Our results demonstrate that the PHG is a useful research and breeding tool that can maintain variant information from a diverse group of taxa, store sequence data in a condensed but readily accessible format, unify genotypes from different genotyping methods, and provide a cost-effective option for genomic selection for any species.
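
As a rough illustration of why imputation from parental haplotypes matters at 0.01x coverage, the sketch below estimates the fraction of sites expected to be covered by at least one read. It assumes reads land uniformly at random, so that per-site depth is approximately Poisson; this simplification is not taken from the abstract, and the function name is hypothetical.

```python
import math


def fraction_covered(coverage: float) -> float:
    """Expected fraction of sites hit by at least one read.

    Assumes per-site read depth is Poisson with mean `coverage`
    (uniform random read placement, an idealization).
    """
    return 1.0 - math.exp(-coverage)


if __name__ == "__main__":
    # The two depths mentioned in the abstract: 0.01x and 8x.
    for depth in (0.01, 8.0):
        print(f"{depth}x coverage: {fraction_covered(depth):.4f} of sites covered")
```

Under this assumption roughly 1% of sites are observed directly at 0.01x, so nearly all genotypes have to be inferred from haplotypes already stored in the graph.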


Hypothesis-free biomarker discovery platform evolving into a companion diagnostic using bioinformatics.

Journal of Clinical Oncology ◽

10.1200/jco.2012.30.30_suppl.52 ◽

2012 ◽

Vol 30 (30_suppl) ◽

pp. 52-52

Author(s):

Gitte Pedersen

Keyword(s):

Drug Discovery ◽

Biomarker Discovery ◽

Sequence Data ◽

Single Gene ◽

Digital Gene Expression ◽

Cost Effective ◽

Cancer Drug ◽

Cancer Drug Discovery ◽

Genome Wide ◽

Sequencing Platforms

Background: Cancer drug discovery is hypothesis-driven and focused on addressing specific mechanisms; however, an individual patient's tumor harbors much more diversity, so the biomarkers behind the hypothesis are often insufficient to explain differences in individual drug response. A Danish Private Public Partnership is developing biomarkers for cancer prognostics under a $30 million grant. The Danish National Biobank was established by another $20 million grant and hosts approximately 15 million biological samples associated with cradle-to-grave electronic medical records. Methods: We propose a hypothesis-free biomarker-discovery platform that uses intelligent target-filtering technology integrated with short-read sequencing on a small chip, addressing the sample-preparation and data-analysis bottlenecks in current sequencing platforms. By using bioinformatics on existing sequence data, it is possible to optimize the proposed strategy and develop novel diagnostic applications. The intelligent target-filtering process generates a single-stranded tag library. Because the tags are single-stranded, it is possible to detect all possible tags using a universal high-throughput platform. The cost per sample is low, the content per test is very high, and the approach is hypothesis-free. This approach represents a paradigm shift in the biomarker-discovery process, saving significant time and money while increasing the probability of success. The diagnostic becomes a bioinformatics exercise that allows the drug discovery company to optimize the definition of responders. Results: The result of the digital gene expression experiment was compared to RNA-seq and an Affymetrix whole-genome array. Conclusions: The platform offers a more cost-effective way of providing an open, hypothesis-free platform than sequencing. The platform also offers a cost-effective, open-ended solution for such diverse applications as expression profiling, genome-wide methylation profiling, genome-wide duplication/deletion analysis, environmental microbiome profiling, and single-gene mutation detection, with only relatively minor adjustments in how the single-stranded tags are extracted from the sample.


Genomic prediction using low-coverage portable Nanopore sequencing

PLoS ONE ◽

10.1371/journal.pone.0261274 ◽

2021 ◽

Vol 16 (12) ◽

pp. e0261274

Author(s):

Harrison J. Lamb ◽

Ben J. Hayes ◽

Imtiaz A. S. Randhawa ◽

Loan T. Nguyen ◽

Elizabeth M. Ross

Keyword(s):

Genomic Prediction ◽

Disease Risk ◽

Sequence Data ◽

Snp Array ◽

Genotyping By Sequencing ◽

Risk Scores ◽

Genomic Breeding ◽

Breeding Values ◽

Low Coverage ◽

On Farm

Most traits in livestock, crops and humans are polygenic; that is, a large number of loci contribute to genetic variation. Effects at these loci lie along a continuum ranging from common low-effect to rare high-effect variants that cumulatively contribute to the overall phenotype. Statistical methods to calculate the effect of these loci have been developed and can be used to predict phenotypes in new individuals. In agriculture, these methods are used to select superior individuals using genomic breeding values; in humans, they are used to quantitatively measure an individual's disease risk, termed polygenic risk scores. Both fields typically use SNP array genotypes for the analysis. Recently, genotyping-by-sequencing has become popular due to lower cost and greater genome coverage (including structural variants). Oxford Nanopore Technologies' (ONT) portable sequencers have the potential to combine the benefits of genotyping-by-sequencing with portability and decreased turn-around time. This introduces the potential for in-house clinical genetic disease risk screening in humans or calculating genomic breeding values on-farm in agriculture. Here we demonstrate the potential of the latter by calculating genomic breeding values for four traits in cattle using low-coverage ONT sequence data and comparing these breeding values to breeding values calculated from SNP arrays. At sequencing coverages between 2x and 4x, the correlation between ONT breeding values and SNP array-based breeding values was > 0.92 when imputation was used and > 0.88 when no imputation was used. With an average sequencing coverage of 0.5x, the correlation between the two methods was between 0.85 and 0.92 using imputation, depending on the trait. This suggests that ONT sequencing has potential for in-clinic or on-farm genomic prediction; however, further work is needed to validate these findings in a larger population.
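
The comparison reported above rests on the correlation between two sets of breeding values for the same animals. A minimal sketch of that calculation follows, assuming a Pearson correlation (the abstract does not name the coefficient). The function name and the breeding values are hypothetical and serve only to show the arithmetic.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical breeding values for five animals (not data from the study).
    array_gebv = [0.12, -0.40, 0.85, 0.03, -0.22]
    ont_gebv = [0.10, -0.35, 0.80, 0.10, -0.30]
    print(f"correlation: {pearson(array_gebv, ont_gebv):.3f}")
```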


Potential of Low-Coverage Genotyping-by-Sequencing and Imputation for Cost-Effective Genomic Selection in Biparental Segregating Populations

Crop Science ◽

10.2135/cropsci2016.08.0675 ◽

2017 ◽

Vol 57 (3) ◽

pp. 1404-1420 ◽

Cited By ~ 32

Author(s):

Gregor Gorjanc ◽

Jean-Francois Dumasy ◽

Serap Gonen ◽

R. Chris Gaynor ◽

Roberto Antolin ◽

...

Keyword(s):

Genomic Selection ◽

Genotyping By Sequencing ◽

Cost Effective ◽

Low Coverage


In-situ genomic prediction using low-coverage Nanopore sequencing

10.1101/2021.07.16.452615 ◽

2021 ◽

Author(s):

Harry Lamb ◽

Ben Hayes ◽

Imtiaz Randhawa ◽

Loan Nguyen ◽

Elizabeth Ross

Keyword(s):

Genomic Prediction ◽

Disease Risk ◽

Sequence Data ◽

Snp Array ◽

Genotyping By Sequencing ◽

Risk Scores ◽

Genomic Breeding ◽

Breeding Values ◽

Low Coverage ◽

On Farm

Most traits in livestock, crops and humans are polygenic; that is, a large number of loci contribute to genetic variation. Effects at these loci lie along a continuum ranging from common low-effect to rare high-effect variants that cumulatively contribute to the overall phenotype. Statistical methods to calculate the effect of these loci have been developed and can be used to predict phenotypes in new individuals. In agriculture, these methods are used to select superior individuals using genomic breeding values; in humans, they are used to quantitatively measure an individual's disease risk, termed polygenic risk scores. Both fields typically use SNP array genotypes for the analysis. Recently, genotyping-by-sequencing has become popular due to lower cost and greater genome coverage (including structural variants). Oxford Nanopore Technologies' (ONT) portable sequencers have the potential to combine the benefits of genotyping-by-sequencing with portability and decreased turn-around time. This introduces the potential for in-house clinical genetic disease risk screening in humans or calculating genomic breeding values on-farm in agriculture. Here we demonstrate the potential of the latter by calculating genomic breeding values for four traits in cattle using low-coverage ONT sequence data and comparing these breeding values to breeding values calculated from SNP arrays. At sequencing coverages between 2x and 4x, the correlation between ONT breeding values and SNP array-based breeding values was > 0.92 when imputation was used and > 0.88 when no imputation was used. With an average sequencing coverage of 0.5x, the correlation between the two methods was between 0.85 and 0.92 using imputation, depending on the trait. This demonstrates that ONT sequencing has great potential for in-clinic or on-farm genomic prediction.


Genome-wide association study and accuracy of genomic prediction for teat number in Duroc pigs using genotyping-by-sequencing

Genetics Selection Evolution ◽

10.1186/s12711-017-0311-8 ◽

2017 ◽

Vol 49 (1) ◽

Cited By ~ 14

Author(s):

Cheng Tan ◽

Zhenfang Wu ◽

Jiangli Ren ◽

Zhuolin Huang ◽

Dewu Liu ◽

...

Keyword(s):

Association Study ◽

Genomic Prediction ◽

Genome Wide Association Study ◽

Genotyping By Sequencing ◽

Genome Wide Association ◽

Genome Wide ◽

Teat Number


Genome-Wide Identification of 5-Methylcytosine Sites in Bacterial Genomes By High-Throughput Sequencing of MspJI Restriction Fragments

10.1101/2021.02.10.430591 ◽

2021 ◽

Author(s):

Brian P. Anton ◽

Alexey Fomenkov ◽

Victoria Wu ◽

Richard J. Roberts

Keyword(s):

Single Molecule ◽

Dna Sequences ◽

High Throughput Sequencing ◽

Cost Effective ◽

Restriction Enzymes ◽

Specific Sequence ◽

Genome Wide ◽

Cost Effective Alternative ◽

Simple Column ◽

Sequencing Platforms

Abstract: Single-molecule Real-Time (SMRT) sequencing can easily identify sites of N6-methyladenine and N4-methylcytosine within DNA sequences, but similar identification of 5-methylcytosine sites is not as straightforward. In prokaryotic DNA, methylation typically occurs within specific sequence contexts, or motifs, that are a property of the methyltransferases that “write” these epigenetic marks. We present here a straightforward, cost-effective alternative to both SMRT and bisulfite sequencing for the determination of prokaryotic 5-methylcytosine methylation motifs. The method, called MFRE-Seq, relies on excision and isolation of fully methylated fragments of predictable size using MspJI-Family Restriction Enzymes (MFREs), which depend on the presence of 5-methylcytosine for cleavage. We demonstrate that MFRE-Seq is compatible with both Illumina and Ion Torrent sequencing platforms and requires only a digestion step and simple column purification of size-selected digest fragments prior to standard library preparation procedures. We applied MFRE-Seq to numerous bacterial and archaeal genomic DNA preparations and successfully confirmed known motifs and identified novel ones. This method should be a useful complement to existing methodologies for studying prokaryotic methylomes and characterizing the contributing methyltransferases.


Genome-wide identification of 5-methylcytosine sites in bacterial genomes by high-throughput sequencing of MspJI restriction fragments

PLoS ONE ◽

10.1371/journal.pone.0247541 ◽

2021 ◽

Vol 16 (5) ◽

pp. e0247541

Author(s):

Brian P. Anton ◽

Alexey Fomenkov ◽

Victoria Wu ◽

Richard J. Roberts

Keyword(s):

Single Molecule ◽

Dna Sequences ◽

High Throughput Sequencing ◽

Cost Effective ◽

Restriction Enzymes ◽

Specific Sequence ◽

Genome Wide ◽

Cost Effective Alternative ◽

Simple Column ◽

Sequencing Platforms

Single-molecule Real-Time (SMRT) sequencing can easily identify sites of N6-methyladenine and N4-methylcytosine within DNA sequences, but similar identification of 5-methylcytosine sites is not as straightforward. In prokaryotic DNA, methylation typically occurs within specific sequence contexts, or motifs, that are a property of the methyltransferases that “write” these epigenetic marks. We present here a straightforward, cost-effective alternative to both SMRT and bisulfite sequencing for the determination of prokaryotic 5-methylcytosine methylation motifs. The method, called MFRE-Seq, relies on excision and isolation of fully methylated fragments of predictable size using MspJI-Family Restriction Enzymes (MFREs), which depend on the presence of 5-methylcytosine for cleavage. We demonstrate that MFRE-Seq is compatible with both Illumina and Ion Torrent sequencing platforms and requires only a digestion step and simple column purification of size-selected digest fragments prior to standard library preparation procedures. We applied MFRE-Seq to numerous bacterial and archaeal genomic DNA preparations and successfully confirmed known motifs and identified novel ones. This method should be a useful complement to existing methodologies for studying prokaryotic methylomes and characterizing the contributing methyltransferases.


0326 Genome-wide association study and accuracy of genomic prediction for teat number in Duroc pigs using genotyping by sequencing

Journal of Animal Science ◽

10.2527/jam2016-0326 ◽

2016 ◽

Vol 94 (suppl_5) ◽

pp. 156-157

Author(s):

C. Tan ◽

Y. Da ◽

Z. Wu ◽

D. Liu ◽

X. He ◽

...

Keyword(s):

Association Study ◽

Genomic Prediction ◽

Genome Wide Association Study ◽

Genotyping By Sequencing ◽

Genome Wide Association ◽

Genome Wide ◽

Teat Number


Haplotype genomic prediction of phenotypic values based on chromosome distance and gene boundaries using low-coverage sequencing in Duroc pigs

Genetics Selection Evolution ◽

10.1186/s12711-021-00661-y ◽

2021 ◽

Vol 53 (1) ◽

Author(s):

Cheng Bian ◽

Dzianis Prakapenka ◽

Cheng Tan ◽

Ruifei Yang ◽

Di Zhu ◽

...

Keyword(s):

Genomic Selection ◽

Genomic Prediction ◽

Prediction Accuracy ◽

Prediction Models ◽

Average Daily Gain ◽

Live Weight ◽

Feed Conversion ◽

Muscle Area ◽

Haplotype Blocks ◽

Low Coverage

Background: Genomic selection using single nucleotide polymorphism (SNP) markers has been widely used for genetic improvement of livestock, but most current methods of genomic selection are based on SNP models. In this study, we investigated the prediction accuracies of haplotype models based on fixed chromosome distances and gene boundaries compared to those of SNP models for genomic prediction of phenotypic values. We also examined the reasons for the successes and failures of haplotype genomic prediction. Methods: We analyzed a swine population of 3195 Duroc boars with records on eight traits: body judging score (BJS), teat number (TN), age (AGW), loin muscle area (LMA), loin muscle depth (LMD) and back fat thickness (BF) at 100 kg live weight, and average daily gain (ADG) and feed conversion rate (FCR) from 30 to 100 kg live weight. Ten-fold validation was used to evaluate the prediction accuracy of each SNP model and each multi-allelic haplotype model based on 488,124 autosomal SNPs from low-coverage sequencing. Haplotype blocks were defined using fixed chromosome distances or gene boundaries. Results: Compared to the best SNP model, the accuracy of predicting phenotypic values using a haplotype model was greater by 7.4% for BJS, 7.1% for AGW, 6.6% for ADG, 4.9% for FCR, 2.7% for LMA, 1.9% for LMD, 1.4% for BF, and 0.3% for TN. The use of gene-based haplotype blocks resulted in the best prediction accuracy for LMA, LMD, and TN. Compared to estimates of SNP additive heritability, estimates of haplotype epistasis heritability were strongly correlated with the increase in prediction accuracy by haplotype models. The increase in prediction accuracy was largest for BJS, AGW, ADG, and FCR, which also had the largest estimates of haplotype epistasis heritability: 24.4% for BJS, 14.3% for AGW, 14.5% for ADG, and 17.7% for FCR. SNP and haplotype heritability profiles across the genome identified several genes with large genetic contributions to phenotypes: NUDT3 for LMA, LMD and BF, VRTN for TN, COL5A2 for BJS, BSND for ADG, and CARTPT for FCR. Conclusions: Haplotype prediction models improved the accuracy for genomic prediction of phenotypes in Duroc pigs. For some traits, the best prediction accuracy was obtained with haplotypes defined using gene regions, which provides evidence that functional genomic information can improve the accuracy of haplotype genomic prediction for certain traits.
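
To make the idea of haplotype blocks defined by fixed chromosome distances concrete, the sketch below groups the SNPs on one chromosome into windows of a fixed size and joins the alleles of one phased haplotype within each window into a single multi-allelic code. It is an illustration only and not the authors' pipeline; the function name, the 0/1 allele coding, and the window size are assumptions.

```python
from collections import defaultdict


def block_haplotypes(snp_positions, alleles, block_size_bp):
    """Group one phased haplotype into fixed-distance blocks.

    snp_positions: base-pair positions of SNPs on one chromosome.
    alleles: one allele character per SNP, in the same order (e.g. "0110").
    block_size_bp: fixed window size in base pairs (an assumed parameter).

    Returns {block_index: allele_string}; each distinct allele string acts
    as one allele of a multi-allelic haplotype marker for that block.
    """
    if len(snp_positions) != len(alleles):
        raise ValueError("one allele is required per SNP position")
    blocks = defaultdict(list)
    # Sort by position so alleles within a block keep chromosome order.
    for pos, allele in sorted(zip(snp_positions, alleles)):
        blocks[pos // block_size_bp].append(allele)
    return {index: "".join(chars) for index, chars in blocks.items()}


if __name__ == "__main__":
    # Hypothetical positions and alleles, 1 kb windows.
    print(block_haplotypes([100, 900, 1500, 2100], "0110", 1000))
```

Blocks defined by gene boundaries would follow the same pattern, with each SNP assigned to the gene interval containing it instead of to `pos // block_size_bp`.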


Genotyping-by-sequencing enables linkage mapping in three octoploid cultivated strawberry families

10.7287/peerj.preprints.2975v1 ◽

2017 ◽

Author(s):

Kelly J Vining ◽

Natalia Salinas ◽

Jacob A Tennessen ◽

Jason D Zurn ◽

Daniel James Sargent ◽

...

Keyword(s):

Reference Genome ◽

Sequence Data ◽

Genotyping By Sequencing ◽

Nucleotide Polymorphisms ◽

Linkage Groups ◽

Single Nucleotide ◽

Ancestral Species ◽

Polymorphic Snps ◽

Genome Wide ◽

Diploid Ancestor

With the goal of evaluating genotyping-by-sequencing (GBS) in a species with a complex octoploid genome, GBS was used to survey genome-wide single-nucleotide polymorphisms (SNPs) in three biparental strawberry (Fragaria ×ananassa) populations. GBS sequence data were aligned to the F. vesca ‘Fvb’ reference genome in order to call SNPs. Numbers of polymorphic SNPs per population ranged from 1,163 to 3,190. Linkage maps consisting of 30-65 linkage groups were produced from the SNP sets derived from each parent. The linkage groups covered 99% of the Fvb reference genome, with three to seven linkage groups from a given parent aligned to any particular chromosome. A phylogenetic analysis performed using the POLiMAPS pipeline revealed linkage groups that were most similar to ancestral species F. vesca for each chromosome. Linkage groups that were most similar to a second ancestral species, F. iinumae, were only resolved for Fvb 4. The quantity of missing data and heterogeneity in genome coverage inherent in GBS complicated the analysis, but POLiMAPS resolved F. ×ananassa chromosomal regions derived from diploid ancestor F. vesca.
