Shifting the limits in wheat research and breeding using a fully annotated reference genome

Rudi Appels; Kellye Eversole; Nils Stein; Catherine Feuillet; Beat Keller; Jane Rogers; Curtis J. Pozniak; Frédéric Choulet; Assaf Distelfeld; Jesse Poland; Gil Ronen; Andrew G. Sharpe; Omer Barad; Kobi Baruch; Gabriel Keeble-Gagnère; Martin Mascher; Gil Ben-Zvi; Ambre-Aurore Josselin; Axel Himmelbach; François Balfourier; Juan Gutierrez-Gonzalez; Matthew Hayden; ChuShin Koh; Gary Muehlbauer; Raj K. Pasam; Etienne Paux; Philippe Rigault; Josquin Tibbits; Vijay Tiwari; Manuel Spannagl; Daniel Lang; Heidrun Gundlach; Georg Haberer; Klaus F. X. Mayer; Danara Ormanbekova; Verena Prade; Hana Šimková; Thomas Wicker; David Swarbreck; Hélène Rimbert; Marius Felder; Nicolas Guilhot; Gemy Kaithakottil; Jens Keilwagen; Philippe Leroy; Thomas Lux; Sven Twardziok; Luca Venturini; Angéla Juhász; Michael Abrouk; Iris Fischer; Cristobal Uauy; Philippa Borrill; Ricardo H. Ramirez-Gonzalez; Dominique Arnaud; Smahane Chalabi; Boulos Chalhoub; Aron Cory; Raju Datla; Mark W. Davey; John Jacobs; Stephen J. Robinson; Burkhard Steuernagel; Fred van Ex; Brande B. H. Wulff; Moussa Benhamed; Abdelhafid Bendahmane; Lorenzo Concia; David Latrasse; Jan Bartoš; Arnaud Bellec; Hélène Berges; Jaroslav Doležel; Zeev Frenkel; Bikram Gill; Abraham Korol; Thomas Letellier; Odd-Arne Olsen; Kuldeep Singh; Miroslav Valárik; Edwin van der Vossen; Sonia Vautrin; Song Weining; Tzion Fahima; Vladimir Glikson; Dina Raats; Jarmila Číhalíková; Helena Toegelová; Jan Vrána; Pierre Sourdille; Benoit Darrier; Delfina Barabaschi; Luigi Cattivelli; Pilar Hernandez; Sergio Galvez; Hikmet Budak; Jonathan D. G. Jones; Kamil Witek; Guotai Yu; Ian Small; Joanna Melonek; Ruonan Zhou; Tatiana Belova; Kostya Kanyuka; Robert King; Kirby Nilsen; Sean Walkowiak; Richard Cuthbert; Ron Knox; Krysta Wiebe; Daoquan Xiang; Antje Rohde; Timothy Golds; Jana Čížková; Bala Ani Akpinar; Sezgi Biyiklioglu; Liangliang Gao; Amidou N’Daiye; Marie Kubaláková; Jan Šafář; Françoise Alfama; Anne-Françoise Adam-Blondon; Raphael Flores; Claire Guerche; Mikaël Loaec; Hadi Quesneville; Janet Condie; Jennifer Ens; Ron Maclachlan; Yifang Tan; Adriana Alberti; Jean-Marc Aury; Valérie Barbe; Arnaud Couloux; Corinne Cruaud; Karine Labadie; Sophie Mangenot; Patrick Wincker; Gaganpreet Kaur; Mingcheng Luo; Sunish Sehgal; Parveen Chhuneja; Om Prakash Gupta; Suruchi Jindal; Parampreet Kaur; Palvi Malik; Priti Sharma; Bharat Yadav; Nagendra K. Singh; Jitendra P. Khurana; Chanderkant Chaudhary; Paramjit Khurana; Vinod Kumar; Ajay Mahato; Saloni Mathur; Amitha Sevanthi; Naveen Sharma; Ram Sewak Tomar; Kateřina Holušová; Ondřej Plíhal; Matthew D. Clark; Darren Heavens; George Kettleborough; Jon Wright; Barbora Balcárková; Yuqin Hu; Elena Salina; Nikolai Ravin; Konstantin Skryabin; Alexey Beletsky; Vitaly Kadnikov; Andrey Mardanov; Michail Nesterov; Andrey Rakitin; Ekaterina Sergeeva; Hirokazu Handa; Hiroyuki Kanamori; Satoshi Katagiri; Fuminori Kobayashi; Shuhei Nasuda; Tsuyoshi Tanaka; Jianzhong Wu; Federica Cattonaro; Min Jiumeng; Karl Kugler; Matthias Pfeifer; Simen Sandve; Xu Xun; Bujie Zhan; Jacqueline Batley; Philipp E. Bayer; David Edwards; Satomi Hayashi; Zuzana Tulpová; Paul Visendi; Licao Cui; Xianghong Du; Kewei Feng; Xiaojun Nie; Wei Tong; Le Wang

doi:10.1126/science.aar7191

Science ◽

2018 ◽

Vol 361 (6403) ◽

pp. eaar7191 ◽

Cited By ~ 717

Author(s):

Rudi Appels ◽

Kellye Eversole ◽

Nils Stein ◽

Catherine Feuillet ◽

...

Keyword(s):

Reference Genome ◽

Single Gene ◽

Gene Families ◽

Reference Sequence ◽

Genomic Context ◽

High Confidence ◽

Wheat Development ◽

End Use ◽

Coexpression Networks ◽

Bread Wheat Genome

An annotated reference sequence representing the hexaploid bread wheat genome in 21 pseudomolecules has been analyzed to identify the distribution and genomic context of coding and noncoding elements across the A, B, and D subgenomes. With an estimated coverage of 94% of the genome and containing 107,891 high-confidence gene models, this assembly enabled the discovery of tissue- and developmental stage–related coexpression networks by providing a transcriptome atlas representing major stages of wheat development. Dynamics of complex gene families involved in environmental adaptation and end-use quality were revealed at subgenome resolution and contextualized to known agronomic single-gene or quantitative trait loci. This community resource establishes the foundation for accelerating wheat research and application through improved understanding of wheat biology and genomics-assisted breeding.

Haplotype-resolved genome of diploid ginger (Zingiber officinale) and its unique gingerol biosynthetic pathway

Horticulture Research ◽

10.1038/s41438-021-00627-7 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Hong-Lei Li ◽

Lin Wu ◽

Zhaoming Dong ◽

Yusong Jiang ◽

Sanjie Jiang ◽

...

Keyword(s):

Biosynthetic Pathway ◽

Southwest China ◽

Reference Genome ◽

Zingiber Officinale ◽

Gene Families ◽

Chromosome Conformation ◽

Long Reads ◽

Transcription Factor Networks ◽

Species Specific ◽

Haplotype 1

Abstract Ginger (Zingiber officinale), the type species of Zingiberaceae, is one of the most widespread medicinal plants and spices. Here, we report a high-quality, chromosome-scale reference genome of ginger ‘Zhugen’, a traditionally cultivated ginger in Southwest China used as a fresh vegetable, assembled from PacBio long reads, Illumina short reads, and high-throughput chromosome conformation capture (Hi-C) reads. The ginger genome was phased into two haplotypes, haplotype 1 (1.53 Gb with a contig N50 of 4.68 M) and haplotype 0 (1.51 Gb with a contig N50 of 5.28 M). Homologous ginger chromosomes maintained excellent gene pair collinearity. In 17,226 pairs of allelic genes, 11.9% exhibited differential expression between alleles. Based on the results of ginger genome sequencing, transcriptome analysis, and metabolomic analysis, we proposed a backbone biosynthetic pathway of gingerol analogs, which consists of 12 enzymatic gene families, PAL, C4H, 4CL, CST, C3’H, C3OMT, CCOMT, CSE, PKS, AOR, DHN, and DHT. These analyses also identified the likely transcription factor networks that regulate the synthesis of gingerol analogs. Overall, this study serves as an excellent resource for further research on ginger biology and breeding, lays a foundation for a better understanding of ginger evolution, and presents an intact biosynthetic pathway for species-specific gingerol biosynthesis.
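
The abstract above reports assembly contiguity as a contig N50. As a minimal sketch of what that statistic means (not the authors' pipeline), the N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The function name `n50` and the toy lengths below are assumptions made for illustration only.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly, not data from the study.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 70
```

In the toy case the total is 300, and the two longest contigs (80 + 70) already reach 150, so the N50 is 70.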

Systematic Detection of Large-Scale Multi-Gene Horizontal Transfer in Prokaryotes

Molecular Biology and Evolution ◽

10.1093/molbev/msab043 ◽

2021 ◽

Author(s):

Lina Kloub ◽

Sean Gosselin ◽

Matthew Fullmer ◽

Joerg Graf ◽

J Peter Gogarten ◽

...

Keyword(s):

Gene Transfer ◽

Large Scale ◽

Single Gene ◽

Gene Families ◽

Microbial Evolution ◽

Phylogenetic Distance ◽

Secretion Systems ◽

Type Iii Secretion Systems ◽

A Genome ◽

Conserved Gene

Abstract Horizontal gene transfer (HGT) is central to prokaryotic evolution. However, little is known about the “scale” of individual HGT events. In this work, we introduce the first computational framework to help answer the following fundamental question: How often does more than one gene get horizontally transferred in a single HGT event? Our method, called HoMer, uses phylogenetic reconciliation to infer single-gene HGT events across a given set of species/strains, employs several techniques to account for inference error and uncertainty, combines that information with gene order information from extant genomes, and uses statistical analysis to identify candidate horizontal multi-gene transfers (HMGTs) in both extant and ancestral species/strains. HoMer is highly scalable and can be easily used to infer HMGTs across hundreds of genomes. We apply HoMer to a genome-scale dataset of over 22000 gene families from 103 Aeromonas genomes and identify a large number of plausible HMGTs of various scales at both small and large phylogenetic distances. Analysis of these HMGTs reveals interesting relationships between gene function, phylogenetic distance, and frequency of multi-gene transfer. Among other insights, we find that (i) the observed relative frequency of HMGT increases as divergence between genomes increases, (ii) HMGTs often have conserved gene functions, and (iii) rare genes are frequently acquired through HMGT. We also analyze in detail HMGTs involving the zonula occludens toxin and type III secretion systems. By enabling the systematic inference of HMGTs on a large scale, HoMer will facilitate a more accurate and more complete understanding of HGT and microbial evolution.
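
The step of combining inferred single-gene transfers with gene order can be illustrated roughly as follows. This is a simplified sketch and not HoMer itself: it assumes the single-gene HGT events for one donor-recipient pair have already been inferred, and it omits the reconciliation, the error handling, and the statistical analysis the abstract describes. The function name and the `max_gap` and `min_genes` parameters are hypothetical.

```python
def candidate_multigene_transfers(gene_order, transferred, max_gap=1, min_genes=2):
    """Group transferred genes that lie close together in one genome.

    gene_order  -- gene IDs in chromosomal order (assumed input format)
    transferred -- set of gene IDs inferred as transferred for one
                   donor-recipient pair (assumed to be given)
    max_gap     -- non-transferred genes tolerated between neighbours
    min_genes   -- smallest cluster reported as a candidate
    """
    clusters, current, last_pos = [], [], None
    for pos, gene in enumerate(gene_order):
        if gene not in transferred:
            continue
        # Close the running cluster when the gap to the previous hit is too large.
        if last_pos is not None and pos - last_pos - 1 > max_gap:
            if len(current) >= min_genes:
                clusters.append(current)
            current = []
        current.append(gene)
        last_pos = pos
    if len(current) >= min_genes:
        clusters.append(current)
    return clusters


# Hypothetical toy genome with four transferred genes.
order = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
hits = {"g2", "g3", "g5", "g8"}
print(candidate_multigene_transfers(order, hits))  # [['g2', 'g3', 'g5']]
```

Allowing a small gap is one possible way to tolerate missed single-gene inferences; whether such clusters are more frequent than expected by chance would still need a separate statistical test.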

RECORD: Reference-Assisted Genome Assembly for Closely Related Genomes

International Journal of Genomics ◽

10.1155/2015/563482 ◽

2015 ◽

Vol 2015 ◽

pp. 1-10 ◽

Cited By ~ 1

Author(s):

Krisztian Buza ◽

Bartek Wilczynski ◽

Norbert Dojer

Keyword(s):

Reference Genome ◽

De Novo ◽

Real Data ◽

Reference Sequence ◽

Individual Genome ◽

Single Experiment ◽

Sequencing Technologies ◽

Sequencing Cost ◽

The Individual ◽

Assembly Software

Background. Next-generation sequencing technologies are now producing multiple times the genome size in total reads from a single experiment. This is enough information to reconstruct at least some of the differences between the individual genome studied in the experiment and the reference genome of the species. However, in most typical protocols, this information is disregarded and the reference genome is used. Results. We provide a new approach that allows researchers to reconstruct genomes very closely related to the reference genome (e.g., mutants of the same species) directly from the reads used in the experiment. Our approach applies de novo assembly software to experimental reads and so-called pseudoreads and uses the resulting contigs to generate a modified reference sequence. In this way, it can very quickly, and at no additional sequencing cost, generate a new, modified reference sequence that is closer to the actual sequenced genome and has full coverage. In this paper, we describe our approach and test its implementation called RECORD. We evaluate RECORD on both simulated and real data. We made our software publicly available on SourceForge. Conclusion. Our tests show that on closely related sequences RECORD outperforms more general assisted-assembly software.

The genomic context of natural killer receptor extended gene families

Immunological Reviews ◽

10.1034/j.1600-065x.2001.1810102.x ◽

2001 ◽

Vol 181 (1) ◽

pp. 20-38 ◽

Cited By ~ 213

Author(s):

John Trowsdale ◽

Roland Barten ◽

Anja Haude ◽

C. Andrew Stewart ◽

Stephan Beck ◽

...

Keyword(s):

Natural Killer ◽

Gene Families ◽

Genomic Context ◽

Natural Killer Receptor

Promoter-mediated diversification of transcriptional bursting dynamics following gene duplication

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1800943115 ◽

2018 ◽

Vol 115 (33) ◽

pp. 8364-8369 ◽

Cited By ~ 17

Author(s):

Edward Tunnacliffe ◽

Adam M. Corrigan ◽

Jonathan R. Chubb

Keyword(s):

Gene Duplication ◽

Expression Profiles ◽

Single Cells ◽

Family Members ◽

Gene Families ◽

Developmental Expression ◽

Upstream Sequence ◽

Genomic Context ◽

Entire Family ◽

Transcriptional Bursting

During the evolution of gene families, functional diversification of proteins often follows gene duplication. However, many gene families expand while preserving protein sequence. Why do cells maintain multiple copies of the same gene? Here we have addressed this question for an actin family with 17 genes encoding an identical protein. The genes have divergent flanking regions and are scattered throughout the genome. Surprisingly, almost the entire family showed similar developmental expression profiles, with their expression also strongly coupled in single cells. Using live cell imaging, we show that differences in gene expression were apparent over shorter timescales, with family members displaying different transcriptional bursting dynamics. Strong “bursty” behaviors contrasted steady, more continuous activity, indicating different regulatory inputs to individual actin genes. To determine the sources of these different dynamic behaviors, we reciprocally exchanged the upstream regulatory regions of gene family members. This revealed that dynamic transcriptional behavior is directly instructed by upstream sequence, rather than features specific to genomic context. A residual minor contribution of genomic context modulates the gene OFF rate. Our data suggest promoter diversification following gene duplication could expand the range of stimuli that regulate the expression of essential genes. These observations contextualize the significance of transcriptional bursting.

The chromosome-scale reference genome of black pepper provides insight into piperine biosynthesis

Nature Communications ◽

10.1038/s41467-019-12607-6 ◽

2019 ◽

Vol 10 (1) ◽

Cited By ~ 10

Author(s):

Lisong Hu ◽

Zhongping Xu ◽

Maojun Wang ◽

Rui Fan ◽

Daojun Yuan ◽

...

Keyword(s):

Reference Genome ◽

Gene Families ◽

Black Pepper ◽

Piper Nigrum ◽

Comparative Genomic ◽

Specific Gene ◽

Phylogenomic Analysis ◽

Genomic Analyses ◽

Species Specific ◽

Insight Into

Abstract Black pepper (Piper nigrum), dubbed the ‘King of Spices’ and ‘Black Gold’, is one of the most widely used spices. Here, we present its reference genome assembly by integrating PacBio, 10x Chromium, BioNano DLS optical mapping, and Hi-C mapping technologies. The 761.2 Mb sequences (45 scaffolds with an N50 of 29.8 Mb) are assembled into 26 pseudochromosomes. A phylogenomic analysis of representative plant genomes places magnoliids as sister to the monocots-eudicots clade and indicates that black pepper has diverged from the shared Laurales-Magnoliales lineage approximately 180 million years ago. Comparative genomic analyses reveal specific gene expansions in the glycosyltransferase, cytochrome P450, shikimate hydroxycinnamoyl transferase, lysine decarboxylase, and acyltransferase gene families. Comparative transcriptomic analyses disclose berry-specific upregulated expression in representative genes in each of these gene families. These data provide an evolutionary perspective and shed light on the metabolic processes relevant to the molecular basis of species-specific piperine biosynthesis.

Towards the Human Cancer Genome Project: A Sequence-Ready Physical Map of a Follicular Lymphoma Genome.

Blood ◽

10.1182/blood.v106.11.605.605 ◽

2005 ◽

Vol 106 (11) ◽

pp. 605-605

Author(s):

Marco A. Marra ◽

Martin Krzywinski ◽

Readman Chiu ◽

Matthew Field ◽

Inanc Birol ◽

...

Keyword(s):

Follicular Lymphoma ◽

Human Genome ◽

Large Scale ◽

Reference Genome ◽

Reference Sequence ◽

Whole Genome ◽

Bac Clones ◽

Genome Maps ◽

Tumor Genome ◽

Reference Human Genome

Abstract With the aim of identifying and sequencing mutations in follicular lymphoma genomes, we have begun a project to generate at least 24 deeply redundant sequence-ready Bacterial Artificial Chromosome (BAC)-based whole genome maps, each from a different individual’s lymphoma. BAC-array CGH and Affymetrix whole-genome sampling assays (WGSA) will be used along with the mapping data to identify genomic amplifications and losses in the lymphomas. Results from the mapping and array studies will be used to prioritize BAC clones for sequence analysis. Because each map will span essentially the entire genome of the corresponding lymphoma, we anticipate that essentially all regions of each tumor genome will be represented in easily sequenced BAC clones. This approach facilitates targeted sequencing of genomic regions of interest, including those containing genes relevant to cancer or harboring amplifications or deletions. Our mapping strategy hinges on the successful creation of deeply redundant high quality BAC libraries from primary lymphomas and large scale high throughput restriction enzyme fingerprinting of individual BACs with a version of the technology we used to map the human, mouse, rat and other genomes. The effort is large-scale, and will result in the generation of at least 2.5 million fingerprinted BAC clones over the next three years. Using the fingerprints, we will align the BACs to the reference human genome to assess genome coverage and to identify candidate genome rearrangements. In parallel, we will assemble the fingerprints into genome maps, looking for larger-scale genome variations between the lymphoma maps and the reference genome sequence. To test the feasibility of our approach, we obtained two restriction digest fingerprints from each of 140,000 individual BAC clones. BACs were sampled from a 7-fold redundant BAC library that had been created from genomic DNA purified from a primary follicular lymphoma sample. The fingerprints are being assembled into a clone map with the intent of reconstructing the entire tumor genome. 90,377 fingerprinted clones with unambiguous single alignments to the reference sequence were automatically assembled into 15,538 contigs. Subsequent rounds of semi-automatic contig merging further reduced the number of contigs to 5,433. Only 1,241 clones remained unassembled. We anchored the tumor genome map to the reference human genome sequence by aligning the clone fingerprints to the restriction map computed from the reference sequence assembly. As a result of this, we identified a BAC that captured the canonical t(14;18) translocation characteristic of follicular lymphomas. We sequenced this BAC and confirmed that it contains the expected translocation. Almost 2.6 gigabases (~91%) of the reference genome are represented in the evolving map, with an additional 50,000 clone fingerprints awaiting incorporation into the map assembly. Among these are repeat-rich and other clones that may well harbor genome rearrangements. Additional prioritization of sequencing targets will be undertaken when map construction and analysis of genome copy number alterations are complete.

SMRT Genome Assembly Corrects Reference Errors, Resolving the Genetic Basis of Virulence in Mycobacterium tuberculosis

10.1101/064840 ◽

2016 ◽

Author(s):

Afif Elghraoui ◽

Samuel J Modlin ◽

Faramarz Valafar

Keyword(s):

Mycobacterium Tuberculosis ◽

Single Molecule ◽

Genetic Basis ◽

Reference Genome ◽

Reference Sequence ◽

Smrt Sequencing ◽

Virulence Attenuation ◽

Sequencing Platforms ◽

Genome Comparisons ◽

Reference Genomes

Abstract The genetic basis of virulence in Mycobacterium tuberculosis has been investigated through genome comparisons of its virulent (H37Rv) and attenuated (H37Ra) sister strains. Such analysis, however, relies heavily on the accuracy of the sequences. While the H37Rv reference genome has had several corrections to date, that of H37Ra is unmodified since its original publication. Here, we report the assembly and finishing of the H37Ra genome from single-molecule, real-time (SMRT) sequencing. Our assembly reveals that the number of H37Ra-specific variants is less than half of what the Sanger-based H37Ra reference sequence indicates, undermining and, in some cases, invalidating the conclusions of several studies. PE_PPE family genes, which are intractable to commonly-used sequencing platforms because of their repetitive and GC-rich nature, are overrepresented in the set of genes in which all reported H37Ra-specific variants are contradicted. We discuss how our results change the picture of virulence attenuation and the power of SMRT sequencing for producing high-quality reference genomes.

Ares-GT: design of guide RNAs targeting multiple genes for CRISPR-Cas experiments

10.1101/2020.01.08.898742 ◽

2020 ◽

Author(s):

Eugenio G. Minguet

Keyword(s):

Reference Genome ◽

Query Sequence ◽

Gene Families ◽

Guide Rna ◽

Guide Rnas ◽

Online Tools ◽

Multiple Input ◽

Design Guide ◽

Command Line Tool ◽

Selection Of

ABSTRACT Motivation: There is a lack of tools to design guide RNAs for CRISPR genome editing of gene families, and good candidate sgRNAs are usually tagged with low scores precisely because they match several locations in the genome, so time-consuming manual evaluation of targets is required. Moreover, online tools are limited to a restricted list of reference genomes and lack the flexibility to incorporate unpublished genomes or to consider genomes of populations with allelic variants. Results: To address these issues, I have developed ARES-GT, a local command line tool written in Python. ARES-GT allows the selection of candidate sgRNAs that match multiple input query sequences, in addition to candidate sgRNAs that specifically match each query sequence. It also supports the use of unmapped contigs apart from complete genomes, thus allowing the use of any genome provided by the user and handling intraspecies allelic variability and individual polymorphisms. Availability: ARES-GT is available at GitHub (https://github.com/eugomin/ARES-GT.git).
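
The idea of selecting candidates that match several query sequences, alongside candidates specific to each query, can be sketched in a few lines. This is an illustration only and does not reproduce ARES-GT's code, options, or scoring: it assumes a 20-nt protospacer followed by an NGG PAM, requires exact matches, and ignores off-target evaluation against a genome. All function names are hypothetical.

```python
import re

# Assumption: 20-nt protospacer immediately followed by an NGG PAM.
# The lookahead lets overlapping sites be reported.
PAM_SITE = re.compile(r"(?=([ACGT]{20})[ACGT]GG)")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def protospacers(seq):
    """Collect candidate protospacers on both strands of one sequence."""
    seq = seq.upper()
    found = set()
    for strand in (seq, revcomp(seq)):
        found.update(m.group(1) for m in PAM_SITE.finditer(strand))
    return found


def shared_and_specific(queries):
    """Split candidates into those present in every query and those unique to one.

    queries -- dict mapping a query name to its DNA sequence (at least one entry)
    """
    per_query = {name: protospacers(seq) for name, seq in queries.items()}
    shared = set.intersection(*per_query.values())
    specific = {}
    for name, sites in per_query.items():
        others = set().union(*(s for n, s in per_query.items() if n != name))
        specific[name] = sites - others
    return shared, specific


# Hypothetical toy sequences that share one target site.
core = "ACGTACGTACGTACGTACGT"
queries = {
    "geneA": "AAAA" + core + "TGG" + "AAAA",
    "geneB": "CCCA" + core + "AGG" + "TTTT",
}
shared, specific = shared_and_specific(queries)
print(core in shared)
```

Exact set intersection is the simplest possible criterion; a real design tool would also need to tolerate mismatches and check candidates against the rest of the genome.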

Predicting transfer RNA gene activity from sequence and genome context

10.1101/661942 ◽

2019 ◽

Author(s):

Bryan Thornlow ◽

Joel Armstrong ◽

Andrew Holmes ◽

Russell Corbett-Detig ◽

Todd Lowe

Keyword(s):

Gene Expression ◽

Trna Gene ◽

Gene Expression Regulation ◽

Gene Families ◽

Transfer Rna ◽

Gene Activity ◽

Comparative Genomic ◽

Trna Genes ◽

Genomic Context ◽

High Sequence Identity

ABSTRACT Transfer RNA (tRNA) genes are among the most highly transcribed genes in the genome due to their central role in protein synthesis. However, there is evidence for a broad range of gene expression across tRNA loci. This complexity, combined with difficulty in measuring transcript abundance and high sequence identity across transcripts, has severely limited our collective understanding of tRNA gene expression regulation and evolution. We establish sequence-based correlates to tRNA gene expression and develop a tRNA gene classification method that does not require, but benefits from, comparative genomic information, and achieves accuracy comparable to molecular assays. We observe that guanine+cytosine (G+C) content and CpG density surrounding tRNA loci are exceptionally well correlated with tRNA gene activity, supporting a prominent regulatory role of the local genomic context in combination with internal sequence features. We use our tRNA gene activity predictions in conjunction with a comprehensive tRNA gene ortholog set spanning 29 placental mammals to infer the frequency of changes to tRNA gene expression among orthologs. Our method adds an important new dimension to tRNA annotation and will help focus the study of natural tRNA variants. Its simplicity and robustness enable facile application to other clades and timescales, as well as exploration of functional diversification of tRNAs and other large gene families.
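
The two sequence features named in the abstract, G+C content and CpG density around a locus, are straightforward to compute. The sketch below is an assumption-laden illustration and not the authors' classifier: the flank size, the per-base normalisation of CpG counts, and the function name `flank_features` are all choices made here for the example.

```python
def flank_features(chrom, start, end, flank=250):
    """Return (G+C fraction, CpG dinucleotides per base) for the flanks of a locus.

    chrom      -- chromosome sequence as a string
    start, end -- 0-based, end-exclusive coordinates of the locus
    flank      -- bases taken on each side (hypothetical default)
    """
    # The flanks are scored separately so no artificial CpG forms at a join.
    flanks = [
        chrom[max(0, start - flank):start].upper(),
        chrom[end:end + flank].upper(),
    ]
    bases = sum(len(f) for f in flanks)
    if bases == 0:
        return 0.0, 0.0
    gc = sum(f.count("G") + f.count("C") for f in flanks) / bases
    cpg = sum(f.count("CG") for f in flanks) / bases
    return gc, cpg


# Hypothetical toy locus (the N run) with 6-base flanks.
toy = "CGCGAT" + "NNNN" + "ATATCG"
print(flank_features(toy, 6, 10, flank=6))  # (0.5, 0.25)
```

In the toy case the flanks hold 12 bases, 6 of them G or C, and three CpG dinucleotides, giving 0.5 and 0.25.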
