Superbubbles, Ultrabubbles and Cacti

2017 ◽  
Author(s):  
Benedict Paten ◽  
Adam M Novak ◽  
Erik Garrison ◽  
Glenn Hickey

Abstract A superbubble is a type of directed acyclic subgraph with single distinct source and sink vertices. In genome assembly and genetics, the possible paths through a superbubble can be considered to represent the set of possible sequences at a location in a genome. Bidirected and biedged graphs are generalizations of digraphs that are increasingly being used to more fully represent genome assembly and variation problems. Here we define snarls and ultrabubbles, generalizations of superbubbles for bidirected and biedged graphs, and give an efficient algorithm for the detection of these more general structures. Key to this algorithm is the cactus graph, which we show encodes the nested decomposition of a graph into snarls and ultrabubbles within its structure. We propose and demonstrate empirically that this decomposition on bidirected and biedged graphs solves a fundamental problem by defining genetic sites for any collection of genomic variations, including complex structural variations, without the need for any single reference genome coordinate system. Furthermore, the nesting of the decomposition gives a natural way to describe and model variations contained within large variations, a case not currently dealt with by existing formats, e.g. VCF.
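To make the superbubble definition concrete, here is a minimal sketch for plain directed graphs only. It is not the cactus-graph algorithm described in the abstract; it assumes the widely used traversal idea of expanding from a candidate source and stopping when exactly one frontier vertex remains. The function name `find_superbubble_end` and the edge-list input format are illustrative assumptions.

```python
from collections import defaultdict

def find_superbubble_end(edges, s):
    """Return the sink of a superbubble starting at s, or None if there is none."""
    children = defaultdict(set)
    parents = defaultdict(set)
    for u, v in edges:
        children[u].add(v)
        parents[v].add(u)

    visited = set()   # vertices already expanded
    seen = set()      # vertices reached but not yet expanded
    stack = [s]       # vertices whose parents have all been expanded
    while stack:
        v = stack.pop()
        visited.add(v)
        seen.discard(v)
        if not children[v]:
            return None                  # dead end (tip) inside the candidate
        for u in children[v]:
            if u == s:
                return None              # cycle back to the source
            seen.add(u)
            if parents[u] <= visited:    # every parent of u has been expanded
                stack.append(u)
        # Exactly one frontier vertex left and nothing else pending: sink found.
        if len(stack) == 1 and seen == {stack[0]}:
            t = stack[0]
            return None if s in children[t] else t
    return None

# A diamond a->{b,c}->d is the classic bubble with source a and sink d.
diamond = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
sink = find_superbubble_end(diamond, "a")
```

Under this sketch a single edge counts as a trivial bubble, and any branch that ends in a tip disqualifies the candidate source.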

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Guifang Lin ◽  
Cheng He ◽  
Jun Zheng ◽  
Dal-Hoe Koo ◽  
Ha Le ◽  
...  

Abstract Background The maize inbred line A188 is an attractive model for elucidation of gene function and improvement due to its high embryogenic capacity and many contrasting traits to the first maize reference genome, B73, and other elite lines. The lack of a genome assembly of A188 limits its use as a model for functional studies. Results Here, we present a chromosome-level genome assembly of A188 using long reads and optical maps. Comparison of A188 with B73 using both whole-genome alignments and read depths from sequencing reads identifies approximately 1.1 Gb of syntenic sequences as well as extensive structural variation, including a 1.8-Mb duplication containing the Gametophyte factor1 locus for unilateral cross-incompatibility, and six inversions of 0.7 Mb or greater. Increased copy number of carotenoid cleavage dioxygenase 1 (ccd1) in A188 is associated with elevated expression during seed development. High ccd1 expression in seeds together with low expression of yellow endosperm 1 (y1) reduces carotenoid accumulation, accounting for the white seed phenotype of A188. Furthermore, transcriptome and epigenome analyses reveal enhanced expression of defense pathways and altered DNA methylation patterns of the embryonic callus. Conclusions The A188 genome assembly provides a high-resolution sequence for a complex genome species and a foundational resource for analyses of genome variation and gene function in maize. The genome, in comparison to B73, contains extensive intra-species structural variations and other genetic differences. Expression and network analyses identify discrete profiles for embryonic callus and other tissues.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Qihua Liang ◽  
Stefano Lonardi

Abstract Background The pan-genome of a species is the union of the genes and non-coding sequences present in all individuals (cultivar, accessions, or strains) within that species. Results Here we introduce PGV, a reference-agnostic representation of the pan-genome of a species based on the notion of consensus ordering. Our experimental results demonstrate that PGV enables an intuitive, effective and interactive visualization of a pan-genome by providing a genome browser that can elucidate complex structural genomic variations. Conclusions The PGV software can be installed via conda or downloaded from https://github.com/ucrbioinfo/PGV. The companion PGV browser at http://pgv.cs.ucr.edu can be tested using example bed tracks available from the GitHub page.


2019 ◽  
Vol 35 (18) ◽  
pp. 3250-3256 ◽  
Author(s):  
Kingshuk Mukherjee ◽  
Bahar Alipanahi ◽  
Tamer Kahveci ◽  
Leena Salmela ◽  
Christina Boucher

Abstract Motivation Optical maps are high-resolution restriction maps (Rmaps) that give a unique numeric representation to a genome. Used in concert with sequence reads, they provide a useful tool for genome assembly and for discovering structural variations and rearrangements. Although they have been a regular feature of modern genome assembly projects, optical maps have been mainly used in a post-processing step and not in the genome assembly process itself. Several methods have been proposed for pairwise alignment of single molecule optical maps (called Rmaps), or for aligning optical maps to assembled reads. However, the problem of aligning an Rmap to a graph representing the sequence data of the same genome has not been studied before. Such an alignment provides a mapping between two sets of data, optical maps and sequence data, which will facilitate the usage of optical maps in the sequence assembly step itself. Results We define the problem of aligning an Rmap to a de Bruijn graph and present the first algorithm for solving this problem, which is based on a seed-and-extend approach. We demonstrate that our method is capable of aligning 73% of Rmaps generated from the Escherichia coli genome to the de Bruijn graph constructed from short reads generated from the same genome. We validate the alignments and show that our method achieves an accuracy of 99.6%. We also show that our method scales to larger genomes. In particular, we show that 76% of Rmaps can be aligned to the de Bruijn graph in the case of human data. Availability and implementation The software for aligning optical maps to a de Bruijn graph, omGraph, is written in C++ and is publicly available under the GNU General Public License at https://github.com/kingufl/omGraph. Supplementary information Supplementary data are available at Bioinformatics online.
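For readers unfamiliar with the target structure of that alignment, the following is a minimal sketch of building a de Bruijn graph from short reads. It is not omGraph's implementation, which is written in C++. It assumes the common convention that vertices are (k-1)-mers and each k-mer contributes one edge; the function name `de_bruijn_graph` is an illustrative assumption.

```python
from collections import defaultdict

def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of width k over the read; each k-mer is one edge.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

# Two overlapping reads that share the 3-mer CGT.
graph = de_bruijn_graph(["ACGT", "CGTA"], k=3)
```

Reads shorter than k contribute no edges, and this sketch ignores reverse complements and sequencing errors, which real assemblers must handle.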


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Toshiyuki T. Yokoyama ◽  
Yoshitaka Sakamoto ◽  
Masahide Seki ◽  
Yutaka Suzuki ◽  
Masahiro Kasahara

Abstract Background The genome graph is an emerging approach for representing structural variants on genomes with branches. For example, representing structural variants of cancer genomes as a genome graph is more natural than representing such genomes as differences from the linear reference genome. While more and more structural variants are being identified by long-read sequencing, many of them are difficult to visualize using existing structural variant visualization tools. To this end, a visualization method for large genome graphs such as human cancer genome graphs is needed. Results We developed MOdular Multi-scale Integrated Genome graph browser, MoMI-G, a web-based genome graph browser that can visualize genome graphs with structural variants and supporting evidence such as read alignments, read depth, and annotations. This browser allows more intuitive recognition of large, nested, and potentially more complex structural variations. MoMI-G has view modules for different scales, which allow users to view the whole genome down to nucleotide-level alignments of long reads. Alignments spanning reference alleles and those spanning alternative alleles are shown in the same view. Users can customize the view if they are not satisfied with the preset views. In addition, MoMI-G has Interval Card Deck, a feature for rapid manual inspection of hundreds of structural variants. Herein, we describe the utility of MoMI-G by using representative examples of large and nested structural variations found in two cell lines, LC-2/ad and CHM1. Conclusions Users can inspect complex and large structural variations found by long-read analysis in large genomes such as human genomes more smoothly and more intuitively. In addition, users can easily filter out false positives by manually inspecting hundreds of identified structural variants with supporting long-read alignments and annotations in a short time.
Software availability MoMI-G is freely available at https://github.com/MoMI-G/MoMI-G under the MIT license.


Author(s):  
Seyoung Mun ◽  
Songmi Kim ◽  
Wooseok Lee ◽  
Keunsoo Kang ◽  
Thomas J. Meyer ◽  
...  

Abstract Advances in next-generation sequencing (NGS) technology have made personal genome sequencing possible, and indeed, many individual human genomes have now been sequenced. Comparisons of these individual genomes have revealed substantial genomic differences between human populations as well as between individuals from closely related ethnic groups. Transposable elements (TEs) are known to be one of the major sources of these variations and act through various mechanisms, including de novo insertion, insertion-mediated deletion, and TE–TE recombination-mediated deletion. In this study, we carried out de novo whole-genome sequencing of one Korean individual (KPGP9) via multiple insert-size libraries. The de novo whole-genome assembly resulted in 31,305 scaffolds with a scaffold N50 size of 13.23 Mb. Furthermore, through computational data analysis and experimental verification, we revealed that 182 TE-associated structural variation (TASV) insertions and 89 TASV deletions contributed 64,232 bp in sequence gain and 82,772 bp in sequence loss, respectively, in the KPGP9 genome relative to the hg19 reference genome. We also verified structural differences associated with TASVs by comparative analysis with TASVs in recent genomes (AK1 and TCGA genomes) and reported their details. Here, we constructed a new Korean de novo whole-genome assembly and provide the first study, to our knowledge, focused on the identification of TASVs in an individual Korean genome. Our findings again highlight the role of TEs as a major driver of structural variations in human individual genomes.
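Scaffold N50 is the contiguity statistic reported in this and several of the other assembly abstracts on this page. Below is a minimal sketch of the standard definition: the largest length L such that scaffolds of length at least L together cover at least half of the total assembly. The function name `n50` and the toy lengths are illustrative and are not taken from any of the assemblies listed here.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("n50 requires at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length

# Toy assembly of five scaffolds with a total length of 20.
toy_n50 = n50([2, 3, 4, 5, 6])
```

In the toy case the two longest scaffolds (6 and 5) already cover 11 of 20 bases, so the N50 is 5.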


Author(s):  
Guangtu Gao ◽  
Susana Magadan ◽  
Geoffrey C Waldbieser ◽  
Ramey C Youngblood ◽  
Paul A Wheeler ◽  
...  

Abstract Currently, there is still a need to improve the contiguity of the rainbow trout reference genome and to use multiple genetic backgrounds that will represent the genetic diversity of this species. The Arlee doubled haploid line originated from a domesticated hatchery strain that was originally collected from the northern California coast. The Canu pipeline was used to generate the Arlee line genome de novo assembly from high-coverage PacBio long-read sequence data. The assembly was further improved with Bionano optical maps and Hi-C proximity ligation sequence data to generate 32 major scaffolds corresponding to the karyotype of the Arlee line (2N = 64). It is composed of 938 scaffolds with N50 of 39.16 Mb and a total length of 2.33 Gb, of which ∼95% was in 32 chromosome sequences with only 438 gaps between contigs and scaffolds. In rainbow trout the haploid chromosome number can vary from 29 to 32. In the Arlee karyotype the haploid chromosome number is 32 because chromosomes Omy04, 14 and 25 are divided into six acrocentric chromosomes. Additional structural variations that were identified in the Arlee genome included the major inversions on chromosomes Omy05 and Omy20 and 15 additional smaller inversions that will require further validation. This is also the first rainbow trout genome assembly that includes a scaffold with the sex-determination gene (sdY) in the chromosome Y sequence. The utility of this genome assembly is demonstrated through the improved annotation of the duplicated genome loci that harbor the IGH genes on chromosomes Omy12 and Omy13.


2004 ◽  
Vol 13 (01) ◽  
pp. 165-186 ◽  
Author(s):  
SIMONE MERCURI ◽  
GIOVANNI MONTANI

We present a new reformulation of the canonical quantum geometrodynamics, which allows one to overcome the fundamental problem of the frozen formalism and, therefore, to construct an appropriate Hilbert space associated with the solution of the restated dynamics. More precisely, to remove the ambiguity contained in the Wheeler–DeWitt approach, with respect to the possibility of a (3+1)-splitting when space–time is in a quantum regime, we fix the reference frame (i.e. the lapse function and the shift vector) by introducing the so-called kinematical action. As a consequence the new super-Hamiltonian constraint becomes a parabolic one and we arrive at a Schrödinger-like approach for the quantum dynamics. In the semiclassical limit our theory provides General Relativity in the presence of an additional energy–momentum density contribution coming from non-zero eigenvalues of the Hamiltonian constraints. The interpretation of these new contributions emerges naturally as soon as it is recognized that the kinematical action can be recast in such a way that it describes a pressureless, but, in general, non-geodesic perfect fluid.


Author(s):  
Xiaolin Zhao ◽  
Zhichao Zhang ◽  
Sujiao Zheng ◽  
Wenwu Ye ◽  
Xiaobo Zheng ◽  
...  

Diaporthe-Phomopsis disease complex causes considerable yield losses in soybean production worldwide. As one of the major pathogens, Phomopsis longicolla T. W. Hobbs (syn. Diaporthe longicolla) is not only the primary agent of Phomopsis seed decay, but also one of the agents of Phomopsis pod and stem blight, and Phomopsis stem canker. We performed both PacBio long read sequencing and Illumina short read sequencing, and obtained a genome assembly for the P. longicolla strain YC2-1, which was isolated from soybean stem with Phomopsis stem blight disease. The 63.1 Mb genome assembly contains 87 scaffolds, with a minimum, maximum, and N50 scaffold length of 20 kb, 4.6 Mb, and 1.5 Mb respectively, and a total of 17,407 protein-coding genes. The high-quality data expand the genomic resource of P. longicolla species and will provide a solid foundation for a better understanding of their genetic diversity and pathogenic mechanisms.


2021 ◽  
Vol 6 ◽  
pp. 258
Author(s):  
Konrad Lohse ◽  
Alexander Mackintosh ◽  
Roger Vila ◽  
...  

We present a genome assembly from an individual male Aglais io (also known as Inachis io and Nymphalis io) (the European peacock; Arthropoda; Insecta; Lepidoptera; Nymphalidae). The genome sequence is 384 megabases in span. The majority (99.91%) of the assembly is scaffolded into 31 chromosomal pseudomolecules, with the Z sex chromosome assembled. Gene annotation of this assembly on Ensembl has identified 11,420 protein coding genes.


2021 ◽  
Vol 6 ◽  
pp. 322
Author(s):  
Liam Crowley ◽  
...  

We present a genome assembly from an individual female Malachius bipustulatus (the common malachite beetle; Arthropoda; Insecta; Coleoptera; Melyridae). The genome sequence is 544 megabases in span. The majority (99.70%) of the assembly is scaffolded into 10 chromosomal pseudomolecules, with the X sex chromosome assembled.

