Virtual Genome Walking: Generating gene models for the salamander Ambystoma mexicanum

2017 ◽  
Author(s):  
Teri Evans ◽  
Andrew Johnson ◽  
Matt Loose

Abstract Large, repeat-rich genomes present challenges for assembly and identification of gene models with short-read technologies. Here we present a method we call Virtual Genome Walking, which uses an iterative assembly approach to first identify exons from de novo assembled transcripts and then assemble whole-genome reads against each exon. This process is iterated, allowing the extension of exons. These linked assemblies are refined to generate gene models including upstream and downstream genomic sequence as well as intronic sequence. We test this method using a 20X genomic read set for the axolotl, the genome of which is estimated to be 30 Gb in size. These reads were previously reported to be effectively impossible to assemble. Here we provide almost 1 Gb of assembled sequence describing over 19,000 gene models for the axolotl. Gene models stop assembling either due to localised low coverage in the genomic reads, or the presence of repeats. We validate our observations by comparison with previously published axolotl bacterial artificial chromosome (BAC) sequences. In addition, we analysed axolotl intron length, intron-exon structure, repeat content and synteny. These gene models, sequences and annotations are freely available for download from https://tinyurl.com/y8gydc6n. The software pipeline, including a docker image, is available from https://github.com/LooseLab/iterassemble. These methods will increase the value of low coverage sequencing of understudied model systems.
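The iterative idea in this abstract (seed with an exon, pull in overlapping genomic reads, repeat until nothing more extends) can be illustrated with a toy sketch. This is not the authors' pipeline: the function names `merge` and `walk`, the exact-match overlap rule, and the `min_overlap` parameter are all assumptions made for illustration, and a real implementation would rely on a proper aligner and assembler.

```python
def merge(contig, read, min_overlap=4):
    """Merge one read into the contig if they share an exact end overlap.

    Hypothetical helper: exact string matching stands in for real alignment.
    """
    if read in contig:
        return contig  # read adds nothing new
    if len(contig) >= min_overlap and contig in read:
        return read  # read spans the whole contig
    # Try the longest possible overlap first, down to min_overlap.
    for k in range(min(len(contig), len(read)) - 1, min_overlap - 1, -1):
        if contig.endswith(read[:k]):  # read extends the right end
            return contig + read[k:]
        if contig.startswith(read[-k:]):  # read extends the left end
            return read[:-k] + contig
    return contig


def walk(seed, reads, min_overlap=4, max_rounds=100):
    """Repeatedly merge reads into the seed until a round adds no sequence."""
    contig = seed
    for _ in range(max_rounds):
        extended = contig
        for read in reads:
            extended = merge(extended, read, min_overlap)
        if extended == contig:
            break  # no read overlaps either end: the walk stops here
        contig = extended
    return contig


# Toy data: a made-up 23 bp "genome", five reads from it, and a 6 bp seed.
genome = "ATGGCGTACCTTGAGCATTCGGA"
reads = [genome[0:10], genome[4:14], genome[8:18], genome[12:22], genome[13:23]]
seed = genome[9:15]
assembled = walk(seed, reads)
```

In this toy setting the walk stops as soon as no read overlaps either end of the contig, which loosely mirrors the abstract's observation that gene models stop assembling where coverage is locally low.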

2020 ◽  
Vol 12 (7) ◽  
pp. 1180-1193
Author(s):  
Abhijeet Shah ◽  
Joseph I Hoffman ◽  
Holger Schielzeth

Abstract Eukaryotic organisms vary widely in genome size and much of this variation can be explained by differences in the abundance of repetitive elements. However, the phylogenetic distributions and turnover rates of repetitive elements are largely unknown, particularly for species with large genomes. We therefore used de novo repeat identification based on low coverage whole-genome sequencing to characterize the repeatomes of six species of gomphocerine grasshoppers, an insect clade characterized by unusually large and variable genome sizes. Genome sizes of the six species ranged from 8.4 to 14.0 pg DNA per haploid genome and thus include the second largest insect genome documented so far (with the largest being another acridid grasshopper). Estimated repeat content ranged from 79% to 96% and was strongly correlated with genome size. Averaged over species, these grasshopper repeatomes comprised significant amounts of DNA transposons (24%), LINE elements (21%), helitrons (13%), LTR retrotransposons (12%), and satellite DNA (8.5%). The contribution of satellite DNA was particularly variable (ranging from <1% to 33%) as was the contribution of helitrons (ranging from 7% to 20%). The age distribution of divergence within clusters was unimodal with peaks ∼4–6%. The phylogenetic distribution of repetitive elements was suggestive of an expansion of satellite DNA in the lineages leading to the two species with the largest genomes. Although speculative at this stage, we suggest that the expansion of satellite DNA could be secondary and might possibly have been favored by selection as a means of stabilizing greatly expanded genomes.


Genome ◽  
2013 ◽  
Vol 56 (9) ◽  
pp. 487-494 ◽  
Author(s):  
Kate L. Hertweck

The research field of comparative genomics is moving from a focus on genes to a more holistic view including the repetitive complement. This study aimed to characterize relative proportions of the repetitive fraction of large, complex genomes in a nonmodel system. The monocotyledonous plant order Asparagales (onion, asparagus, agave) comprises some of the largest angiosperm genomes and represents variation in both genome size and structure (karyotype). Anonymous, low coverage, single-end Illumina data from 11 exemplar Asparagales taxa were assembled using a de novo method. Resulting contigs were annotated using a reference library of available monocot repetitive sequences. Mapping reads to contigs provided rough estimates of relative proportions of each type of transposon in the nuclear genome. The results were parsed into general repeat types and synthesized with genome size estimates and a phylogenetic context to describe the pattern of transposable element evolution among these lineages. The major finding is that although some lineages in Asparagales exhibit conservation in repeat proportions, there is generally wide variation in types and frequency of repeats. This approach is an appropriate first step in characterizing repeats in evolutionary lineages with a paucity of genomic resources.


2018 ◽  
Author(s):  
Sarah Ramirez-Busby ◽  
Afif Elghraoui ◽  
Yeon Bin Kim ◽  
Kellie Kim ◽  
Faramarz Valafar

Abstract Motivation: Single Molecule Real-Time (SMRT) sequencing has important and underutilized advantages that amplification-based platforms lack, among them the absence of systematic error (e.g. GC-bias) and complete de novo assembly (including large repetitive regions) without scaffolding. SMRT sequencing, however, suffers from a high random error rate and, with older chemistries, low sequencing depth. Here, we introduce PBHoover, software that uses a heuristic calling algorithm to make base calls with high certainty in low coverage regions. This software is also capable of mixed population detection with high sensitivity. PBHoover’s CigarRoller attachment improves sequencing depth in low-coverage regions through CIGAR-string correction. Results: We tested both modules on 348 M. tuberculosis clinical isolates sequenced on C1 or C2 chemistries. On average, CigarRoller improved the percentage of usable reads from 68.9% to 99.98% in C1 runs and from 50% to 99% in C2 runs. Using the greater depth provided by CigarRoller, PBHoover was able to make base and variant calls 99.95% concordant with Sanger calls (QV33). PBHoover also detected antibiotic-resistant subpopulations that went undetected by Sanger. Using C1 chemistry, subpopulations as small as 9% of the total colony can be detected by PBHoover. This provides the most sensitive amplification-free molecular method for heterogeneity analysis and is in line with phenotypic methods’ sensitivity. This sensitivity significantly improves with the greater depth and lower error rate of the newer chemistries. Availability and Implementation: Executables are freely available under GNU GPL v3+ at http://www.gitlab.com/LPCDRP/pbhoover and http://www.gitlab.com/LPCDRP/CigarRoller. PBHoover is also available on bioconda: https://anaconda.org/bioconda/[email protected]
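The pairing of "99.95% concordant" with "QV33" is consistent with the standard Phred scale, assuming (the abstract does not say so explicitly) that the QV is derived from the discordance rate. A minimal sketch of that conversion, with `phred_from_accuracy` as a hypothetical helper name:

```python
import math


def phred_from_accuracy(accuracy):
    """Phred-scaled quality: Q = -10 * log10(error rate), error = 1 - accuracy."""
    error_rate = 1.0 - accuracy
    return -10.0 * math.log10(error_rate)


# 99.95% concordance corresponds to an error rate of 0.0005.
qv = phred_from_accuracy(0.9995)
```

Under this assumption, an error rate of 5 in 10,000 calls gives a quality value of about 33.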


2016 ◽  
Vol 65 (2) ◽  
pp. 74-79 ◽  
Author(s):  
Malte Mader ◽  
Marie-Christine Le Paslier ◽  
Rémi Bounon ◽  
Aurélie Bérard ◽  
Patricia Faivre Rampant ◽  
...  

Abstract Populus trichocarpa and P. deltoides are the only Populus species known to date to have a publicly available nuclear genome sequence that has been assembled to chromosomes and annotated (https://phytozome.jgi.doe.gov/). Here we focus on the clone INRA 717-1B4, a female P. tremula x P. alba (P. x canescens) interspecific hybrid that is used by scientists worldwide as a tree model in transgenic experiments. The already available INRA 717-1B4 nuclear genomic resource (v1.1 of sPta717 at http://aspendb.uga.edu/index.php/databases/spta-717-genome) presents only INRA 717-1B4 genomic regions with high similarity to the P. trichocarpa genomic reference sequences. We assembled draft genomic scaffolds by combining de novo assembly with reference-based assembly, using 30x resequencing NGS data (Illumina MiSeq® and Ion Torrent Ion PGM™) of INRA 717-1B4. In total, 419,969 scaffolds longer than 500 bp were generated. The mean length of the scaffolds is 2,166 bp and the size of the largest scaffold is 84,573 bp. The N50 contig length is 3,850 bp when considering contigs larger than 1,000 bp. Probably due to the high level of heterozygosity of this interspecific hybrid, the accumulated scaffold length, at 0.9 Gb, is about twice the expected size of the haploid nuclear genome. DNA sequences of the genomic scaffolds of INRA 717-1B4 are publicly available for Blast analyses and download via the new INRA web portal at https://urgi.versailles.inra.fr/Species/Forest-trees/Populus/Clone-INRA-717-1B4/. This new genomic sequence resource will complement the already available INRA 717-1B4 resources and will facilitate the future optimization of genetic transformation experiments to discover gene function.
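For readers unfamiliar with the statistic, N50 is the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the summed length. The sketch below shows the usual calculation with a minimum-length filter like the 1,000 bp cutoff mentioned above. The function name `n50` and the example lengths are illustrative assumptions, not data from the study.

```python
def n50(lengths, min_length=0):
    """Return the N50 of all lengths strictly greater than min_length.

    Returns 0 when no length passes the filter.
    """
    kept = sorted((n for n in lengths if n > min_length), reverse=True)
    half_total = sum(kept) / 2
    running = 0
    for n in kept:
        running += n
        if running >= half_total:
            return n
    return 0


# Made-up contig lengths in bp; the 500 bp contig is dropped by the filter.
example_lengths = [500, 1200, 3000, 4000, 5000]
example_n50 = n50(example_lengths, min_length=1000)
```

Because the filter changes which contigs contribute to the total, an N50 is only comparable between assemblies when the same length cutoff is applied.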


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Víctor Faundes ◽  
Martin D. Jennings ◽  
Siobhan Crilly ◽  
Sarah Legraie ◽  
Sarah E. Withers ◽  
...  

Abstract The structure of proline prevents it from adopting an optimal position for rapid protein synthesis. Poly-proline-tract (PPT) associated ribosomal stalling is resolved by highly conserved eIF5A, the only protein to contain the amino acid hypusine. We show that de novo heterozygous EIF5A variants cause a disorder characterized by variable combinations of developmental delay, microcephaly, micrognathia and dysmorphism. Yeast growth assays, polysome profiling, total/hypusinated eIF5A levels and PPT-reporter studies reveal that the variants impair eIF5A function, reduce eIF5A-ribosome interactions and impair the synthesis of PPT-containing proteins. Supplementation with 1 mM spermidine partially corrects the yeast growth defects, improves the polysome profiles and restores expression of PPT reporters. In zebrafish, eif5a knockdown partly recapitulates the human phenotype, which can be rescued with 1 µM spermidine supplementation. In summary, we uncover the role of eIF5A in human development and disease, demonstrate the mechanistic complexity of EIF5A-related disorder and raise possibilities for its treatment.


2020 ◽  
Vol 111 (5) ◽  
pp. 419-428 ◽  
Author(s):  
Marcella D Baiz ◽  
Priscilla K Tucker ◽  
Jacob L Mueller ◽  
Liliana Cortés-Ortiz

Abstract Reproductive isolation is a fundamental step in speciation. While sex chromosomes have been linked to reproductive isolation in many model systems, including hominids, genetic studies of the contribution of sex chromosome loci to speciation for natural populations are relatively sparse. Natural hybrid zones can help identify genomic regions contributing to reproductive isolation, like hybrid incompatibility loci, since these regions exhibit reduced introgression between parental species. Here, we use a primate hybrid zone (Alouatta palliata × Alouatta pigra) to test for reduced introgression of X-linked SNPs compared to autosomal SNPs. To identify X-linked sequence in A. palliata, we used a sex-biased mapping approach with whole-genome re-sequencing data. We then used genomic cline analysis with reduced-representation sequence data for parental A. palliata and A. pigra individuals and hybrids (n = 88) to identify regions with non-neutral introgression. We identified ~26 Mb of non-repetitive, putatively X-linked genomic sequence in A. palliata, most of which mapped collinearly to the marmoset and human X chromosomes. We found that X-linked SNPs had reduced introgression and an excess of ancestry from A. palliata as compared to autosomal SNPs. One outlier region with reduced introgression overlaps a previously described “desert” of archaic hominin ancestry on the human X chromosome. These results are consistent with a large role for the X chromosome in speciation across animal taxa and further, suggest shared features in the genomic basis of the evolution of reproductive isolation in primates.


2014 ◽  
Vol 64 (Pt_2) ◽  
pp. 316-324 ◽  
Author(s):  
Jongsik Chun ◽  
Fred A. Rainey

The polyphasic approach used today in the taxonomy and systematics of the Bacteria and Archaea includes the use of phenotypic, chemotaxonomic and genotypic data. The use of 16S rRNA gene sequence data has revolutionized our understanding of the microbial world and led to a rapid increase in the number of descriptions of novel taxa, especially at the species level. It has allowed in many cases for the demarcation of taxa into distinct species, but its limitations in a number of groups have resulted in the continued use of DNA–DNA hybridization. As technology has improved, next-generation sequencing (NGS) has provided a rapid and cost-effective approach to obtaining whole-genome sequences of microbial strains. Although some 12 000 bacterial or archaeal genome sequences are available for comparison, only 1725 of these are of actual type strains, limiting the use of genomic data in comparative taxonomic studies when there are nearly 11 000 type strains. Efforts to obtain complete genome sequences of all type strains are critical to the future of microbial systematics. The incorporation of genomics into the taxonomy and systematics of the Bacteria and Archaea coupled with computational advances will boost the credibility of taxonomy in the genomic era. This special issue of International Journal of Systematic and Evolutionary Microbiology contains both original research and review articles covering the use of genomic sequence data in microbial taxonomy and systematics. It includes contributions on specific taxa as well as outlines of approaches for incorporating genomics into new strain isolation to new taxon description workflows.


2021 ◽  
Vol 23 (Supplement_6) ◽  
pp. vi206-vi206
Author(s):  
Tomohiro Yamasaki ◽  
Lumin Zhang ◽  
Tyrone Dowdy ◽  
Adrian Lita ◽  
Mark Gilbert ◽  
...  

Abstract BACKGROUND Increased de novo lipogenesis is a hallmark of cancer metabolism. In this study, we interrogated the role of de novo lipogenesis in the growth of IDH1 mutated glioma and identified the key enzyme, Stearoyl-CoA desaturase 1 (SCD1), that provides this growth advantage. MATERIALS AND METHODS We prepared genetically engineered glioma cell lines (U251 wild-type: U251WT and U251 IDHR132H mutant: U251RH) and normal human astrocytes (empty vector induced-NHA: NHAEV and IDHR132H mutant: NHARH). Lipid metabolic analysis was conducted using LC-MS and Raman imaging microscopy. SCD1 expression was investigated by The Cancer Genome Atlas (TCGA) data analysis and Western blotting. Knock-out of SCD1 was conducted using CRISPR/Cas9 and shRNA. RESULTS Previously, we showed that IDH1 mut glioma cells have increased monounsaturated fatty acids (MUFAs). TCGA data revealed that IDH mut glioma shows significantly higher SCD1 mRNA expression than wild-type glioma. Our model systems of IDH1 mut (U251RH, NHARH) showed increased expression of this enzyme compared with their wild-type counterparts. Moreover, addition of D-2HG to U251WT increased SCD1 expression. Herein, we showed that inhibition of SCD1 with CAY10566 decreased relative cell number and sphere forming capacity in a dose-dependent manner. Furthermore, addition of MUFAs was able to rescue the SCD1 inhibitor-induced cell death and loss of sphere forming capacity. Knock-out of SCD1 decreased cell proliferation and sphere forming ability. Decreasing lipid content in the media did not alter the growth of these cells, suggesting that glioma cells rely on de novo lipid synthesis rather than scavenging lipids from the microenvironment. CONCLUSION Overexpression of the IDH mutant gene altered lipid composition in U251 cells to enrich MUFA levels, and we confirmed that D-2HG caused SCD1 upregulation in U251WT. We demonstrated that glioma cell growth requires SCD1 expression, and the results of the present study may provide novel insights into the role of SCD1 in the growth of IDH mut gliomas.

