sequence contigs Latest Research Papers

NPGREAT: Assembly of the human subtelomere regions with the use of ultralong Nanopore reads and Linked-Reads

Background: Human subtelomeric DNA regulates the length and stability of adjacent telomeres that are critical for cellular function, and contains many gene/pseudogene families. Large, evolutionarily recent segmental duplications and associated structural variation in human subtelomeres have made complete sequencing and assembly of these regions difficult or impossible for many loci, complicating or precluding a wide range of genetic analyses of their function. Results: We present a hybrid assembly method, NanoPore Guided REgional Assembly Tool (NPGREAT), which combines Linked-Read data with ultralong nanopore reads spanning subtelomeric segmental duplications to potentially overcome these difficulties. Linked-Read sets identified by matches with 1-copy subtelomere sequence adjacent to segmental duplications are assembled and extended into the segmental duplication regions using Regional Extension of Assemblies using Linked-Reads (REXTAL). Telomere-containing ultralong nanopore reads are then used to provide contiguity and correct orientation for matching REXTAL sequence contigs, and to identify and correct any misassemblies (associated primarily with tandem repeats). While we focus on subtelomeres, the method is generally applicable to the assembly of segmental duplications and other complex genome regions. Our method was tested on a subset of representative subtelomeres with ultralong nanopore read coverage in GM12878. 10X Linked-Read datasets with high depth of coverage and a TELL-seq Linked-Read dataset with lower depth of coverage were each combined with the ultralong nanopore reads from the same genome to provide improved assemblies. Tandem repeat regions of the short-read assemblies, which are especially prone to misassembly due to collapse of matching tandemly repeated reads, were readily identified and properly sized by comparison with the nanopore reads.
Conclusion: The NPGREAT method extended high-quality assemblies into otherwise inaccessible segmental duplication regions near telomeres, enhancing our ability to accurately assemble human subtelomere DNA. This information will enable improved analyses of the structure, function, and evolution of these key regions.

3CAC: improving the classification of phages and plasmids from metagenomic assemblies using assembly graphs

10.1101/2021.11.05.467408 ◽

2021 ◽

Author(s):

Lianrong Pu ◽

Ron Shamir

Keyword(s):

Microbial Communities ◽

Gut Microbiome ◽

Microbial Evolution ◽

Human Gut ◽

Sequence Contigs ◽

Percentage Points ◽

Bacterial Chromosomes ◽

High Fraction ◽

Very High

Bacteriophages and plasmids usually coexist with their host bacteria in microbial communities and play important roles in microbial evolution. Accurately identifying sequence contigs as phages, plasmids, and bacterial chromosomes in mixed metagenomic assemblies is critical for further unraveling their functions. Many classification tools have been developed for identifying either phages or plasmids in metagenomic assemblies. However, only two classifiers, PPR-Meta and viralVerify, were proposed to simultaneously identify phages and plasmids in mixed metagenomic assemblies. Due to the very high fraction of chromosome contigs in the assemblies, both tools achieve high precision in the classification of chromosomes but perform poorly in classifying phages and plasmids. Short contigs in these assemblies are often wrongly classified or classified as uncertain. Here we present 3CAC, a new three-class classifier that improves the precision of phage and plasmid classifications. 3CAC starts with an initial three-class classification generated by existing classifiers and further improves the classification of short contigs and contigs with low confidence classification by using proximity in the assembly graph. Evaluation on simulated metagenomes and on real human gut microbiome samples showed that 3CAC outperformed PPR-Meta and viralVerify in both precision and recall, and increased F1-score by at least 10 percentage points.
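
The graph-based refinement described in this abstract can be illustrated with a minimal sketch. It assumes a simple majority vote among already classified neighbours, which is an illustrative stand-in and not necessarily the rule 3CAC itself applies; the function name `refine_labels`, the label strings, and the dictionary-based graph are all hypothetical.

```python
from collections import Counter

UNCERTAIN = "uncertain"


def refine_labels(graph, labels):
    """Assign uncertain contigs the majority label of their classified neighbours.

    graph  : dict mapping a contig name to the set of adjacent contig names
             in the assembly graph (hypothetical representation).
    labels : dict mapping a contig name to "phage", "plasmid", "chromosome",
             or "uncertain" (the initial three-class result plus an
             uncertain bucket).
    Returns a new dict; the input is left unchanged.
    """
    refined = dict(labels)
    changed = True
    while changed:  # repeat until no uncertain contig can be resolved
        changed = False
        for contig, label in list(refined.items()):
            if label != UNCERTAIN:
                continue
            # Count labels of neighbours that already have a confident class.
            votes = Counter(
                refined.get(n, UNCERTAIN)
                for n in graph.get(contig, ())
                if refined.get(n, UNCERTAIN) != UNCERTAIN
            )
            if not votes:
                continue
            (top, top_count), *rest = votes.most_common(2)
            if rest and rest[0][1] == top_count:
                continue  # tie between classes: leave the contig uncertain
            refined[contig] = top
            changed = True
    return refined
```

Each pass can only reduce the number of uncertain contigs, so the loop terminates. Contigs whose neighbours are evenly split stay uncertain rather than being forced into a class.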

De-novo chromosome level assembly of plant genomes from long read sequence data

10.1101/2021.09.09.459704 ◽

2021 ◽

Author(s):

Priyanka Sharma ◽

Ardashir Kharabian Masouleh ◽

Bruce Topp ◽

Agnelo Furtado ◽

Robert J. Henry

Keyword(s):

De Novo ◽

Sequence Data ◽

Genetic Maps ◽

Sequence Contigs ◽

Plant Genomes ◽

Proximity Analysis ◽

Long Reads ◽

A Genome ◽

Long Read ◽

Chromosome Level

Summary: Recent advances in the sequencing and assembly of plant genomes have allowed the generation of genomes with increasing contiguity and sequence accuracy. The chromosome level assembly of the contigs generated from long read sequencing has involved the use of proximity analysis (Hi-C) or traditional genetic maps to guide the placement of sequence contigs within chromosomes. The development of highly accurate long reads by repeated sequencing of circularized DNA (PacBio HiFi) has greatly increased the size of contigs. We now report the use of HiFiasm to assemble the genome of Macadamia jansenii, a genome that has been used as a model to test sequencing and assembly. This achieved almost complete chromosome level assembly from the sequence data alone, without the need for higher level chromosome map information. Eight of the 14 chromosomes were represented by a single large contig and the other 6 assembled into 2-4 main contigs. The small number of chromosome breaks appears to be due to highly repetitive regions of ribosomal genes that cannot be assembled by these approaches. De novo assembly of near complete chromosome level plant genomes now seems possible using these sequencing and assembly tools. Further targeted strategies might allow these remaining gaps to be closed. Significance statement: De novo assembly of near complete chromosome level plant genomes is now possible using current long read sequencing and assembly tools.

First report of apple rubbery wood virus 1 in apple in China

Plant Disease ◽

10.1094/pdis-01-21-0175-pdn ◽

2021 ◽

Author(s):

Guojun Hu ◽

Yafeng Dong ◽

Zunping Zhang ◽

Xudong Fan ◽

Fang Ren ◽

...

Keyword(s):

High Throughput Sequencing ◽

De Novo ◽

Rna Virus ◽

Rt Pcr ◽

Apple Rootstocks ◽

Apple Trees ◽

Illumina Hiseq ◽

First Report ◽

Sequence Contigs ◽

Apple Stem

More than 30 viral and subviral pathogens infect apple (Malus domestica, an important fruit crop in China) trees and rootstocks, posing a threat to its production. With advances in diagnostic technologies, new viruses including apple rubbery wood virus 1 (ARWV-1), apple rubbery wood virus 2 (ARWV-2), apple luteovirus 1 (ALV), and citrus virus A (CiVA) have been detected (Beatriz et al. 2018; Rott et al. 2018; Hu et al. 2021). ARWV-1 (family Phenuiviridae) is a negative-sense single-stranded RNA virus with three RNA segments (large [L], medium [M], and small [S]). It causes apple rubbery wood disease (Rott et al. 2018) and is found in apple rootstocks, causing leaf yellowing and mottle symptoms in Korea (Lim et al. 2018). To determine virus prevalence in apple trees in China, 200 apple leaf and shoot samples were collected in 2020 from orchards in Hebei (n = 26), Liaoning (40), Shandong (100), Yunnan (25), Shanxi (4), and Inner Mongolia (5). Total RNA was extracted from the shoot phloem or leaf (Hu et al., 2015) and subjected to reverse transcription (RT)-PCR to detect apple chlorotic leaf spot virus (ACLSV), apple stem pitting virus (ASPV), apple stem grooving virus (ASGV), apple necrotic mosaic virus (ApNMV), apple scar skin viroid (ASSVd), ARWV-2, ARWV-1, ALV, and CiVA, using primers specific to the respective viruses (Supplementary Table 1). The prevalence of ACLSV, ASPV, ASGV, ApNMV, ASSVd, ARWV-2, ARWV-1, ALV and CiVA was found to be 75.5%, 85.5%, 86.0%, 43.0%, 4.0%, 48.5%, 10.5%, 0% and 0%, respectively (Supplementary Table 2). Among the 21 samples positive for ARWV-1, three, five and 13 were from Hebei, Liaoning, and Shandong, respectively. Five ARWV-1-positive samples (cultivars Xinhongjiangjun, Xiangfu-1, Xiangfu-2 and Tianhong) showed leaf mosaic symptoms. To confirm the RT-PCR detection of ARWV-1, amplicons from Xiangfu-1 and Tianhong were cloned into the pMD18-T vector (Takara, Dalian, China), and three clones of each sample were sequenced. 
BLASTn analyses demonstrated that the sequences (accession nos. MW507810–MW507811) shared 96.9%–98.9% identity with ARWV-1 sequences (MH714536, MF062127, and MF062138) in GenBank. An lncRNA library was prepared for high-throughput sequencing (HTS) with the Illumina HiSeq platform using Xiangfu-1 RNA. A total of 71,613,294 reads were obtained. De novo assembly of the reads revealed 135 viral sequence contigs of ACLSV, ASGV, ASPV, ApNMV, ARWV-1, and ARWV-2. The sequences of contig-100_88981 (302 nt) and contig-100_25701 (834 nt) (accession nos. MW507821 and MW507820) matched those of segment S from ARWV-1, whereas the sequences of contig-100_6542 (1,660 nt) and contig-100_27 (7,364 nt) (accession nos. MW507819 and MW507818) matched those of segments M and L, respectively. To confirm the HTS results, fragments of segments L (744 bp), M (747 bp), and S (554 bp) from Xiangfu-1 and Tianhong were amplified (Supplementary Table 1) and sequenced. The sequences (accession nos. MW507812–MW507817) showed 94.8%–99.9% nucleotide identity with the corresponding segments of ARWV-1. Co-infection of ARWV-1 with ApNMV and/or ARWV-2 was confirmed in 17/21 ARWV-1-positive samples. The prevalence of ARWV-1/ApNMV, ARWV-1/ARWV-2, and ARWV-1/ApNMV/ARWV-2 infections was 61.9%, 71.4%, and 52.4%, respectively. To our knowledge, this is the first report of ARWV-1 infecting apple trees in China. Further research is needed to determine whether and how ARWV-1 affects apple yield and quality.
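
The prevalence figures in this report can be cross-checked with a short calculation. The sketch below uses the sample counts stated in the text (200 samples, 21 ARWV-1 positives, 17 co-infected). The three co-infection counts (13, 15, 11) are not given in the source; they are back-calculated here from the reported percentages and should be read as assumptions.

```python
def percent(count, total):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


# Counts stated in the text.
samples_by_region = {
    "Hebei": 26, "Liaoning": 40, "Shandong": 100,
    "Yunnan": 25, "Shanxi": 4, "Inner Mongolia": 5,
}
total_samples = sum(samples_by_region.values())
arwv1_positive = 21

# ASSUMPTION: counts inferred from the reported 61.9%, 71.4% and 52.4%.
with_apnmv = 13          # ARWV-1 + ApNMV
with_arwv2 = 15          # ARWV-1 + ARWV-2
with_both = 11           # ARWV-1 + ApNMV + ARWV-2

# Inclusion-exclusion: samples co-infected with ApNMV and/or ARWV-2.
with_either = with_apnmv + with_arwv2 - with_both

print(total_samples)                            # regional counts add up to the total
print(percent(arwv1_positive, total_samples))   # ARWV-1 prevalence
print(percent(with_apnmv, arwv1_positive))
print(percent(with_arwv2, arwv1_positive))
print(percent(with_both, arwv1_positive))
print(with_either)                              # compare with the 17/21 in the text
```

Under these inferred counts, the inclusion-exclusion total agrees with the 17 of 21 co-infected samples reported above.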

Discovery and genome characterization of a new Nepovirus infecting grapevine

Plant Disease ◽

10.1094/pdis-08-20-1831-re ◽

2020 ◽

Author(s):

Maher Al Rwahnih ◽

Olufemi Joseph Alabi ◽

Min Sook Hwang ◽

Tongyan Tian ◽

Dimitre Mollov ◽

...

Keyword(s):

High Throughput Sequencing ◽

Phylogenetic Analyses ◽

Virus Transmission ◽

Amino Acid Sequences ◽

Molecular Characteristics ◽

Vitis Vinifera L ◽

Sequencing Analysis ◽

Biological Index ◽

Chenopodium Amaranticolor ◽

Sequence Contigs

In 2012, dormant canes of a proprietary wine grape (Vitis vinifera L.) accession were included in the collection of the University of California-Davis Foundation Plant Services. No virus-like symptoms were elicited when bud chips from propagated own-rooted canes of the accession were graft-inoculated onto a panel of biological index grape varieties. However, chlorotic ring symptoms were observed on sap-inoculated Chenopodium amaranticolor Coste & A. Rein and C. quinoa Willd. plants, indicating the presence of a mechanically transmissible virus. Transmission electron microscopy of virus preps from symptomatic C. quinoa revealed spherical, non-enveloped virions of ~27 nm in diameter. Nepovirus-like haplotypes of sequence contigs were detected in both the source grape accession and recipient C. quinoa plants using high throughput sequencing analysis. A novel bipartite nepovirus-like genome was assembled from these contigs, and the termini of each RNA segment were verified by RACE assays. The RNA1 (7,186 nt) of the virus encodes a large polyprotein P1 of 231.1 kDa, while the RNA2 (4,460 nt) encodes a large polyprotein P2 of 148.9 kDa. Each of the polyadenylated RNA segments is flanked by 5′- (RNA1 = 156 nt; RNA2 = 170 nt) and 3′- (RNA1 = 834 nt; RNA2 = 261 nt) untranslated region sequences that shared >90% identity between their corresponding sequences. Maximum-likelihood phylogenetic analyses of the conserved Pro-Pol amino acid sequences of Secoviridae species revealed the clustering of the new virus within the nepovirus clade. Considering its biological and molecular characteristics, and based on current criteria, we propose that the novel virus, named grapevine nepovirus A (GNVA), be assigned as a member of the genus Nepovirus.

The Challenge of Genome Sequence Assembly

The Open Bioinformatics Journal ◽

10.2174/1875036201811010231 ◽

2018 ◽

Vol 11 (1) ◽

pp. 231-239 ◽

Cited By ~ 1

Author(s):

Andrew Collins

Keyword(s):

Linkage Disequilibrium ◽

Large Scale ◽

Sequence Data ◽

Sequence Assembly ◽

Error Rates ◽

Chromatin Interaction ◽

Sequence Contigs ◽

Level Sequence ◽

Diverse Species ◽

Chromosome Level

Background: Although whole genome sequencing is enabling numerous advances in many fields, achieving complete chromosome-level sequence assemblies for diverse species presents difficulties. The problems in part reflect the limitations of current sequencing technologies. Chromosome assembly from ‘short read’ sequence data is confounded by the presence of repetitive genome regions with numerous similar sequence tracts which cannot be accurately positioned in the assembled sequence. Longer sequence reads often have higher error rates and may still be too short to span the larger gaps between contigs. Objective: Given the emergence of exciting new applications using sequencing technology, such as the Earth BioGenome Project, it is necessary to further develop and apply a range of strategies to achieve robust chromosome-level sequence assembly. Reviewed here is a range of methods to enhance assembly, which includes the use of cross-species synteny to understand relationships between sequence contigs, the development of independent genetic and/or physical scaffold maps as frameworks for assembly (for example, radiation hybrid, optical motif and chromatin interaction maps), and the use of patterns of linkage disequilibrium to help position, orient and locate contigs. Results and Conclusion: A range of methods exists which might be further developed to facilitate cost-effective large-scale sequence assembly for diverse species. A combination of strategies is required to best assemble sequence data into chromosome-level assemblies. There are a number of routes towards the development of maps which span chromosomes (including physical, genetic and linkage disequilibrium maps), and construction of these whole chromosome maps greatly facilitates the ordering and orientation of sequence contigs.

Appearance of synthetic vector-associated antibiotic resistance genes in next-generation sequences

10.1101/392225 ◽

2018 ◽

Author(s):

George Taiaroa ◽

Gregory M. Cook ◽

Deborah A Williamson

Keyword(s):

Antibiotic Resistance ◽

Resistance Genes ◽

Sequence Data ◽

Antibiotic Resistance Genes ◽

Data Sets ◽

Next Generation ◽

Beta Lactam ◽

Sequence Contigs ◽

Beta Lactam Antibiotics ◽

Genome Shotgun Sequence

Synopsis. Background: Next-generation sequencing methods have broad application in addressing increasing antibiotic resistance, with identification of antibiotic resistance genes (ARGs) having direct clinical relevance. Objectives: Here, we describe the appearance of synthetic vector-associated ARGs in major public next-generation sequence data sets and assemblies, including in environmental samples and high priority pathogenic microorganisms. Methods: A search of selected databases (the National Center for Biotechnology Information (NCBI) nucleotide collection, NCBI whole genome shotgun sequence contigs, and literature-associated European Nucleotide Archive (ENA) datasets) was carried out using sequences characteristic of pUC-family synthetic vectors as a query in BLASTn. Identified hits were confirmed as being of synthetic origin, and further explored through alignment and comparison to primary read sets. Results: Synthetic vectors are attributed to a range of organisms in each of the NCBI databases searched, including examples belonging to each kingdom of life. These synthetic vectors are associated with various ARGs, primarily those encoding resistance to beta-lactam antibiotics and aminoglycosides. Synthetic vector-associated ARGs are also observed in multiple environmental meta-transcriptome datasets, as shown through analysis of associated ENA primary reads, and are proposed to have led to incorrect statements being made in the literature on the abundance of ARGs. Conclusions: Appearance of synthetic vector-associated ARGs can confound the study of antimicrobial resistance in varied settings, and may have clinical implications in the near future.

Exploring the unmapped DNA and RNA reads in a songbird genome

10.1101/371963 ◽

2018 ◽

Cited By ~ 1

Author(s):

Veronika N. Laine ◽

Toni I. Gossmann ◽

Kees van Oers ◽

Marcel E. Visser ◽

Martien A.M. Groenen

Keyword(s):

Reference Genome ◽

De Novo ◽

Sequence Similarity ◽

Bird Species ◽

Parus Major ◽

Great Tit ◽

Biological Information ◽

Blood Parasites ◽

Sequence Contigs ◽

Unmapped Reads

Background: A widely used approach in next-generation sequencing projects is the alignment of reads to a reference genome. A significant percentage of reads, however, frequently remain unmapped despite improvements in the methods and hardware, which have enhanced the efficiency and accuracy of alignments. Unmapped reads are usually discarded from the analysis process, but significant biological information and insights can be uncovered from these data. We explored the unmapped DNA (normal and bisulfite treated) and RNA sequence reads of the great tit (Parus major) reference genome individual. From the unmapped reads we generated de novo assemblies. The generated sequence contigs were then aligned to the NCBI non-redundant nucleotide database using BLAST, identifying the closest known matching sequence. Results: Many of the aligned contigs showed sequence similarity to sequences from different bird species and to genes that were absent from the great tit reference assembly. Furthermore, there were also contigs that represented known pathogens of P. major. Most interesting were several species of blood parasites such as Plasmodium and Trypanosoma. Conclusions: Our analyses revealed that meaningful biological information can be found when further exploring unmapped reads. It is possible to discover sequences that are either absent or misassembled in the reference genome and sequences that indicate infection or sample contamination. In this study we also propose strategies to aid the capture and interpretation of this information from unmapped reads.

SVEngine: an efficient and versatile simulator of genome structural variations with features of cancer clonal evolution

10.1101/247536 ◽

2018 ◽

Author(s):

Li Charlie Xia ◽

Dongmei Ai ◽

Hojoon Lee ◽

Noemi Andor ◽

Chao Li ◽

...

Keyword(s):

Sequence Data ◽

Clonal Evolution ◽

Ground Truth ◽

Next Generation Sequencing Data ◽

Structural Variants ◽

Sequencing Data ◽

Structural Variations ◽

Insert Size ◽

Sequence Contigs ◽

Allelic Fraction

Abstract. Background: Simulating genome sequence data with features can facilitate the development and benchmarking of structural variant analysis programs. However, there are a limited number of data simulators that provide structural variants in silico. Moreover, there is a paucity of programs that generate structural variants with different allelic fractions and haplotypes. Findings: We developed SVEngine, an open source tool to address this need. SVEngine simulates next generation sequencing data with embedded structural variations. As input, SVEngine takes template haploid sequences (FASTA) and an external variant file, a variant distribution file and/or a clonal phylogeny tree file (NEWICK). Subsequently, it simulates and outputs sequence contigs (FASTAs), sequence reads (FASTQs) and/or post-alignment files (BAMs). All of the files contain the desired variants, along with BED files containing the ground truth. SVEngine’s flexible design process enables one to specify size, position, and allelic fraction for deletion, insertion, duplication, inversion and translocation variants. Finally, SVEngine simulates sequence data that replicates the characteristics of a sequencing library with mixed sizes of DNA insert molecules. SVEngine is highly parallelized to reduce the simulation time. Conclusions: We demonstrated the versatile features of SVEngine and its improved runtime in comparisons with other available simulators. SVEngine’s features include the simulation of locus-specific variant frequency designed to mimic the phylogeny of cancer clonal evolution. We validated the accuracy of the simulations. Our evaluation included checking various sequencing mapping features such as coverage change, read clipping, insert size shift and neighbouring hanging read pairs for representative variant types. SVEngine is implemented as a standard Python package and is freely available for academic use at: https://bitbucket.org/charade/svengine.
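
The idea of embedding a variant in a template sequence while recording its ground truth can be sketched in a few lines. This is a toy illustration only and does not use or reproduce SVEngine's interface; the helpers `apply_deletion` and `bed_line` are hypothetical names. The one format fact relied on is that BED intervals are 0-based and half-open.

```python
def apply_deletion(sequence, start, length):
    """Return the template sequence with `length` bases removed at `start`.

    `start` is a 0-based offset into the template sequence.
    """
    if start < 0 or length <= 0 or start + length > len(sequence):
        raise ValueError("deletion falls outside the template sequence")
    return sequence[:start] + sequence[start + length:]


def bed_line(chrom, start, end, name):
    """Format one ground-truth record as a tab-separated BED line.

    BED coordinates are 0-based and half-open: [start, end).
    """
    return f"{chrom}\t{start}\t{end}\t{name}"


template = "ACGTACGTAC"            # toy haploid template
variant_start, variant_len = 2, 3  # delete template[2:5]

mutated = apply_deletion(template, variant_start, variant_len)
truth = bed_line("chr1", variant_start, variant_start + variant_len, "DEL")

print(mutated)
print(truth)
```

A real simulator would go on to sample reads from the mutated contig and mix mutated and unmutated templates to reach a target allelic fraction; those steps are omitted here.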

Draft Genome Sequence of Bacillus licheniformis Strain YNP1-TSU Isolated from Whiterock Springs in Yellowstone National Park

Genome Announcements ◽

10.1128/genomea.01496-16 ◽

2017 ◽

Vol 5 (9) ◽

Cited By ~ 5

Author(s):

Joshua A. O'Hair ◽

Hui Li ◽

Santosh Thapa ◽

Matthew B. Scholz ◽

Suping Zhou

Keyword(s):

Genome Sequence ◽

Bacillus Licheniformis ◽

Yellowstone National Park ◽

National Park ◽

Draft Genome ◽

Biofuel Production ◽

Draft Genome Sequence ◽

Content Type ◽

Sequence Contigs ◽

Automated Annotation

ABSTRACT Novel cellulolytic microorganisms can potentially influence second-generation biofuel production. This paper reports the draft genome sequence of Bacillus licheniformis strain YNP1-TSU, isolated from hydrothermal-vegetative microbiomes inside Yellowstone National Park. The assembled sequence contigs predicted 4,230 coding genes, 66 tRNAs, and 10 rRNAs through automated annotation.

sequence contigs
Recently Published Documents

NPGREAT: Assembly of the human subtelomere regions with the use of ultralong Nanopore reads and Linked-Reads

3CAC: improving the classification of phages and plasmids from metagenomic assemblies using assembly graphs

De-novo chromosome level assembly of plant genomes from long read sequence data

First report of apple rubbery wood virus 1 in apple in China

Discovery and genome characterization of a new Nepovirus infecting grapevine

The Challenge of Genome Sequence Assembly

Appearance of synthetic vector-associated antibiotic resistance genes in next-generation sequences

Exploring the unmapped DNA and RNA reads in a songbird genome

SVEngine: an efficient and versatile simulator of genome structural variations with features of cancer clonal evolution

Draft Genome Sequence of Bacillus licheniformis Strain YNP1-TSU Isolated from Whiterock Springs in Yellowstone National Park
