Overview of organohalide-respiring bacteria and a proposal for a classification system for reductive dehalogenases

Laura A. Hug; Farai Maphosa; David Leys; Frank E. Löffler; Hauke Smidt; Elizabeth A. Edwards; Lorenz Adrian

doi:10.1098/rstb.2012.0322

IsoDetect: Detection of splice isoforms from third generation long reads based on short feature sequences

Current Bioinformatics, 2020, Vol 15. doi:10.2174/1574893615666200316101205

Author(s): Hongdong Li, Wenjing Zhang, Yuwen Luo, Jianxin Wang

Keyword(s): Sequence Similarity, Detection Methods, Sequence Information, Third Generation, Sequencing Data, Splice Isoforms, Third Generation Sequencing, Long Reads, Feature Sequence, Generation Sequencing

Aims: Accurately detect isoforms from third generation sequencing data. Background: Transcriptome annotation is the basis for the analysis of gene expression and regulation. The transcriptome annotation of many organisms, including humans, is far from complete, due partly to the challenge of identifying isoforms that are produced from the same gene through alternative splicing. Third generation sequencing (TGS) reads provide an unprecedented opportunity for detecting isoforms because their length exceeds that of most isoforms. One limitation of current TGS read-based isoform detection methods is that they rely exclusively on sequence reads, without incorporating the sequence information of known isoforms. Objective: Develop an efficient method for isoform detection. Method: Based on annotated isoforms, we propose a splice isoform detection method called IsoDetect. First, the sequence at each exon-exon junction is extracted from annotated isoforms as a “short feature sequence”, which is used to distinguish different splice isoforms. Second, these feature sequences are aligned to the long reads, and the long reads are divided into groups that contain the same set of feature sequences, thereby avoiding pair-wise comparison among the large number of long reads. Third, clustering and consensus generation are carried out based on sequence similarity. For the long reads that do not contain any short feature sequence, clustering analysis based on sequence similarity is performed to identify isoforms. Result: Tested on two datasets from Calypte anna and zebra finch, IsoDetect showed higher speed and compelling accuracy compared with four existing methods. Conclusion: IsoDetect is a promising method for isoform detection. Other: This paper was accepted by the CBC2019 conference.
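The first two steps of the method described above can be sketched as follows. This is a minimal illustration, not the IsoDetect implementation: the function names, the `flank` parameter, and the toy sequences are assumptions, and exact substring matching stands in for the alignment of feature sequences to long reads that the abstract describes.

```python
from collections import defaultdict


def junction_features(exons, flank=4):
    """Return one short sequence per exon-exon junction of an isoform.

    Each feature joins the last `flank` bases of the upstream exon to the
    first `flank` bases of the downstream exon (flank size is an assumption).
    """
    return [left[-flank:] + right[:flank] for left, right in zip(exons, exons[1:])]


def group_reads_by_features(reads, features):
    """Group read names by the exact set of feature sequences each read contains.

    Substring search is a simplification of real alignment; reads with no
    feature end up under the empty set and would be clustered separately.
    """
    groups = defaultdict(list)
    for name, seq in reads.items():
        present = frozenset(f for f in features if f in seq)
        groups[present].append(name)
    return dict(groups)


# Toy gene with three exons and two annotated isoforms (hypothetical data).
e1, e2, e3 = "AAAACCCC", "GGGGTTTT", "ACGTACGT"
features_a = junction_features([e1, e2, e3])  # isoform A keeps all exons
features_b = junction_features([e1, e3])      # isoform B skips exon 2
all_features = features_a + features_b

reads = {
    "r1": e1 + e2 + e3,
    "r2": e1 + e3,
    "r3": e1 + e2 + e3,
    "r4": "GGGGGGGG",
}
groups = group_reads_by_features(reads, all_features)
```

Because reads are bucketed by feature set first, similarity-based clustering only has to run within each bucket rather than across every pair of reads.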


Characterization of the transcriptome and EST-SSR development in Boea clarkeana, a desiccation-tolerant plant endemic to China

2016. doi:10.7287/peerj.preprints.2603v1

Author(s): Ying Wang, Kun Liu, De Bi, Biao Shou Zhou, Wen Jian Shao

Keyword(s): De Novo, Gene Annotation, Sequence Similarity, Molecular Study, Sequence Information, Sequencing Data, Protein Database, Illumina Hiseq, Significant Similarity, Assembly Technology

Background. Resurrection plants constitute a unique cadre within angiosperms. Boea clarkeana Hemsl. (Boea, Gesneriaceae) is a desiccation-tolerant dicotyledonous herb that is endemic to China. Although research on desiccation-tolerant (DT) angiosperms could be instructive for crops, genomic resources for B. clarkeana remain scarce. In addition, transcriptome sequencing could be an effective way to study desiccation-tolerant plants. Methods. In the present study, we used the Illumina HiSeq™ 2000 platform and de novo assembly technology to obtain leaf transcriptomes of B. clarkeana and conducted a BLASTX alignment of the sequencing data against protein databases for sequence classification and annotation. Then, based on the sequence information obtained, we developed EST-SSR markers by means of EST-SSR mining, primer design and polymorphism identification. Results. A total of 91,449 unigenes were generated from the leaf cDNA library of B. clarkeana in this study. Based on a sequence similarity search with a known protein database, 72,087 unigenes were annotated. Among the annotated unigenes, a total of 71,170 unigenes showed significant similarity to known proteins of 463 popular model species in the Nr database, and 59,962 unigenes and 32,336 unigenes were assigned to GO classifications and COG, respectively. In addition, 44,924 unigenes were mapped in 128 KEGG pathways. Furthermore, a total of 7,610 unigenes with 8,563 microsatellites were found. Seventy-four primer pairs were selected from 436 primer pairs designed for polymorphism validation. SSRs with higher polymorphism rates were concentrated on dinucleotides, pentanucleotides and hexanucleotides. Finally, 17 pairs with highly polymorphic and stable loci were selected for polymorphism screening. There were a total of 65 alleles, with 2–6 alleles at each locus. Mainly due to the unique biological characteristics of plants, the expected heterozygosity (HE), observed heterozygosity (HO) and polymorphism information content (PIC) per locus were very low, ranging from 0 to 0.196, 0.082 to 0.14 and 0 to 0.155, respectively. Discussion. A substantial fraction of the transcriptome sequences of B. clarkeana were generated in this study, which is the first molecular-level analysis of this plant. These sequences are valuable resources for gene annotation and discovery and molecular marker development. These sequences could also provide a valuable basis for the future molecular study of B. clarkeana.
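The EST-SSR mining step mentioned in the Methods can be illustrated with a small regular-expression scan. This is a sketch under stated assumptions, not the authors' pipeline: the abstract does not name the tool or thresholds used, so the `find_ssrs` function, the minimum repeat counts, and the test sequence are all hypothetical.

```python
import re


def _is_primitive(unit):
    """True if the motif is not itself a repeat of a shorter motif (e.g. 'ATAT')."""
    n = len(unit)
    return not any(n % k == 0 and unit == unit[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq, min_repeats=None):
    """Return (start, motif, repeat_count) for perfect microsatellites in seq.

    min_repeats maps motif length to the minimum number of tandem copies;
    the defaults below are illustrative assumptions, not values from the study.
    """
    if min_repeats is None:
        min_repeats = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
    hits = []
    for unit_len, n in sorted(min_repeats.items()):
        # One captured motif followed by at least n-1 further copies of it.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, n - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if _is_primitive(unit):
                hits.append((m.start(), unit, len(m.group(0)) // unit_len))
    return hits


# Hypothetical unigene fragment with one dinucleotide and one trinucleotide SSR.
unigene = "GGC" + "AT" * 7 + "CCT" + "AAG" * 5 + "T"
ssrs = find_ssrs(unigene)
```

Primers would then be designed in the flanking sequence on either side of each reported repeat.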


Multiple Nonidentical Reductive-Dehalogenase-Homologous Genes Are Common in Dehalococcoides

Applied and Environmental Microbiology, 2004, Vol 70 (9), pp. 5290-5297. doi:10.1128/aem.70.9.5290-5297.2004. Cited by ~98

Author(s): Tina Hölscher, Rosa Krajmalnik-Brown, Kirsti M. Ritalahti, Friedrich von Wintzingerode, Helmut Görisch, ...

Keyword(s): Amino Acid, Restriction Analysis, Sequence Similarity, Comparative Sequence Analysis, Amino Acid Sequences, Open Reading Frames, Degenerate Primers, Reductive Dehalogenase, Genes Encoding, Reductive Dehalogenase Genes

ABSTRACT Degenerate primers were used to amplify large fragments of reductive-dehalogenase-homologous (RDH) genes from genomic DNA of two Dehalococcoides populations, the chlorobenzene- and dioxin-dechlorinating strain CBDB1 and the trichloroethene-dechlorinating strain FL2. The amplicons (1,350 to 1,495 bp) corresponded to nearly complete open reading frames of known reductive dehalogenase genes and short fragments (approximately 90 bp) of genes encoding putative membrane-anchoring proteins. Cloning and restriction analysis revealed the presence of at least 14 different RDH genes in each strain. All amplified RDH genes showed sequence similarity with known reductive dehalogenase genes over the whole length of the sequence and shared all characteristics described for reductive dehalogenases. Deduced amino acid sequences of seven RDH genes from strain CBDB1 were 98.5 to 100% identical to seven different RDH genes from strain FL2, suggesting that both strains have an overlapping substrate range. All RDH genes identified in strains CBDB1 and FL2 were related to the RDH genes present in the genomes of Dehalococcoides ethenogenes strain 195 and Dehalococcoides sp. strain BAV1; however, sequence identity did not exceed 94.4 and 93.1%, respectively. The presence of RDH genes in strains CBDB1, FL2, and BAV1 that have no orthologs in strain 195 suggests that these strains possess dechlorination activities not present in strain 195. Comparative sequence analysis identified consensus sequences for cobalamin binding in deduced amino acid sequences of seven RDH genes. In conclusion, this study demonstrates that the presence of multiple nonidentical RDH genes is characteristic of Dehalococcoides strains.
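A degenerate primer, as used above, is a pool of oligonucleotides written with IUPAC ambiguity codes so that it can anneal to several related gene variants. The sketch below expands such a primer into its concrete sequences. The example primers are hypothetical and are not the primers used in the study; only the IUPAC code table is standard.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


def degeneracy(primer):
    """Number of distinct sequences in the primer pool."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


# Hypothetical primers for illustration only.
pool = expand_degenerate("ARY")
size = degeneracy("GGNTAY")
```

Higher degeneracy widens the range of gene variants a primer can capture but dilutes each individual sequence in the pool.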



Characterization of the transcriptome and EST-SSR development in Boea clarkeana, a desiccation-tolerant plant endemic to China

PeerJ, 2017, Vol 5, pp. e3422. doi:10.7717/peerj.3422. Cited by ~4

Author(s): Ying Wang, Kun Liu, De Bi, Shoubiao Zhou, Jianwen Shao

Keyword(s): Functional Annotation, De Novo, Gene Annotation, Sequence Similarity, Sequence Information, Sequencing Data, Protein Database, Illumina Hiseq, Significant Similarity, Assembly Technology

Background Desiccation-tolerant (DT) plants can recover full metabolic competence upon rehydration after losing most of their cellular water (>95%) for extended periods of time. Functional genomic approaches such as transcriptome sequencing can help us understand how DT plants survive and respond to dehydration, which has great significance for plant biology and improving the drought tolerance of crops. Boea clarkeana Hemsl. (Gesneriaceae) is a DT dicotyledonous herb. Its genomic sequence characteristics remain unknown. Based on transcriptomic analyses, polymorphic EST-SSR (simple sequence repeats in expressed sequence tags) molecular primers can be designed, which will greatly facilitate further investigations of the population genetics and demographic histories of DT plants. Methods In the present study, we used the Illumina HiSeq™ 2000 platform and de novo assembly technology to obtain leaf transcriptomes of B. clarkeana and conducted a BLASTX alignment of the sequencing data against protein databases for sequence classification and annotation. Then, based on the sequence information, the EST-SSR markers were developed, and the functional annotation of ESTs containing polymorphic SSRs was obtained through BLASTX. Results A total of 91,449 unigenes were generated from the leaf cDNA library of B. clarkeana. Based on a sequence similarity search with a known protein database, 72,087 unigenes were annotated. Among the annotated unigenes, a total of 71,170 unigenes showed significant similarity to the known proteins of 463 popular model species in the Nr database, and 59,962 unigenes and 32,336 unigenes were assigned to Gene Ontology (GO) classifications and Cluster of Orthologous Groups (COG), respectively. In addition, 44,924 unigenes were mapped in 128 KEGG pathways. Furthermore, a total of 7,610 unigenes with 8,563 microsatellites were found. Seventy-four primer pairs were selected from 436 primer pairs designed for polymorphism validation. SSRs with higher polymorphism rates were concentrated on dinucleotides, pentanucleotides and hexanucleotides. Finally, 17 pairs with stable, highly polymorphic loci were selected for polymorphism screening. There was a total of 65 alleles, with 2–6 alleles at each locus. Primarily due to the unique biological characteristics of plants, the expected heterozygosity (HE; 0–0.196), observed heterozygosity (HO; 0.082–0.14) and polymorphism information content (PIC; 0–0.155) per locus were very low. The functional annotation distribution centered on ESTs containing di- and tri-nucleotide SSRs, and the ESTs containing primers BC2, BC4 and BC12 were annotated to vegetative dehydration/desiccation pathways. Discussion This work is the first genetic study of B. clarkeana as a new plant resource of DT genes. A substantial number of transcriptome sequences were generated in this study. These sequences are valuable resources for gene annotation and discovery as well as molecular marker development. These sequences could also provide a valuable basis for future molecular studies of B. clarkeana.
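For readers unfamiliar with the per-locus statistics reported above, the sketch below computes expected heterozygosity and PIC from allele frequencies using the textbook definitions (HE = 1 − Σp², and the Botstein et al. form of PIC). The abstract does not state which estimator or software the authors used, so this is an assumption, and the allele frequencies are invented for illustration.

```python
from itertools import combinations


def expected_heterozygosity(freqs):
    """HE = 1 - sum of squared allele frequencies (no sample-size correction)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygous = sum(p * p for p in freqs)
    cross = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygous - cross


# Hypothetical loci: one with two equally common alleles, one fixed for a single allele.
balanced = [0.5, 0.5]
fixed = [1.0]
he_balanced = expected_heterozygosity(balanced)
pic_balanced = pic(balanced)
```

A locus dominated by one allele drives both statistics toward zero, which is consistent with the very low per-locus values reported in the abstract.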


Data-driven biological network alignment that uses topological, sequence, and functional information

BMC Bioinformatics, 2021, Vol 22 (1). doi:10.1186/s12859-021-03971-6

Author(s): Shawn Gu, Tijana Milenković

Keyword(s): Prediction Accuracy, Biological Network, Network Alignment, Data Driven, Sequence Information, Functional Prediction, Research Knowledge, Topological Information, Functional Knowledge, Related Proteins

Abstract Background Network alignment (NA) can transfer functional knowledge between species’ conserved biological network regions. Traditional NA assumes that it is topological similarity (isomorphic-like matching) between network regions that corresponds to the regions’ functional relatedness. However, we recently found that functionally unrelated proteins are as topologically similar as functionally related proteins. So, we redefined NA as a data-driven method called TARA, which learns from network and protein functional data what kind of topological relatedness (rather than similarity) between proteins corresponds to their functional relatedness. TARA used topological information (within each network) but not sequence information (between proteins across networks). Yet, TARA yielded higher protein functional prediction accuracy than existing NA methods, even those that used both topological and sequence information. Results Here, we propose TARA++, which is also data-driven, like TARA and unlike other existing methods, but which, unlike TARA, uses across-network sequence information on top of within-network topological information. To deal with the within-and-across-network analysis, we adapt social network embedding to the problem of biological NA. TARA++ outperforms existing methods in protein functional prediction accuracy. Conclusions As such, combining research knowledge from different domains is promising. Overall, improvements in protein functional prediction have biomedical implications, for example allowing researchers to better understand how cancer progresses or how humans age.


The Complete Chloroplast Genome of the Vulnerable Oreocharis esquirolii (Gesneriaceae): Structural Features, Comparative and Phylogenetic Analysis

Plants, 2020, Vol 9 (12), pp. 1692. doi:10.3390/plants9121692

Author(s): Li Gu, Ting Su, Ming-Tai An, Guo-Xiong Hu

Keyword(s): Phylogenetic Analysis, Sequence Similarity, Single Copy, Structural Features, Rrna Genes, Trna Genes, Sequencing Data, High Sequence Similarity, Plastid Genomes, Cp Genome

Oreocharis esquirolii, a member of Gesneriaceae, is also known as Thamnocharis esquirolii, which has been regarded as a synonym of the former. The species is endemic to Guizhou, southwestern China, and is evaluated as vulnerable (VU) under the International Union for Conservation of Nature (IUCN) criteria. Until now, the sequence and genome information of O. esquirolii has remained unknown. In this study, we assembled and characterized the complete chloroplast (cp) genome of O. esquirolii using Illumina sequencing data for the first time. The total length of the cp genome was 154,069 bp with a typical quadripartite structure consisting of a pair of inverted repeats (IRs) of 25,392 bp separated by a large single copy region (LSC) of 85,156 bp and a small single copy region (SSC) of 18,129 bp. The genome comprised 114 unique genes with 80 protein-coding genes, 30 tRNA genes, and four rRNA genes. Thirty-one repeat sequences and 74 simple sequence repeats (SSRs) were identified. Genome alignment across five plastid genomes of Gesneriaceae indicated a high sequence similarity. Four highly variable sites (rps16-trnQ, trnS-trnG, ndhF-rpl32, and ycf1) were identified. Phylogenetic analysis indicated that O. esquirolii grouped together with O. mileensis, supporting resurrection of the name Oreocharis esquirolii from Thamnocharis esquirolii. The complete cp genome sequence will contribute to further studies in molecular identification, genetic diversity, and phylogeny.
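The region lengths and gene counts reported above can be cross-checked with simple arithmetic, since a quadripartite plastome consists of the LSC, the SSC, and two copies of the inverted repeat. The helper name below is an assumption introduced for illustration; the numbers are taken from the abstract.

```python
def quadripartite_length(lsc, ssc, ir):
    """Total plastome length: LSC + SSC + two copies of the inverted repeat."""
    return lsc + ssc + 2 * ir


# Values from the abstract (base pairs).
total = quadripartite_length(lsc=85_156, ssc=18_129, ir=25_392)

# Protein-coding + tRNA + rRNA genes should equal the unique gene count.
unique_genes = 80 + 30 + 4

print(total, unique_genes)  # 154069 114
```

Both totals agree with the figures stated in the abstract (154,069 bp and 114 unique genes).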


Selection of Tree Nut Allergen Peptide Markers: A Need for Improved Protein Sequence Databases

Journal of AOAC International, 2019, Vol 102 (5), pp. 1263-1270. doi:10.1093/jaoac/102.5.1263. Cited by ~1

Author(s): Weili Xiong, Melinda A McFarland, Cary Pirone, Christine H Parker

Keyword(s): Food Allergen, Protein Sequence, Sequence Information, Sequencing Data, Reference Tree, Candidate Peptide, Tree Nut, Allergen Detection, Sequence Databases, Selection Of

Abstract Background: To effectively safeguard the food-allergic population and support compliance with food-labeling regulations, the food industry and regulatory agencies require reliable methods for food allergen detection and quantification. MS-based detection of food allergens relies on the systematic identification of robust and selective target peptide markers. The selection of proteotypic peptide markers, however, relies on the availability of high-quality protein sequence information, a bottleneck for the analysis of many plant-based proteomes. Method: In this work, data were compiled for reference tree nut ingredients and evaluated using a parsimony-driven global proteomics workflow. Results: The utility of supplementing existing incomplete protein sequence databases with translated genomic sequencing data was evaluated for English walnut and provided enhanced selection of candidate peptide markers and differentiation between closely related species. Highlights: Future improvements of protein databases and release of genomics-derived sequences are expected to facilitate the development of robust and harmonized LC–tandem MS-based methods for food allergen detection.


TranscriptomeReconstructoR, A Data-Driven Annotation of Complex Transcriptomes

2020. doi:10.21203/rs.3.rs-131404/v1

Author(s): Maxim Ivanov, Albin Sandelin, Sebastian Marquardt

Keyword(s): De Novo, Gene Annotation, R Package, Sequence Information, Rna Seq, Sequencing Data, Gene Model, Preparation Methods, Downstream Analysis

Abstract Background: The quality of gene annotation determines the interpretation of results obtained in transcriptomic studies. The growing amount of genome sequence information calls for experimental and computational pipelines for de novo transcriptome annotation. Ideally, gene and transcript models should be called from a limited set of key experimental data. Results: We developed TranscriptomeReconstructoR, an R package which implements a pipeline for automated transcriptome annotation. It relies on integrating features from independent and complementary datasets: i) full-length RNA-seq for detection of splicing patterns and ii) high-throughput 5' and 3' tag sequencing data for accurate definition of gene borders. The pipeline can also take a nascent RNA-seq dataset to supplement the called gene model with transient transcripts. We reconstructed de novo the transcriptional landscape of wild type Arabidopsis thaliana seedlings as a proof-of-principle. A comparison to the existing transcriptome annotations revealed that our gene model is more accurate and comprehensive than the two most commonly used community gene models, TAIR10 and Araport11. In particular, we identify thousands of transient transcripts missing from the existing annotations. Our new annotation promises to improve the quality of A. thaliana genome research. Conclusions: Our proof-of-concept data suggest a cost-efficient strategy for rapid and accurate annotation of complex eukaryotic transcriptomes. We combine the choice of library preparation methods and sequencing platforms with the dedicated computational pipeline implemented in the TranscriptomeReconstructoR package. The pipeline only requires prior knowledge of the reference genomic DNA sequence, but not the transcriptome. The package seamlessly integrates with Bioconductor packages for downstream analysis.


Phylogenetic and Transcriptional Analyses of a Tetrachloroethene-Dechlorinating "Dehalococcoides" Enrichment Culture TUT2264 and Its Reductive-Dehalogenase Genes

Microbes and Environments, 2009, Vol 24 (4), pp. 330-337. doi:10.1264/jsme2.me09133. Cited by ~27

Author(s): Hiroyuki Futamata, Shinichi Kaiya, Mariko Sugawara, Akira Hiraishi

Keyword(s): Enrichment Culture, Reductive Dehalogenase, Reductive Dehalogenase Genes
