Access COI barcode efficiently using high throughput Single-End 400 bp sequencing

Mapping Intimacies ◽

10.1101/498618 ◽

2018 ◽

Cited By ~ 1

Author(s):

Chentao Yang ◽

Shangjin Tan ◽

Guanliang Meng ◽

David G. Bourne ◽

Paul A. O’Brien ◽

...

Keyword(s):

DNA Barcoding ◽

High Throughput ◽

High Throughput Sequencing ◽

Rapid Development ◽

Full Length ◽

List Type ◽

Sequencing Data ◽

Test Plate ◽

Function Modules ◽

Sequencing Platforms

Summary: Over the last decade, the rapid development of high-throughput sequencing platforms has accelerated species description and assisted morphological classification through DNA barcoding. However, constraints in barcoding costs led to unbalanced efforts, which prevented accurate taxonomic identification for biodiversity studies. We present a high-throughput sequencing approach based on the HIFI-SE pipeline, which takes advantage of Single-End 400 bp (SE400) sequencing data generated by BGISEQ-500 to produce full-length Cytochrome c oxidase subunit I (COI) barcodes from pooled polymerase chain reaction amplicons. HIFI-SE is written in Python and includes four function modules: filter, assign, assembly and taxonomy. We applied HIFI-SE to a test plate containing 96 samples (30 corals, 64 insects and 2 blank controls) and obtained a total of 86 fully assembled HIFI COI barcodes. Comparison with the corresponding Sanger sequences (72 sequences available) showed that most of the samples (98.61%, 71/72) were correctly and accurately assembled, including 46 samples with a similarity of 100% and 25 with ca. 99%. Our approach can produce standard full-length barcodes cost-efficiently, enabling DNA barcoding for global biomes, which will advance DNA-based species identification for various ecosystems and improve quarantine biosecurity efforts.
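The similarity figures quoted in this summary can be illustrated with a small Python sketch. It assumes the assembled barcode and its Sanger reference have already been aligned to the same length; the function name `percent_identity` is hypothetical and is not taken from HIFI-SE, whose actual comparison method is not described here.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions between two pre-aligned, equal-length sequences.

    Assumption: the sequences were aligned beforehand; no gap handling is done here.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must be non-empty")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


# Proportion of correctly assembled samples reported in the summary: 71 of 72.
correct_fraction = round(100.0 * 71 / 72, 2)
print(correct_fraction)  # 98.61
```

A real comparison would first run a pairwise alignment, since assembled and Sanger sequences rarely start and end at exactly the same positions.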

Efficient COI barcoding using high throughput single-end 400 bp sequencing

BMC Genomics ◽

10.1186/s12864-020-07255-w ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Chentao Yang ◽

Yuxuan Zheng ◽

Shangjin Tan ◽

Guanliang Meng ◽

Wei Rao ◽

...

Keyword(s):

DNA Barcoding ◽

High Throughput ◽

High Throughput Sequencing ◽

Marine Invertebrates ◽

Rapid Development ◽

High Sensitivity ◽

Similarity Score ◽

Full Length ◽

Read Length ◽

Sequencing Platforms

Background: Over the last decade, the rapid development of high-throughput sequencing platforms has accelerated species description and assisted morphological classification through DNA barcoding. However, current high-throughput DNA barcoding methods either cannot obtain full-length barcode sequences because of read length limitations (e.g. a maximum read length of 300 bp for Illumina’s MiSeq system) or are hindered by relatively high cost or low sequencing output (e.g. a maximum of eight million reads per cell for PacBio’s SEQUEL II system). Results: Pooled cytochrome c oxidase subunit I (COI) barcodes from individual specimens were sequenced on the MGISEQ-2000 platform using the single-end 400 bp (SE400) module. We present a bioinformatic pipeline, HIFI-SE, that takes reads generated from the 5′ and 3′ ends of the COI barcode region and assembles them into full-length barcodes. HIFI-SE is written in Python and includes four function modules: filter, assign, assembly and taxonomy. We applied HIFI-SE to a set of 845 samples (30 marine invertebrates, 815 insects) and obtained a total of 747 fully assembled COI barcodes as well as 70 Wolbachia and fungal symbionts. Compared with their corresponding Sanger sequences (72 sequences available), nearly all samples (71/72) were correctly and accurately assembled, including 46 samples with a similarity score of 100% and 25 with ca. 99%. Conclusions: The HIFI-SE pipeline is an efficient way to produce standard full-length barcodes, and the reasonable cost and high sensitivity of our method can yield considerably more DNA barcodes under the same budget. Our method thereby advances DNA-based species identification from diverse ecosystems and increases the number of relevant applications.
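The idea of joining a read from the 5′ end with a read from the 3′ end can be sketched as a simple overlap merge. This is a deliberately simplified illustration, not the HIFI-SE assembly module: it assumes error-free reads, exact matching in the overlap, and a 3′ read that must be reverse-complemented first. The names `reverse_complement`, `merge_by_overlap` and the `min_overlap` parameter are hypothetical.

```python
from typing import Optional

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_by_overlap(forward: str, reverse: str, min_overlap: int = 20) -> Optional[str]:
    """Merge a 5'-end read with a 3'-end read via their longest exact overlap.

    `reverse` is given as sequenced from the 3' end, so it is
    reverse-complemented before matching. Returns None if no exact
    overlap of at least `min_overlap` bases exists.
    """
    tail = reverse_complement(reverse)
    longest = min(len(forward), len(tail))
    # Try the longest candidate overlap first, then shrink.
    for k in range(longest, min_overlap - 1, -1):
        if forward[-k:] == tail[:k]:
            return forward + tail[k:]
    return None


# Toy example: a 30 bp template covered by two 20 bp reads overlapping by 10 bp.
template = "ATGGCATTCCGAAGTCTAGGCTTAACGGTA"
read_5p = template[:20]
read_3p = reverse_complement(template[10:])
merged = merge_by_overlap(read_5p, read_3p, min_overlap=5)
```

With real SE400 data, two 400 bp reads would leave an overlap in the middle of a standard COI barcode, and a practical implementation would tolerate mismatches and use base qualities rather than require an exact match.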

HTSeq - A Python framework to work with high-throughput sequencing data

10.1101/002824 ◽

2014 ◽

Cited By ~ 242

Author(s):

Simon Anders ◽

Paul Theodor Pyl ◽

Wolfgang Huber

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Rapid Development ◽

Differential Expression Analysis ◽

RNA-Seq ◽

Sequencing Data ◽

Standard Work ◽

Data Formats ◽

High Throughput Sequencing Data ◽

Python Package

Motivation: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed. Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and it provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes. Availability: HTSeq is released as open-source software under the GNU General Public Licence and is available from http://www-huber.embl.de/HTSeq or from the Python Package Index, https://pypi.python.org/pypi/HTSeq
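Counting the overlap of reads with genes can be sketched in plain Python as follows. This does not use the HTSeq API and is not the htseq-count implementation; it assumes half-open coordinates, ignores strand and spliced alignments, and uses hypothetical names (`overlaps`, `count_reads`, and the `no_feature` and `ambiguous` keys).

```python
from collections import Counter


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


def count_reads(genes: dict, reads: list) -> Counter:
    """Count reads per gene.

    genes: name -> (chrom, start, end); reads: list of (chrom, start, end).
    A read is assigned to a gene only if it overlaps exactly one gene.
    """
    counts = Counter()
    for chrom, start, end in reads:
        hits = {
            name
            for name, (g_chrom, g_start, g_end) in genes.items()
            if g_chrom == chrom and overlaps(start, end, g_start, g_end)
        }
        if len(hits) == 1:
            counts[next(iter(hits))] += 1
        elif not hits:
            counts["no_feature"] += 1
        else:
            counts["ambiguous"] += 1
    return counts


# Toy example with two partly overlapping genes on chr1.
example_genes = {"geneA": ("chr1", 100, 200), "geneB": ("chr1", 180, 300)}
example_reads = [
    ("chr1", 110, 150),  # geneA only
    ("chr1", 190, 195),  # both genes -> ambiguous
    ("chr1", 400, 450),  # no gene
    ("chr1", 250, 280),  # geneB only
]
example_counts = count_reads(example_genes, example_reads)
```

The linear scan over all genes for every read is only suitable for toy inputs; the abstract's mention of data structures that can be queried by genomic coordinates points at the kind of indexing a real tool would use instead.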

WebPrInSeS: automated full-length clone sequence identification and verification using high-throughput sequencing data

Nucleic Acids Research ◽

10.1093/nar/gkq431 ◽

2010 ◽

Vol 38 (suppl_2) ◽

pp. W378-W384 ◽

Cited By ~ 5

Author(s):

Andreas Massouras ◽

Frederik Decouttere ◽

Korneel Hens ◽

Bart Deplancke

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Full Length ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Sequence Identification ◽

Clone Sequence ◽

Full Length Clone

Integrative analyses of transcriptome data reveal the mechanisms of post-transcriptional regulation

Briefings in Functional Genomics ◽

10.1093/bfgp/elab004 ◽

2021 ◽

Author(s):

Jinkai Wang

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

RNA Binding ◽

RNA Binding Proteins ◽

Rapid Development ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Public Resources ◽

Integrative Analyses ◽

Post Transcriptional Regulation

Abstract: Post-transcriptional processing of RNAs plays important roles in a variety of physiological and pathological processes. These processes can be precisely controlled by a series of RNA binding proteins and cotranscriptionally regulated by transcription factors as well as histone modifications. With the rapid development of high-throughput sequencing techniques, multiomics data have been broadly used to study the mechanisms underlying important biological processes. However, using these high-throughput sequencing data to elucidate the fundamental regulatory roles of post-transcriptional processes remains a great challenge. This review summarizes the regulatory mechanisms of post-transcriptional processes and the general principles and approaches for dissecting these mechanisms by integrating multiomics data as well as public resources.

Faculty Opinions recommendation of Coalescent Inference Using Serially Sampled, High-Throughput Sequencing Data from Intrahost HIV Infection.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.726132071.793531014 ◽

2017 ◽

Author(s):

Sarah Rowland-Jones ◽

Sophie Andrews

Keyword(s):

HIV Infection ◽

High Throughput ◽

High Throughput Sequencing ◽

Sequencing Data ◽

High Throughput Sequencing Data

BlindCall: ultra-fast base-calling of high-throughput sequencing data by blind deconvolution

Bioinformatics ◽

10.1093/bioinformatics/btu010 ◽

2014 ◽

Vol 30 (9) ◽

pp. 1214-1219 ◽

Cited By ~ 6

Author(s):

C. Ye ◽

C. Hsiao ◽

H. Corrada Bravo

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Blind Deconvolution ◽

Sequencing Data ◽

Base Calling ◽

High Throughput Sequencing Data

Rapid identification and metagenomics analysis of the adenovirus type 55 outbreak in Hubei using real-time and high-throughput sequencing platforms

Infection Genetics and Evolution ◽

10.1016/j.meegid.2021.104939 ◽

2021 ◽

pp. 104939

Author(s):

Peihan Li ◽

Kaiying Wang ◽

Shaofu Qiu ◽

Yanfeng Lin ◽

Jing Xie ◽

...

Keyword(s):

Real Time ◽

High Throughput ◽

High Throughput Sequencing ◽

Adenovirus Type ◽

Rapid Identification ◽

Sequencing Platforms ◽

Metagenomics Analysis

Single‐cell based high‐throughput sequencing of full‐length immunoglobulin heavy and light chain genes

European Journal of Immunology ◽

10.1002/eji.201343917 ◽

2013 ◽

Vol 44 (2) ◽

pp. 597-603 ◽

Cited By ~ 83

Author(s):

Christian E. Busse ◽

Irina Czogiel ◽

Peter Braun ◽

Peter F. Arndt ◽

Hedda Wardemann

Keyword(s):

Single Cell ◽

High Throughput ◽

Light Chain ◽

High Throughput Sequencing ◽

Full Length

Great differences in performance and outcome of high-throughput sequencing data analysis platforms for fungal metabarcoding

MycoKeys ◽

10.3897/mycokeys.39.28109 ◽

2018 ◽

Vol 39 ◽

pp. 29-40 ◽

Cited By ~ 21

Author(s):

Sten Anslan ◽

R. Henrik Nilsson ◽

Christian Wurzbacher ◽

Petr Baldrian ◽

Leho Tedersoo ◽

...

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Computation Time ◽

Potential Effect ◽

Data Sets ◽

Sequencing Data ◽

Operational Taxonomic Units ◽

High Throughput Sequencing Data ◽

Recent Developments

Along with recent developments in high-throughput sequencing (HTS) technologies and the resulting fast accumulation of HTS data, there has been a growing need for, and interest in, developing tools for HTS data processing and communication. In particular, a number of bioinformatics tools have been designed for analysing metabarcoding data, each with specific features, assumptions and outputs. To evaluate the potential effect of applying different bioinformatics workflows on the results, we compared the performance of different analysis platforms on two contrasting high-throughput sequencing data sets. Our analysis revealed that the computation time, the quality of error filtering and hence the output of a specific bioinformatics process largely depend on the platform used. Our results show that none of the bioinformatics workflows appears to perfectly filter out the accumulated errors and generate Operational Taxonomic Units, although PipeCraft, LotuS and PIPITS perform better than QIIME2 and Galaxy for the tested fungal amplicon dataset. We conclude that the output of each platform requires manual validation of the OTUs by examining the taxonomy assignment values.

Improvements and impacts of GRCh38 human reference on high throughput sequencing data analysis

Genomics ◽

10.1016/j.ygeno.2017.01.005 ◽

2017 ◽

Vol 109 (2) ◽

pp. 83-90 ◽

Cited By ~ 44

Author(s):

Yan Guo ◽

Yulin Dai ◽

Hui Yu ◽

Shilin Zhao ◽

David C. Samuels ◽

...

Keyword(s):

Data Analysis ◽

High Throughput ◽

High Throughput Sequencing ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Sequencing Data Analysis
