Massively scalable genetic analysis of antibody repertoires

Mapping Intimacies

10.1101/447813

2018

Cited By ~ 4

Author(s):

Bryan Briney

Dennis R. Burton

Keyword(s):

Genetic Analysis

Sequence Data

Read Length

Antibody Repertoire

Repertoire Sequencing

Antibody Sequence

Downstream Analysis

Sequencing Platforms

Computational Resources

With technical breakthroughs in the throughput and read length of next-generation sequencing platforms, antibody repertoire sequencing is becoming an increasingly important tool for detailed characterization of the immune response. There is a need for open, scalable software for the genetic analysis of repertoire-scale antibody sequence data. To address this gap, we have developed the ab[x] package of software tools. The ab[x] toolkit has three core components, all of which are freely available: abcloud (github.com/briney/abcloud) for deploying and managing computational resources on Amazon’s Elastic Compute Cloud; abstar (github.com/briney/abstar) for preprocessing, germline gene assignment, and primary annotation of antibody sequence data; and abutils (github.com/briney/abutils), which provides utilities for interactive downstream analysis of antibody repertoire data.
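
To illustrate what germline gene assignment involves, here is a minimal Python sketch. It is not abstar's implementation: it simply assigns a query to whichever germline V gene has the highest ungapped identity and treats the remaining mismatches as an estimate of somatic mutation. The germline names and sequences are hypothetical toy data; real assignment aligns reads against curated germline databases and must handle insertions, deletions, and D/J segments, all of which this sketch ignores.

```python
from __future__ import annotations

# Hypothetical toy germline V-gene "database" (names and sequences invented
# for illustration); real references come from curated germline databases.
GERMLINES = {
    "IGHV-toyA": "CAGGTGCAGCTGGTGCAGTCTGGG",
    "IGHV-toyB": "GAGGTGCAGCTGTTGGAGTCTGGG",
    "IGHV-toyC": "CAGGTCCAGCTTGTGCAGTCTGGG",
}


def identity(query: str, germline: str) -> float:
    """Fraction of matching bases over the ungapped overlap (no indels allowed)."""
    overlap = min(len(query), len(germline))
    if overlap == 0:
        return 0.0
    # zip stops at the shorter sequence, so only overlapping positions are compared
    matches = sum(1 for q, g in zip(query, germline) if q == g)
    return matches / overlap


def assign_germline(query: str, germlines: dict[str, str]) -> tuple[str, float]:
    """Return the best-matching germline name and its identity to the query."""
    best = max(germlines, key=lambda name: identity(query, germlines[name]))
    return best, identity(query, germlines[best])


if __name__ == "__main__":
    query = "GAGGTGCAGCTGTTGGAGTCTGGA"  # toy read: IGHV-toyB with one substitution
    name, ident = assign_germline(query, GERMLINES)
    # 1 - identity approximates the V-gene somatic mutation frequency
    print(name, round(ident, 3), round(1 - ident, 3))
```

Because the comparison is ungapped, a single indel would make every downstream position look mismatched, which is one reason real tools align rather than compare position by position.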

A workflow for accurate metabarcoding using nanopore MinION sequencing

10.1101/2020.05.21.108852

2020

Cited By ~ 2

Author(s):

Bilgenur Baloğlu

Zhewei Chen

Vasco Elbrecht

Thomas Braukmann

Shanna MacDonald

...

Keyword(s):

High Throughput Sequencing

Sequence Data

Rolling Circle Amplification

Error Rates

Read Length

Taxonomic Assignment

Major Drawback

Rolling Circle

Sequencing Platform

Sequencing Platforms

Metabarcoding has become a common approach to the rapid identification of the species composition of a mixed sample. Most studies use established short-read high-throughput sequencing platforms. The Oxford Nanopore MinION™, a portable sequencing platform, represents a low-cost alternative that allows researchers to generate sequence data in the field. However, a major drawback is its high raw read error rate, which can range from 10% to 22%.

To test whether the MinION™ is a viable alternative to other sequencing platforms, we used rolling circle amplification (RCA) to generate full-length consensus DNA barcodes (658 bp of cytochrome oxidase I, COI) for a bulk mock sample of 50 aquatic invertebrate species. By applying two different laboratory protocols, we generated two MinION™ runs that were used to build consensus sequences. We also developed a novel Python pipeline, ASHURE, for processing, consensus building, clustering, and taxonomic assignment of the resulting reads.

We showed that error rates can be reduced to reach a median accuracy of up to 99.3% for long RCA fragments (>45 barcodes). Our pipeline successfully identified all 50 species in the mock community and exhibited sensitivity and accuracy comparable to MiSeq. RCA was integral to increasing consensus accuracy, but it was also the most time-consuming step of the laboratory workflow, and most RCA reads were skewed toward shorter lengths, with a median RCA fragment length of up to 1262 bp. Our study demonstrates that Nanopore sequencing can be used for metabarcoding, but we recommend exploring other isothermal amplification procedures to improve consensus length.
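
The consensus step is what makes RCA useful here: each RCA read carries several copies of the same barcode, so errors in individual copies can be outvoted. The sketch below is a simplified illustration, not ASHURE's algorithm. It assumes the copies have already been split out and aligned to equal length, then takes a per-column majority vote. A second function gives the binomial probability that a majority of n copies is wrong at one site under an idealized model in which errors are independent, occur at rate p, and always agree with one another. Nanopore errors include many insertions and deletions and are not independent, so the model should be read as an idealization rather than a prediction.

```python
from __future__ import annotations

from collections import Counter
from math import comb


def majority_consensus(copies: list[str]) -> str:
    """Per-column majority vote over barcode copies that are already aligned
    to equal length (an assumption; real RCA reads need alignment first)."""
    if not copies:
        raise ValueError("need at least one copy")
    length = len(copies[0])
    if any(len(c) != length for c in copies):
        raise ValueError("copies must be pre-aligned to equal length")
    # zip(*copies) yields one tuple per alignment column
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*copies))


def majority_error_prob(n: int, p: float) -> float:
    """Probability that more than half of n copies carry an error at one site,
    assuming independent errors with rate p that all agree (idealized model).
    Use odd n to avoid ties."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTACGA", "ACCTACGT"]  # toy copies; two carry one error each
    print(majority_consensus(reads))
    for n in (1, 3, 5, 9):
        print(n, majority_error_prob(n, 0.1))
```

The error probability falls quickly as n grows, which is the intuition behind the abstract's finding that longer RCA fragments (more barcode copies) give more accurate consensus sequences.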

Integrating genomics into the taxonomy and systematics of the Bacteria and Archaea

International Journal of Systematic and Evolutionary Microbiology

10.1099/ijs.0.054171-0

2014

Vol 64 (Pt_2)

pp. 316-324

Cited By ~ 258

Author(s):

Jongsik Chun

Fred A. Rainey

Keyword(s):

Genomic Sequence

Sequence Data

Original Research

rRNA Gene

New Taxon

Genome Sequences

Microbial World

Type Strains

The polyphasic approach used today in the taxonomy and systematics of the Bacteria and Archaea includes the use of phenotypic, chemotaxonomic and genotypic data. The use of 16S rRNA gene sequence data has revolutionized our understanding of the microbial world and led to a rapid increase in the number of descriptions of novel taxa, especially at the species level. In many cases it has allowed taxa to be demarcated as distinct species, but its limitations in a number of groups have led to the continued use of DNA–DNA hybridization. As technology has improved, next-generation sequencing (NGS) has provided a rapid and cost-effective approach to obtaining whole-genome sequences of microbial strains. Although some 12 000 bacterial or archaeal genome sequences are available for comparison, only 1725 of these are from type strains, which limits the use of genomic data in comparative taxonomic studies given that there are nearly 11 000 type strains. Efforts to obtain complete genome sequences of all type strains are critical to the future of microbial systematics. The incorporation of genomics into the taxonomy and systematics of the Bacteria and Archaea, coupled with computational advances, will boost the credibility of taxonomy in the genomic era. This special issue of International Journal of Systematic and Evolutionary Microbiology contains both original research and review articles covering the use of genomic sequence data in microbial taxonomy and systematics. It includes contributions on specific taxa as well as outlines of approaches for incorporating genomics into workflows that run from new strain isolation to new taxon description.

Sequencing and Computational Approaches to Identification and Characterization of Microbial Organisms

Biomedical Engineering and Computational Biology

10.4137/becb.s10886

2013

Vol 5

pp. BECB.S10886

Cited By ~ 2

Author(s):

Brijesh Singh Yadav

Venkateswarlu Ronda

Dinesh P. Vashista

Bhaskar Sharma

Keyword(s):

Sequence Data

Microbial Interactions

Microbial Pathogens

Nucleotide Sequence Data

Computational Approaches

Microbial Detection

Sequencing Technologies

Sequencing Platforms

Identification And Characterization

Recent advances in sequencing technologies and computational approaches are bringing scientists ever closer to a complete understanding of human-microbial interactions. Powerful sequencing platforms are rapidly producing large amounts of nucleotide sequence data, which are compiled into huge databases. These sequence data can be retrieved, assembled, and analyzed to identify microbial pathogens and diagnose diseases. In this article, we present a commentary on how metagenomics, combined with microarrays and new sequencing techniques, is helping microbial detection and characterization.

A comparison of unamplified and massively multiplexed PCR amplification for murine antibody repertoire sequencing

FASEB BioAdvances

10.1096/fba.1017

2018

Vol 1 (1)

pp. 6-17

Cited By ~ 1

Author(s):

Trisha A. Rettig

Michael J. Pecaut

Stephen K. Chapes

Keyword(s):

PCR Amplification

Antibody Repertoire

Multiplexed PCR

Repertoire Sequencing

Murine Antibody

mSphere of Influence: the Power of Yeast Genetics Still Going Strong!

mSphere

10.1128/msphere.00647-19

2019

Vol 4 (5)

Author(s):

Felipe H. Santiago-Tirado

Keyword(s):

Genetic Analysis

Fungal Pathogen

Cell Biology

Genetic Manipulation

Chemical Genetics

Yeast Genetics

Human Fungal Pathogen

Fungal Meningitis

Felipe Santiago-Tirado studies the cell biology of cryptococcal infections. In this mSphere of Influence article, he reflects on how the papers “Systematic Genetic Analysis of Virulence in the Human Fungal Pathogen Cryptococcus neoformans” (https://doi.org/10.1016/j.cell.2008.07.046) and “Unraveling the Biology of a Fungal Meningitis Pathogen Using Chemical Genetics” (https://doi.org/10.1016/j.cell.2014.10.044) by the Noble and Madhani groups influenced his thinking by showcasing the various modern applications of yeast genetics in an organism where genetic manipulation was difficult.

Machine learning can differentiate venom toxins from other proteins having non-toxic physiological functions

PeerJ Computer Science

10.7717/peerj-cs.90

2016

Vol 2

pp. e90

Cited By ~ 24

Author(s):

Ranko Gacesa

David J. Barlow

Paul F. Long

Keyword(s):

Machine Learning

Sequence Data

Biological Data

Biological Databases

Web Based

Physiological Functions

Venom Toxins

Venomous Animals

Toxin Protein

Ascribing function to sequence in the absence of biological data is an ongoing challenge in bioinformatics. Differentiating the toxins of venomous animals from homologues having other physiological functions is particularly problematic, as there are no universally accepted methods for attributing toxin function from sequence data alone. The bioinformatics tools that do exist are difficult to implement for researchers with little bioinformatics training. Here we announce a machine learning tool called ‘ToxClassifier’ that enables simple and consistent discrimination of toxins from non-toxin sequences with >99% accuracy, and we compare it to commonly used toxin annotation methods. ‘ToxClassifier’ also reports the best-hit annotation, allowing a toxin to be placed in the most appropriate toxin protein family or related to the non-toxic protein with the closest homology, which enhances curation of existing biological databases and new venomics projects. ‘ToxClassifier’ is available for free, either to download (https://github.com/rgacesa/ToxClassifier) or to use on a web-based server (http://bioserv7.bioinfo.pbf.hr/ToxClassifier/).
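
The abstract does not describe ToxClassifier's models, so the following is not its method. As a rough illustration of the general idea of learning to separate toxins from non-toxins using features computed from sequence alone, this sketch trains a nearest-centroid classifier on amino-acid composition vectors. The training peptides are invented toy sequences, made cysteine-rich for the "toxin" class because many venom peptides are cysteine-rich; a usable classifier would need curated training data, richer features, and proper validation.

```python
from __future__ import annotations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq: str) -> list[float]:
    """Fraction of each of the 20 standard amino acids (other letters ignored)."""
    seq = seq.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts) or 1  # avoid division by zero for empty input
    return [c / total for c in counts]


def squared_distance(u: list[float], v: list[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


class NearestCentroid:
    """Assigns a sequence to the class whose mean composition vector is closest."""

    def fit(self, seqs: list[str], labels: list[str]) -> NearestCentroid:
        grouped: dict[str, list[list[float]]] = {}
        for seq, label in zip(seqs, labels):
            grouped.setdefault(label, []).append(composition(seq))
        # Centroid = per-feature mean over all training vectors of that class
        self.centroids = {
            label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in grouped.items()
        }
        return self

    def predict(self, seq: str) -> str:
        x = composition(seq)
        return min(self.centroids, key=lambda lab: squared_distance(x, self.centroids[lab]))


# Invented toy training data (hypothetical, for illustration only).
TRAIN_SEQS = ["CKGCCSCRCG", "GCCGCPCCNC", "MAEELKKALE", "LSDEAKKLLE"]
TRAIN_LABELS = ["toxin", "toxin", "non-toxin", "non-toxin"]

model = NearestCentroid().fit(TRAIN_SEQS, TRAIN_LABELS)
```

With this toy model, a cysteine-rich query such as `"ACCGCRCCKC"` lands nearer the "toxin" centroid and a charged, helix-like query such as `"MKELLEEAKA"` nearer the "non-toxin" centroid; composition alone cannot capture the homology information that the abstract says ToxClassifier reports.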

MHC-Linked Olfactory Receptor Loci Exhibit Polymorphism and Contribute to Extended HLA/OR-Haplotypes

Genome Research

10.1101/gr.120400

2000

Vol 10 (12)

pp. 1968-1978

Cited By ~ 1

Author(s):

Anke Ehlers

Stephan Beck

Simon A. Forbes

John Trowsdale

Armin Volz

...

Keyword(s):

Olfactory Receptor

Sequence Data

Allelic Variation

Mate Preferences

Coding Region

OR Gene

HLA Haplotypes

A Minor

OR Genes

Clusters of olfactory receptor (OR) genes are found on most human chromosomes. They are one of the largest mammalian multigene families. Here, we report a systematic study of polymorphism of OR genes belonging to the largest fully sequenced OR cluster. The cluster contains 36 OR genes, of which two belong to the vomeronasal 1 (V1-OR) family. The cluster is divided into a major and a minor region at the telomeric end of the HLA complex on chromosome 6. These OR genes could be involved in MHC-related mate preferences. The polymorphism screen was carried out with 13 genes from the HLA-linked OR cluster and three genes from chromosomes 7, 17, and 19 as controls. Ten human cell lines, representing 18 different chromosome 6s, were analyzed. They were from various ethnic origins and exhibited different HLA haplotypes. All OR genes tested, including those not linked to the HLA complex, were polymorphic. These polymorphisms were dispersed along the coding region and resulted in up to seven alleles for a given OR gene. Three polymorphisms resulted either in stop codons (genes hs6M1-4P, hs6M1-17) or in a 16-bp deletion (gene hs6M1-19P), possibly leading to lack of ligand recognition by the respective receptors in the cell line donors. In total, 13 HLA-linked OR haplotypes could be defined. Therefore, allelic variation appears to be a general feature of human OR genes. [The sequence data reported in this paper have been submitted to EMBL under accession nos. AC006137, AC004178, AJ132194, AL022727, AL031983, AL035402, AL035542, Z98744, CAB55431, AL050339, AL035402, AL096770, AL133267, AL121944, Z98745, AL021808, and AL021807.]

EdClust: A heuristic sequence clustering method with higher sensitivity

Journal of Bioinformatics and Computational Biology

10.1142/s0219720021500360

2021

Author(s):

Ming Cao

Qinke Peng

Ze-Gang Wei

Fei Liu

Yi-Fan Hou

Keyword(s):

Large Scale

Sequence Data

Clustering Algorithms

Clustering Methods

Sequencing Data

Clustering Method

Cluster Number

Sequence Clustering

Downstream Analysis

Heuristic Clustering

The development of high-throughput technologies has produced increasing amounts of sequence data and an increasing need for efficient clustering algorithms that can process massive volumes of sequencing data for downstream analysis. Heuristic clustering methods are widely applied for sequence clustering because of their low computational complexity. Although numerous heuristic clustering methods have been developed, they suffer from two limitations: overestimation of the number of inferred clusters and low clustering sensitivity. To address these issues, we present a new sequence clustering method (edClust) that groups similar sequences using Edlib, a C/C++ library for fast, exact semi-global sequence alignment. We tested edClust on three large-scale sequence databases and compared it with several classic heuristic clustering methods, such as UCLUST, CD-HIT, and VSEARCH. Evaluations based on cluster number and seed sensitivity (SS) demonstrate that edClust produces fewer clusters than the other methods and achieves higher SS. The source code of edClust is available from https://github.com/zhang134/EdClust.git under the GNU GPL license.
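
Heuristic tools in this family are commonly built on greedy incremental clustering (CD-HIT, for example, processes the longest sequences first): each sequence either joins the first existing cluster whose representative (seed) it matches above an identity threshold or founds a new cluster. The sketch below illustrates that general scheme in pure Python; it is not edClust's algorithm. In place of Edlib's semi-global alignment it uses a plain global Levenshtein distance, and it defines identity as 1 − distance / length of the longer sequence, which is only one of several conventions.

```python
from __future__ import annotations


def edit_distance(a: str, b: str) -> int:
    """Global Levenshtein distance by dynamic programming, O(len(a) * len(b)).
    Stand-in for Edlib's faster semi-global alignment (an assumption)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


def identity(a: str, b: str) -> float:
    """One convention among several: 1 - distance / length of the longer sequence."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def greedy_cluster(seqs: list[str], threshold: float = 0.9) -> list[tuple[str, list[str]]]:
    """Greedy incremental clustering: longest sequences first; each sequence joins
    the first cluster whose seed it matches at >= threshold, else founds a new one."""
    clusters: list[tuple[str, list[str]]] = []
    for seq in sorted(seqs, key=len, reverse=True):  # stable sort keeps input order on ties
        for seed, members in clusters:
            if identity(seed, seq) >= threshold:
                members.append(seq)
                break
        else:  # no existing seed was close enough
            clusters.append((seq, [seq]))
    return clusters
```

Because each sequence stops at the first seed that passes the threshold, the result depends on processing order; this order dependence is one source of the cluster overestimation and sensitivity problems the abstract attributes to heuristic methods.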

Retroposed New Genes Out of the X in Drosophila

Genome Research

10.1101/gr.604902

2002

Vol 12 (12)

pp. 1854-1859

Author(s):

Esther Betrán

Kevin Thornton

Manyuan Long

Keyword(s):

Population Genetics

Molecular Mechanisms

Sequence Data

Evolutionary Process

Significant Excess

New Genes

Asymmetric Pattern

Unpublished Information

New genes that originated through various molecular mechanisms are essential to understanding the evolution of genetic systems. We investigated the pattern of origin of genes created by retroposition in Drosophila. We surveyed the whole Drosophila melanogaster genome for such new retrogenes and experimentally analyzed their functionality and evolutionary process. These retrogenes, shown to be functional by analyses of expression, substitution, and population genetics, show a surprisingly asymmetric pattern of origin: there is a significant excess of retrogenes that originate from the X chromosome and retropose to autosomes, whereas new genes retroposed from autosomes are scarce. Further, we found that most of these X-derived autosomal retrogenes had evolved a testis expression pattern. These observations may be explained by natural selection favoring new retrogenes that moved to autosomes and thereby avoided X inactivation during spermatogenesis, and they suggest an important role for genome position in the origin of new genes. [The sequence data from this study have been submitted to GenBank under accession nos. AY150701–AY150797. The following individuals kindly provided reagents, samples, or unpublished information as indicated in the paper: M.-L. Wu, F. Lemeunier, and P. Gibert.]

NanoGalaxy: Nanopore long-read sequencing data analysis in Galaxy

GigaScience

10.1093/gigascience/giaa105

2020

Vol 9 (10)

Cited By ~ 1

Author(s):

Willem de Koning

Milad Miladi

Saskia Hiltemann

Astrid Heikema

John P Hays

...

Keyword(s):

Genome Assembly

Bioinformatics Analysis

De Novo

Sequence Data

Ease Of Use

Easy Access

Complex Data

Sequencing Data

Long Read

Sequencing Platforms

Background: Long-read sequencing can be applied to generate very long contigs and even completely assembled genomes at relatively low cost and with minimal sample preparation. As a result, long-read sequencing platforms are becoming more popular. In this respect, the Oxford Nanopore Technologies–based long-read sequencing “nanopore” platform is becoming a widely used tool with a broad range of applications and end-users. However, exploring and manipulating the complex data generated by long-read sequencing platforms requires specialized bioinformatics platforms and tools to process the long-read data correctly. Importantly, such tools should also help democratize bioinformatics analysis by offering easily accessible, easy-to-use solutions for researchers.

Results: The Galaxy platform provides a user-friendly interface to command line–based computational tools, handles software dependencies, and provides refined workflows. Users do not need programming experience or advanced computer skills. The interface enables researchers to perform powerful bioinformatics analyses, including the assembly and analysis of short- or long-read sequence data. The newly developed “NanoGalaxy” is a Galaxy-based toolkit for analysing long-read sequencing data that is suitable for diverse applications, including de novo genome assembly from genomic, metagenomic, and plasmid sequence reads.

Conclusions: A range of best-practice tools and workflows for long-read genome assembly has been integrated into the NanoGalaxy platform to give researchers easy access to bioinformatics tools. NanoGalaxy is freely available at the European Galaxy server https://nanopore.usegalaxy.eu, with supporting self-learning training material available at https://training.galaxyproject.org.
