Who’s who? Detecting and resolving sample anomalies in human DNA sequencing studies with peddy

Mapping Intimacies ◽

10.1101/074385 ◽

2016 ◽

Author(s):

Brent S. Pedersen ◽

Aaron R. Quinlan

Keyword(s):

DNA Sequencing ◽

Web Interface ◽

1000 Genomes ◽

Link Type ◽

Human DNA ◽

Machine Learning Model ◽

Sequencing Studies ◽

Sequencing Quality ◽

Interactive Visualizations ◽

Genetic Discovery

Abstract The potential for genetic discovery in human DNA sequencing studies is greatly diminished if DNA samples from the cohort are mislabelled, swapped, contaminated, or include unintended individuals. Unfortunately, the potential for such errors is significant since DNA samples are often manipulated by several protocols, labs or scientists in the process of sequencing. We have developed peddy to identify and facilitate the remediation of such errors via interactive visualizations and reports comparing the stated sex, relatedness, and ancestry to what is inferred from each individual’s genotypes. Peddy predicts a sample’s ancestry using a machine learning model trained on individuals of diverse ancestries from the 1000 Genomes Project reference panel. Peddy’s speed, text reports and web interface facilitate both automated and visual detection of sample swaps, poor sequencing quality and other indicators of sample problems that, were they left undetected, would inhibit discovery.

Software availability: https://github.com/brentp/peddy

Demonstration (Chrome suggested): http://home.chpc.utah.edu/~u6000771//plots/ceph1463.html
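The comparison of stated and inferred sex described in the abstract can be illustrated with a minimal sketch. It assumes that sex is inferred from the fraction of heterozygous genotype calls on the X chromosome; the cutoff value, the genotype coding (alternate-allele counts 0, 1, 2, with None for missing calls) and all function names are hypothetical choices made for illustration and are not taken from peddy.

```python
from dataclasses import dataclass

# Hypothetical cutoff, chosen for illustration only (not peddy's value):
# samples with a larger fraction of heterozygous X calls are called female.
HET_RATIO_CUTOFF = 0.1


@dataclass
class SexCheck:
    sample_id: str
    stated_sex: str
    inferred_sex: str
    het_ratio: float

    @property
    def mismatch(self) -> bool:
        """True when the pedigree label disagrees with the genotypes."""
        return self.stated_sex != self.inferred_sex


def infer_sex(x_genotypes, cutoff=HET_RATIO_CUTOFF):
    """Infer sex from X-chromosome genotypes coded as 0, 1, 2 (None = missing)."""
    called = [g for g in x_genotypes if g in (0, 1, 2)]
    if not called:
        raise ValueError("no called genotypes on the X chromosome")
    het_ratio = sum(1 for g in called if g == 1) / len(called)
    inferred = "female" if het_ratio > cutoff else "male"
    return inferred, het_ratio


def check_samples(stated_sex, x_genotypes_by_sample):
    """Compare the stated sex of each sample with the sex inferred from its genotypes."""
    results = []
    for sample_id, stated in stated_sex.items():
        inferred, ratio = infer_sex(x_genotypes_by_sample[sample_id])
        results.append(SexCheck(sample_id, stated, inferred, ratio))
    return results


if __name__ == "__main__":
    stated = {"s1": "male", "s2": "male"}
    genotypes = {
        "s1": [0, 2, 0, 2, 0, 0, 2, 0, None, 0],  # no heterozygous calls
        "s2": [0, 1, 2, 1, 0, 1, 0, 2, 1, 0],     # many heterozygous calls
    }
    for check in check_samples(stated, genotypes):
        flag = "MISMATCH" if check.mismatch else "ok"
        print(check.sample_id, check.stated_sex, check.inferred_sex, flag)
```

In this toy input the second sample is labelled male but carries many heterozygous X calls, so it is flagged as a possible swap or labelling error.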

shinyheatmap: ultra fast low memory heatmap web interface for big data genomics

10.1101/076463 ◽

2016 ◽

Author(s):

Bohdan B. Khomtchouk ◽

James R. Hennessy ◽

Claes Wahlestedt

Keyword(s):

Big Data ◽

High Performance ◽

Numerical Data ◽

Large Datasets ◽

Barriers To Entry ◽

Web Interface ◽

Link Type ◽

Sequencing Studies ◽

Performance Benchmarks ◽

Computational Resources

Abstract Background: Transcriptomics, metabolomics, metagenomics, and various other next-generation sequencing (-omics) fields are known for their production of large datasets, especially across single-cell sequencing studies. Visualizing such big data has posed technical challenges in biology, both in terms of available computational resources as well as programming acumen. Since heatmaps are used to depict high-dimensional numerical data as a colored grid of cells, efficiency and speed have often proven to be critical considerations in the process of successfully converting data into graphics. For example, rendering interactive heatmaps from large input datasets (e.g., 100k+ rows) has been computationally infeasible on both desktop computers and web browsers. In addition to memory requirements, programming skills and knowledge have frequently been barriers-to-entry for creating highly customizable heatmaps. Results: We propose shinyheatmap: an advanced user-friendly heatmap software suite capable of efficiently creating highly customizable static and interactive biological heatmaps in a web browser. shinyheatmap is a low memory footprint program, making it particularly well-suited for the interactive visualization of extremely large datasets that cannot typically be computed in-memory due to size restrictions. Also, shinyheatmap features a built-in high performance web plug-in, fastheatmap, for rapidly plotting interactive heatmaps of datasets as large as 10^5 to 10^7 rows within seconds, effectively shattering previous performance benchmarks of heatmap rendering speed. Conclusions: shinyheatmap is hosted online as a freely available web server with an intuitive graphical user interface: http://shinyheatmap.com. The methods are implemented in R, and are available as part of the shinyheatmap project at: https://github.com/Bohdan-Khomtchouk/shinyheatmap. Users can access fastheatmap directly from within the shinyheatmap web interface, and all source code has been made publicly available on GitHub: https://github.com/Bohdan-Khomtchouk/fastheatmap.

PuBliCiTy: Python Bioimage Computing Toolkit

10.1101/2021.03.01.432926 ◽

2021 ◽

Author(s):

Marcelo Cicconet

Keyword(s):

Machine Learning ◽

Data Science ◽

Image Annotation ◽

Model Development ◽

Web Interface ◽

Biological Images ◽

Link Type ◽

Machine Learning Model ◽

Code Base ◽

Web App

Abstract The Python Bioimage Computing Toolkit (PuBliCiTy) is an evolving set of functions, scripts, and classes, written primarily in Python, to facilitate the analysis of biological images, of two or more dimensions, from electron or light microscopes. While the early development was guided by the goal of replacing an existing internal code-base with Python code, the effort later came to include novel tools, especially in the areas of machine learning infrastructure and model development. The toolkit is built on top of the so-called Python data science stack, which includes numpy, scipy, scikit-image, scikit-learn, and pandas. It also contains some deep learning models, written in TensorFlow and PyTorch, and a web-app for image annotation, which uses Flask as the web framework. The main features of the toolkit are: (1) simplifying the interface of some routinely used functions from underlying libraries; (2) providing helpful tools for the analysis of large images; (3) providing a web interface for image annotation, which can be used remotely and on tablets with pencils; (4) providing machine learning model implementations that are easy to read, train, and deploy – written in a way that minimizes complexity for users without a computer science or software development background. The source code is released under an MIT-like license at github.com/hms-idac/PuBliCiTy. Details, tutorials, and up-to-date documentation can be found at the project’s page as well.

Project page: github.com/hms-idac/PuBliCiTy

Who’s Who? Detecting and Resolving Sample Anomalies in Human DNA Sequencing Studies with Peddy

The American Journal of Human Genetics ◽

10.1016/j.ajhg.2017.01.017 ◽

2017 ◽

Vol 100 (3) ◽

pp. 406-413 ◽

Cited By ~ 47

Author(s):

Brent S. Pedersen ◽

Aaron R. Quinlan

Keyword(s):

DNA Sequencing ◽

Human DNA ◽

Sequencing Studies

DNA Sequencing Studies by Capillary Electrophoresis

Encyclopedia of Chromatography ◽

10.1201/noe0824727857-101 ◽

2005 ◽

pp. 478-485

Author(s):

Feng Xu ◽

Yoshinobu Baba

Keyword(s):

Capillary Electrophoresis ◽

DNA Sequencing ◽

Sequencing Studies

COVID-Align: Accurate online alignment of hCoV-19 genomes using a profile HMM

10.1101/2020.05.25.114884 ◽

2020 ◽

Cited By ~ 2

Author(s):

Frédéric Lemoine ◽

Luc Blassel ◽

Jakub Voznica ◽

Olivier Gascuel

Keyword(s):

Daily Basis ◽

Supplementary Information ◽

Summary Statistics ◽

Evolutionary Novelty ◽

Bioinformatics Analyses ◽

Link Type ◽

Sequencing Quality ◽

User Friendly ◽

Profile HMM ◽

New Mutations

Abstract Motivation: The first cases of the COVID-19 pandemic emerged in December 2019. Until the end of February 2020, the number of available genomes was below 1,000, and their multiple alignment was easily achieved using standard approaches. Subsequently, the availability of genomes has grown dramatically. Moreover, some genomes are of low quality with sequencing/assembly errors, making accurate re-alignment of all genomes nearly impossible on a daily basis. A more efficient, yet accurate approach was clearly required to pursue all subsequent bioinformatics analyses of this crucial data. Results: hCoV-19 genomes are highly conserved, with very few indels and no recombination. This makes the profile HMM approach particularly well suited to align new genomes, add them to an existing alignment and filter problematic ones. Using a core of ~2,500 high quality genomes, we estimated a profile using HMMER, and implemented this profile in COVID-Align, a user-friendly interface to be used online or as a standalone tool via Docker. The alignment of 1,000 genomes requires less than 20 min on our cluster. Moreover, COVID-Align provides summary statistics, which can be used to determine the sequencing quality and evolutionary novelty of input genomes (e.g. number of new mutations and indels). Availability: https://covalign.pasteur.cloud, hub.docker.com/r/evolbioinfo/[email protected], [email protected] Supplementary information: Supplementary information is available at Bioinformatics online.
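The kind of per-genome summary statistics mentioned in the abstract (numbers of mutations and indels) can be sketched as follows. This is an illustrative sketch only: it assumes the genome has already been aligned to a reference and that both are available as equal-length gapped strings, and it counts every difference from the reference rather than only new ones. The function name, the input format and the treatment of N as an ambiguous base are assumptions, not part of COVID-Align.

```python
def summarize_alignment(ref_aln: str, query_aln: str) -> dict:
    """Count substitutions, insertion events and deletion events.

    Both inputs are rows of the same alignment, with '-' marking gaps.
    A run of consecutive gap columns is counted as a single indel event.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have the same length")

    substitutions = insertions = deletions = 0
    in_insertion = in_deletion = False

    for r, q in zip(ref_aln.upper(), query_aln.upper()):
        if r == "-" and q == "-":
            continue  # column is a gap in both rows: carries no information
        if r == "-":  # base present only in the query: insertion
            if not in_insertion:
                insertions += 1
            in_insertion, in_deletion = True, False
        elif q == "-":  # base present only in the reference: deletion
            if not in_deletion:
                deletions += 1
            in_deletion, in_insertion = True, False
        else:
            in_insertion = in_deletion = False
            # Ambiguous bases (N) are skipped rather than counted as mutations.
            if r != q and "N" not in (r, q):
                substitutions += 1

    return {
        "substitutions": substitutions,
        "insertions": insertions,
        "deletions": deletions,
    }


if __name__ == "__main__":
    print(summarize_alignment("ACGT-ACGTACGT", "ACCTTAC--ACNT"))
```

Genomes with unusually high counts could then be set aside for inspection, which is one way such statistics might help separate low-quality sequences from genuinely novel ones.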

Genetic Biomonitoring and Biodiversity Assessment Using Portable Sequencing Technologies: Current Uses and Future Directions

Genes ◽

10.3390/genes10110858 ◽

2019 ◽

Vol 10 (11) ◽

pp. 858 ◽

Cited By ~ 18

Author(s):

Krehenwinkel ◽

Pomerantz ◽

Prost

Keyword(s):

DNA Sequencing ◽

Biodiversity Loss ◽

Taxonomic Composition ◽

Great Promise ◽

Sequencing Platform ◽

Biological Communities ◽

Sequencing Technologies ◽

Oxford Nanopore ◽

Sequencing Studies ◽

High Throughput DNA Sequencing

We live in an era of unprecedented biodiversity loss, affecting the taxonomic composition of ecosystems worldwide. The immense task of quantifying human imprints on global ecosystems has been greatly simplified by developments in high-throughput DNA sequencing technology (HTS). Approaches like DNA metabarcoding enable the study of biological communities in unparalleled detail. However, current protocols for HTS-based biodiversity exploration have several drawbacks. They are usually based on short sequences, with limited taxonomic and phylogenetic information content. Access to expensive HTS technology is often restricted in developing countries. Ecosystems of particular conservation priority are often remote and hard to access, requiring extensive time from field collection to laboratory processing of specimens. The advent of inexpensive mobile laboratory and DNA sequencing technologies shows great promise to facilitate monitoring projects in biodiversity hot-spots around the world. Recent attention has been given to portable DNA sequencing studies related to infectious organisms, such as bacteria and viruses, yet relatively few studies have focused on applying these tools to Eukaryotes, such as plants and animals. Here, we outline the current state of genetic biodiversity monitoring of higher Eukaryotes using Oxford Nanopore Technology’s MinION portable sequencing platform, as well as summarize areas of recent development.

Better understanding and prediction of antiviral peptides through primary and secondary structure feature importance

Scientific Reports ◽

10.1038/s41598-020-76161-8 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Abu Sayed Chowdhury ◽

Sarah M. Reehl ◽

Kylene Kehn-Hall ◽

Barney Bishop ◽

Bobbie-Jo M. Webb-Robertson

Keyword(s):

Machine Learning ◽

Secondary Structure ◽

Web Application ◽

Amino Acid Sequences ◽

Peptide Sequence ◽

Antiviral Therapies ◽

Link Type ◽

Machine Learning Model ◽

Antiviral Peptides ◽

Peptide Prediction

Abstract The emergence of viral epidemics throughout the world is of concern due to the scarcity of available effective antiviral therapeutics. The discovery of new antiviral therapies is imperative to address this challenge, and antiviral peptides (AVPs) represent a valuable resource for the development of novel therapies to combat viral infection. We present a new machine learning model to distinguish AVPs from non-AVPs using the most informative features derived from the physicochemical and structural properties of their amino acid sequences. To focus on those features that are most likely to contribute to antiviral performance, we filter potential features based on their importance for classification. These feature selection analyses suggest that secondary structure is the most important peptide sequence feature for predicting AVPs. Our Feature-Informed Reduced Machine Learning for Antiviral Peptide Prediction (FIRM-AVP) approach achieves a higher accuracy than either the model with all features or current state-of-the-art single classifiers. Understanding the features that are associated with AVP activity is a core need to identify and design new AVPs in novel systems. The FIRM-AVP code and standalone software package are available at https://github.com/pmartR/FIRM-AVP with an accompanying web application at https://msc-viz.emsl.pnnl.gov/AVPR.

Streptococcus orisasini sp. nov. and Streptococcus dentasini sp. nov., isolated from the oral cavity of donkeys

International Journal of Systematic and Evolutionary Microbiology ◽

10.1099/ijs.0.047142-0 ◽

2013 ◽

Vol 63 (Pt_8) ◽

pp. 2782-2786 ◽

Cited By ~ 20

Author(s):

Kazuko Takada ◽

Masanori Saito ◽

Osamu Tsudukibashi ◽

Takachika Hiroi ◽

Masatomo Hirasawa

Keyword(s):

16S rRNA ◽

16S rRNA Gene ◽

Type Strain ◽

Type Species ◽

16S rRNA Gene Sequencing ◽

rRNA Gene ◽

Biochemical Tests ◽

Content Type ◽

Link Type ◽

Sequencing Studies

Four Gram-positive, catalase-negative, coccoid isolates that were obtained from donkey oral cavities formed two distinct clonal groups when characterized by phenotypic and phylogenetic studies. From the results of biochemical tests, the organisms were tentatively identified as a streptococcal species. Comparative 16S rRNA gene sequencing studies confirmed the organisms to be members of the genus Streptococcus. Two of the isolates were related most closely to Streptococcus ursoris with 95.6 % similarity based on the 16S rRNA gene and to Streptococcus ratti with 92.0 % similarity based on the 60 kDa heat-shock protein gene (groEL). The other two isolates, however, were related to Streptococcus criceti with 95.0 and 89.0 % similarities based on the 16S rRNA and groEL genes, respectively. From both phylogenetic and phenotypic evidence, the four isolates formed two distinct clonal groups and are suggested to represent novel species of the genus Streptococcus. The names proposed for these organisms are Streptococcus orisasini sp. nov. (type strain NUM 1801T = JCM 17942T = DSM 25193T) and Streptococcus dentasini sp. nov. (type strain NUM 1808T = JCM 17943T = DSM 25137T).

Butyrylcholinesterase (BCHE) Genotyping for Post-Succinylcholine Apnea in an Australian Population

Clinical Chemistry ◽

10.1373/49.8.1297 ◽

2003 ◽

Vol 49 (8) ◽

pp. 1297-1308 ◽

Cited By ~ 53

Author(s):

Tina Yen ◽

Brian N Nightingale ◽

Jennifer C Burns ◽

David R Sullivan ◽

Peter M Stewart

Keyword(s):

DNA Sequencing ◽

Gene Mutations ◽

University Teaching ◽

Sequencing Analysis ◽

Direct DNA Sequencing ◽

BChE Activity ◽

Mutation Screen ◽

Sequencing Studies ◽

DNA Sequencing Analysis ◽

K Variant

Abstract Background: Measurement of plasma butyrylcholinesterase (BChE) activity and inhibitor-based phenotyping are standard methods for identifying patients who experience post-succinylcholine (SC) apnea attributable to inherited variants of the BChE enzyme. Our aim was to develop PCR-based assays for BCHE mutation detection and implement them for routine diagnostic use at a university teaching hospital. Methods: Between 1999 and 2002, we genotyped 65 patients referred after prolonged post-SC apnea. Five BCHE gene mutations were analyzed. Competitive oligo-priming (COP)-PCR was used for flu-1, flu-2, and K-variant and direct DNA sequencing analysis for dibucaine and sil-1 mutations. Additional DNA sequencing of BCHE coding regions was provided when the five-mutation screen was negative or mutation findings were inconsistent with enzyme activity. Results: Genotyping identified 52 patients with primary hypocholinesterasemia attributable to BCHE mutations, and in 44 individuals the abnormalities were detected by the five-mutation screen (detection rate, 85%). Additional sequencing studies revealed mutations in eight other patients, including five with novel mutations. The most common genotype abnormality was compound homozygous dibucaine and homozygous K-variant mutations. No simple homozygotes were found. Of the remaining 13 patients, 3 had normal BChE activity and gene, and 10 were diagnosed with hypocholinesterasemia unrelated to BCHE gene abnormalities. Conclusion: A five-mutation screen for investigation of post-SC apnea identified BCHE gene abnormalities for 80% of a referral population. Six new BCHE mutations were identified by sequencing studies of 16 additional patients.
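The percentages quoted in this abstract can be checked against its patient counts with a short worked calculation. The counts below are copied from the abstract; the variable and function names are illustrative. Note that the 85% detection rate corresponds to 44 of the 52 patients with BCHE mutations, whereas the 80% figure in the conclusion matches 52 of all 65 referred patients.

```python
def percent(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Patient counts as reported in the abstract.
referred = 65                 # patients genotyped after prolonged post-SC apnea
with_bche_mutations = 52      # primary hypocholinesterasemia due to BCHE mutations
found_by_screen = 44          # detected by the five-mutation screen
found_by_sequencing = 8       # detected only by additional sequencing
normal_activity_and_gene = 3
unrelated_to_bche = 10

# The reported subgroups add up to the reported totals.
assert found_by_screen + found_by_sequencing == with_bche_mutations
assert with_bche_mutations + normal_activity_and_gene + unrelated_to_bche == referred

if __name__ == "__main__":
    print("screen detection rate among mutation carriers:",
          percent(found_by_screen, with_bche_mutations), "%")
    print("referred patients with BCHE abnormalities:",
          percent(with_bche_mutations, referred), "%")
    print("referred patients detected by the screen alone:",
          percent(found_by_screen, referred), "%")
```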

Streptomyces panaciradicis sp. nov., a β-glucosidase-producing bacterium isolated from ginseng rhizoplane

International Journal of Systematic and Evolutionary Microbiology ◽

10.1099/ijs.0.061705-0 ◽

2014 ◽

Vol 64 (Pt_11) ◽

pp. 3816-3820 ◽

Cited By ~ 8

Author(s):

Hyo-Jin Lee ◽

Geon-Yeong Cho ◽

Sang-Ho Chung ◽

Kyung-Sook Whang

Keyword(s):

Type Species ◽

Taxonomic Status ◽

16S rRNA Gene Sequencing ◽

Diaminopimelic Acid ◽

rRNA Gene ◽

Content Type ◽

Link Type ◽

Gram Staining ◽

Sequencing Studies ◽

Biochemical Analyses

A Gram-staining-positive actinobacterium, designated strain 1MR-8T, was isolated from the rhizoplane of ginseng and its taxonomic status was determined using a polyphasic approach. The isolate formed long chains of spores that were straight, cylindrical and smooth-surfaced. Strain 1MR-8T grew at 10–37 °C (optimum 28 °C), whilst no growth was observed at 45 °C. The pH range for growth was 4.0–11.0 (optimum pH 6.0–8.0) and the NaCl range for growth was 0–7 % (w/v) with optimum growth at 1 % (w/v). Strain 1MR-8T had cell-wall peptidoglycans based on LL-diaminopimelic acid. Glucose, mannose and ribose were the whole-cell sugars. The predominant isoprenoid quinones were MK-9 (H4), MK-9 (H6) and MK-9 (H8) and the major fatty acids were anteiso-C15 : 0, iso-C15 : 0, anteiso-C17 : 0 and iso-C16 : 0. 16S rRNA gene sequencing studies showed that the novel strain was closely related to the type strains of Streptomyces caeruleatus GIMN4T, Streptomyces curacoi NRRL B-2901T, Streptomyces capoamus JCM 4734T and Streptomyces coeruleorubidus NBRC 12761T with similarities of 98.8 %. However, DNA–DNA relatedness, as well as physiological and biochemical analyses, showed that strain 1MR-8T could be differentiated from its closest phylogenetic relatives. It is proposed that this strain should be classified as a representative of a novel species of the genus Streptomyces, with the suggested name Streptomyces panaciradicis sp. nov. The type strain is 1MR-8T (= KACC 17632T = NBRC 109811T).