Who’s who? Detecting and resolving sample anomalies in human DNA sequencing studies with peddy

2016 ◽  
Author(s):  
Brent S. Pedersen ◽  
Aaron R. Quinlan

Abstract The potential for genetic discovery in human DNA sequencing studies is greatly diminished if DNA samples from the cohort are mislabelled, swapped, contaminated, or include unintended individuals. Unfortunately, the potential for such errors is significant since DNA samples are often manipulated by several protocols, labs or scientists in the process of sequencing. We have developed peddy to identify and facilitate the remediation of such errors via interactive visualizations and reports comparing the stated sex, relatedness, and ancestry to what is inferred from each individual’s genotypes. Peddy predicts a sample’s ancestry using a machine learning model trained on individuals of diverse ancestries from the 1000 Genomes Project reference panel. Peddy’s speed, text reports and web interface facilitate both automated and visual detection of sample swaps, poor sequencing quality and other indicators of sample problems that, were they left undetected, would inhibit discovery. Software Availability: https://github.com/brentp/peddy Demonstration (Chrome suggested): http://home.chpc.utah.edu/~u6000771//plots/ceph1463.html
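The comparison of stated and inferred sex described in the abstract can be illustrated with a minimal sketch. This is not peddy's code or API: the `Sample` class, the function names, the 0.1 heterozygosity threshold and the toy genotypes are all hypothetical, chosen only to show the idea that samples with two X chromosomes tend to carry heterozygous calls at X-linked sites, while samples with one X chromosome should carry almost none.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    sample_id: str
    stated_sex: str   # "male" or "female", as recorded in the study metadata
    x_genotypes: list  # 0 = hom-ref, 1 = het, 2 = hom-alt at X-linked sites


def het_ratio(genotypes):
    """Fraction of called genotypes that are heterozygous."""
    if not genotypes:
        return 0.0
    return sum(1 for g in genotypes if g == 1) / len(genotypes)


def infer_sex(sample, het_threshold=0.1):
    """Infer sex from X-chromosome heterozygosity (threshold is an assumption)."""
    return "female" if het_ratio(sample.x_genotypes) > het_threshold else "male"


def flag_sex_mismatches(samples, het_threshold=0.1):
    """Return IDs of samples whose stated sex disagrees with the inferred sex."""
    return [
        s.sample_id
        for s in samples
        if infer_sex(s, het_threshold) != s.stated_sex
    ]


# Toy data, invented for illustration only.
samples = [
    Sample("S1", "male", [0, 2, 0, 2, 0, 0, 2, 0]),    # no het calls
    Sample("S2", "female", [0, 1, 1, 2, 1, 0, 1, 0]),  # half the calls are het
    Sample("S3", "female", [0, 2, 0, 0, 2, 2, 0, 0]),  # no het calls: suspicious
]

mismatches = flag_sex_mismatches(samples)
```

In this toy cohort only `S3` is flagged, since it is recorded as female but has no heterozygous X-linked genotypes. A real check would also need to handle missing calls, pseudoautosomal regions and sequencing depth.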

2016 ◽  
Author(s):  
Bohdan B. Khomtchouk ◽  
James R. Hennessy ◽  
Claes Wahlestedt

Abstract Background: Transcriptomics, metabolomics, metagenomics, and various other next-generation sequencing (-omics) fields are known for their production of large datasets, especially across single-cell sequencing studies. Visualizing such big data has posed technical challenges in biology, both in terms of available computational resources as well as programming acumen. Since heatmaps are used to depict high-dimensional numerical data as a colored grid of cells, efficiency and speed have often proven to be critical considerations in the process of successfully converting data into graphics. For example, rendering interactive heatmaps from large input datasets (e.g., 100k+ rows) has been computationally infeasible on both desktop computers and web browsers. In addition to memory requirements, programming skills and knowledge have frequently been barriers-to-entry for creating highly customizable heatmaps. Results: We propose shinyheatmap: an advanced user-friendly heatmap software suite capable of efficiently creating highly customizable static and interactive biological heatmaps in a web browser. shinyheatmap is a low memory footprint program, making it particularly well-suited for the interactive visualization of extremely large datasets that cannot typically be computed in-memory due to size restrictions. Also, shinyheatmap features a built-in high performance web plug-in, fastheatmap, for rapidly plotting interactive heatmaps of datasets as large as 10^5 to 10^7 rows within seconds, effectively shattering previous performance benchmarks of heatmap rendering speed. Conclusions: shinyheatmap is hosted online as a freely available web server with an intuitive graphical user interface: http://shinyheatmap.com. The methods are implemented in R, and are available as part of the shinyheatmap project at: https://github.com/Bohdan-Khomtchouk/shinyheatmap. Users can access fastheatmap directly from within the shinyheatmap web interface, and all source code has been made publicly available on GitHub: https://github.com/Bohdan-Khomtchouk/fastheatmap.


2021 ◽  
Author(s):  
Marcelo Cicconet

Abstract The Python Bioimage Computing Toolkit (PuBliCiTy) is an evolving set of functions, scripts, and classes, written primarily in Python, to facilitate the analysis of biological images, of two or more dimensions, from electron or light microscopes. While the early development was guided by the goal of replacing an existing internal code-base with Python code, the effort later came to include novel tools, especially in the areas of machine learning infrastructure and model development. The toolkit is built on top of the so-called Python data science stack, which includes numpy, scipy, scikit-image, scikit-learn, and pandas. It also contains some deep learning models, written in TensorFlow and PyTorch, and a web-app for image annotation, which uses Flask as the web framework. The main features of the toolkit are: (1) simplifying the interface of some routinely used functions from underlying libraries; (2) providing helpful tools for the analysis of large images; (3) providing a web interface for image annotation, which can be used remotely and on tablets with pencils; (4) providing machine learning model implementations that are easy to read, train, and deploy, written in a way that minimizes complexity for users without a computer science or software development background. The source code is released under an MIT-like license at github.com/hms-idac/PuBliCiTy. Details, tutorials, and up-to-date documentation can be found at the project’s page as well. Project page: github.com/hms-idac/PuBliCiTy


Author(s):  
Frédéric Lemoine ◽  
Luc Blassel ◽  
Jakub Voznica ◽  
Olivier Gascuel

Abstract Motivation: The first cases of the COVID-19 pandemic emerged in December 2019. Until the end of February 2020, the number of available genomes was below 1,000, and their multiple alignment was easily achieved using standard approaches. Subsequently, the availability of genomes has grown dramatically. Moreover, some genomes are of low quality with sequencing/assembly errors, making accurate re-alignment of all genomes nearly impossible on a daily basis. A more efficient, yet accurate approach was clearly required to pursue all subsequent bioinformatics analyses of this crucial data. Results: hCoV-19 genomes are highly conserved, with very few indels and no recombination. This makes the profile HMM approach particularly well suited to align new genomes, add them to an existing alignment and filter problematic ones. Using a core of ∼2,500 high quality genomes, we estimated a profile using HMMER, and implemented this profile in COVID-Align, a user-friendly interface to be used online or as standalone via Docker. The alignment of 1,000 genomes requires less than 20 min on our cluster. Moreover, COVID-Align provides summary statistics, which can be used to determine the sequencing quality and evolutionary novelty of input genomes (e.g. number of new mutations and indels). Availability: https://covalign.pasteur.cloud, hub.docker.com/r/evolbioinfo/[email protected], [email protected] Supplementary information: Supplementary information is available at Bioinformatics online.


Genes ◽  
2019 ◽  
Vol 10 (11) ◽  
pp. 858 ◽  
Author(s):  
Krehenwinkel ◽  
Pomerantz ◽  
Prost

We live in an era of unprecedented biodiversity loss, affecting the taxonomic composition of ecosystems worldwide. The immense task of quantifying human imprints on global ecosystems has been greatly simplified by developments in high-throughput DNA sequencing technology (HTS). Approaches like DNA metabarcoding enable the study of biological communities at unparalleled detail. However, current protocols for HTS-based biodiversity exploration have several drawbacks. They are usually based on short sequences, with limited taxonomic and phylogenetic information content. Access to expensive HTS technology is often restricted in developing countries. Ecosystems of particular conservation priority are often remote and hard to access, requiring extensive time from field collection to laboratory processing of specimens. The advent of inexpensive mobile laboratory and DNA sequencing technologies shows great promise to facilitate monitoring projects in biodiversity hot-spots around the world. Recent attention has been given to portable DNA sequencing studies related to infectious organisms, such as bacteria and viruses, yet relatively few studies have focused on applying these tools to Eukaryotes, such as plants and animals. Here, we outline the current state of genetic biodiversity monitoring of higher Eukaryotes using Oxford Nanopore Technology’s MinION portable sequencing platform, as well as summarize areas of recent development.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Abu Sayed Chowdhury ◽  
Sarah M. Reehl ◽  
Kylene Kehn-Hall ◽  
Barney Bishop ◽  
Bobbie-Jo M. Webb-Robertson

Abstract The emergence of viral epidemics throughout the world is of concern due to the scarcity of available effective antiviral therapeutics. The discovery of new antiviral therapies is imperative to address this challenge, and antiviral peptides (AVPs) represent a valuable resource for the development of novel therapies to combat viral infection. We present a new machine learning model to distinguish AVPs from non-AVPs using the most informative features derived from the physicochemical and structural properties of their amino acid sequences. To focus on those features that are most likely to contribute to antiviral performance, we filter potential features based on their importance for classification. These feature selection analyses suggest that secondary structure is the most important peptide sequence feature for predicting AVPs. Our Feature-Informed Reduced Machine Learning for Antiviral Peptide Prediction (FIRM-AVP) approach achieves a higher accuracy than either the model with all features or current state-of-the-art single classifiers. Understanding the features that are associated with AVP activity is a core need to identify and design new AVPs in novel systems. The FIRM-AVP code and standalone software package are available at https://github.com/pmartR/FIRM-AVP with an accompanying web application at https://msc-viz.emsl.pnnl.gov/AVPR.


2013 ◽  
Vol 63 (Pt_8) ◽  
pp. 2782-2786 ◽  
Author(s):  
Kazuko Takada ◽  
Masanori Saito ◽  
Osamu Tsudukibashi ◽  
Takachika Hiroi ◽  
Masatomo Hirasawa

Four Gram-positive, catalase-negative, coccoid isolates that were obtained from donkey oral cavities formed two distinct clonal groups when characterized by phenotypic and phylogenetic studies. From the results of biochemical tests, the organisms were tentatively identified as a streptococcal species. Comparative 16S rRNA gene sequencing studies confirmed the organisms to be members of the genus Streptococcus. Two of the isolates were related most closely to Streptococcus ursoris with 95.6 % similarity based on the 16S rRNA gene and to Streptococcus ratti with 92.0 % similarity based on the 60 kDa heat-shock protein gene (groEL). The other two isolates, however, were related to Streptococcus criceti with 95.0 and 89.0 % similarities based on the 16S rRNA and groEL genes, respectively. From both phylogenetic and phenotypic evidence, the four isolates formed two distinct clonal groups and are suggested to represent novel species of the genus Streptococcus. The names proposed for these organisms are Streptococcus orisasini sp. nov. (type strain NUM 1801T = JCM 17942T = DSM 25193T) and Streptococcus dentasini sp. nov. (type strain NUM 1808T = JCM 17943T = DSM 25137T).


2003 ◽  
Vol 49 (8) ◽  
pp. 1297-1308 ◽  
Author(s):  
Tina Yen ◽  
Brian N Nightingale ◽  
Jennifer C Burns ◽  
David R Sullivan ◽  
Peter M Stewart

Abstract Background: Measurement of plasma butyrylcholinesterase (BChE) activity and inhibitor-based phenotyping are standard methods for identifying patients who experience post-succinylcholine (SC) apnea attributable to inherited variants of the BChE enzyme. Our aim was to develop PCR-based assays for BCHE mutation detection and implement them for routine diagnostic use at a university teaching hospital. Methods: Between 1999 and 2002, we genotyped 65 patients referred after prolonged post-SC apnea. Five BCHE gene mutations were analyzed. Competitive oligo-priming (COP)-PCR was used for flu-1, flu-2, and K-variant and direct DNA sequencing analysis for dibucaine and sil-1 mutations. Additional DNA sequencing of BCHE coding regions was provided when the five-mutation screen was negative or mutation findings were inconsistent with enzyme activity. Results: Genotyping identified 52 patients with primary hypocholinesterasemia attributable to BCHE mutations, and in 44 individuals the abnormalities were detected by the five-mutation screen (detection rate, 85%). Additional sequencing studies revealed mutations in eight other patients, including five with novel mutations. The most common genotype abnormality was compound homozygous dibucaine and homozygous K-variant mutations. No simple homozygotes were found. Of the remaining 13 patients, 3 had normal BChE activity and gene, and 10 were diagnosed with hypocholinesterasemia unrelated to BCHE gene abnormalities. Conclusion: A five-mutation screen for investigation of post-SC apnea identified BCHE gene abnormalities for 80% of a referral population. Six new BCHE mutations were identified by sequencing studies of 16 additional patients.
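The percentages quoted in this abstract can be re-derived from its own patient counts. The short check below is only an arithmetic sketch: the variable names and the `pct` helper are introduced here for illustration, and all counts are taken from the abstract text.

```python
# Patient counts as stated in the abstract.
referred = 65              # patients genotyped after prolonged post-SC apnea
with_bche_mutation = 52    # primary hypocholinesterasemia due to BCHE mutations
found_by_screen = 44       # detected by the five-mutation screen
normal_activity = 3        # normal BChE activity and gene
unrelated_to_bche = 10     # hypocholinesterasemia unrelated to BCHE


def pct(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


# Patients whose mutations were found only by additional sequencing.
found_by_extra_sequencing = with_bche_mutation - found_by_screen

# Patients without a BCHE gene abnormality.
remaining = referred - with_bche_mutation

# Share of mutation carriers picked up by the five-mutation screen.
screen_detection_rate = pct(found_by_screen, with_bche_mutation)

# Share of the whole referral population with a BCHE abnormality.
overall_abnormality_rate = pct(with_bche_mutation, referred)
```

With these counts, 44 of 52 rounds to the stated 85% detection rate, 52 of 65 gives 80%, which appears to correspond to the figure in the conclusion, and the 3 + 10 patients without BCHE abnormalities account for the remaining 13.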


2014 ◽  
Vol 64 (Pt_11) ◽  
pp. 3816-3820 ◽  
Author(s):  
Hyo-Jin Lee ◽  
Geon-Yeong Cho ◽  
Sang-Ho Chung ◽  
Kyung-Sook Whang

A Gram-staining-positive actinobacterium, designated strain 1MR-8T, was isolated from the rhizoplane of ginseng and its taxonomic status was determined using a polyphasic approach. The isolate formed long chains of spores that were straight, cylindrical and smooth-surfaced. Strain 1MR-8T grew at 10–37 °C (optimum 28 °C), whilst no growth was observed at 45 °C. The pH range for growth was 4.0–11.0 (optimum pH 6.0–8.0) and the NaCl range for growth was 0–7 % (w/v) with optimum growth at 1 % (w/v). Strain 1MR-8T had cell-wall peptidoglycans based on LL-diaminopimelic acid. Glucose, mannose and ribose were the whole-cell sugars. The predominant isoprenoid quinones were MK-9 (H4), MK-9 (H6) and MK-9 (H8) and the major fatty acids were anteiso-C15 : 0, iso-C15 : 0, anteiso-C17 : 0 and iso-C16 : 0. 16S rRNA gene sequencing studies showed that the novel strain was closely related to the type strains of Streptomyces caeruleatus GIMN4T, Streptomyces curacoi NRRL B-2901T, Streptomyces capoamus JCM 4734T and Streptomyces coeruleorubidus NBRC 12761T with similarities of 98.8 %. However, DNA–DNA relatedness, as well as physiological and biochemical analyses, showed that strain 1MR-8T could be differentiated from its closest phylogenetic relatives. It is proposed that this strain should be classified as a representative of a novel species of the genus Streptomyces, with the suggested name Streptomyces panaciradicis sp. nov. The type strain is 1MR-8T (= KACC 17632T = NBRC 109811T).

