OligoPVP: Phenotype-driven analysis of individual genomic information to prioritize oligogenic disease variants

Mapping Intimacies ◽

10.1101/311654 ◽

2018 ◽

Author(s):

Imane Boudellioua ◽

Maxat Kulmanov ◽

Paul N Schofield ◽

Georgios V Gkoutos ◽

Robert Hoehndorf

Keyword(s):

Detection Methods ◽

Whole Genome ◽

Variant Prioritization ◽

Genome Sequences ◽

Mendelian Disorders ◽

Mendelian Diseases ◽

Whole Exome ◽

Whole Genomes ◽

Improved Performance ◽

Selection Of

ABSTRACT Purpose: An increasing number of Mendelian disorders have been identified for which two or more variants in one or more genes are required to cause the disease, or significantly modify its severity or phenotype. It is difficult to discover such interactions using existing approaches. The purpose of our work is to develop and evaluate a system that can identify combinations of variants underlying oligogenic diseases in individual whole exome or whole genome sequences. Methods: Information that links patient phenotypes to databases of gene–phenotype associations observed in clinical research can provide useful information and improve variant prioritization for Mendelian diseases. Additionally, background knowledge about interactions between genes can be utilized to guide and restrict the selection of candidate disease modules. Results: We developed OligoPVP, an algorithm that can be used to identify variants in oligogenic diseases and their interactions, using whole exome or whole genome sequences together with patient phenotypes as input. We demonstrate that OligoPVP has significantly improved performance when compared to state-of-the-art pathogenicity detection methods. Conclusions: Our results show that OligoPVP can efficiently detect oligogenic interactions using a phenotype-driven approach and identify etiologically important variants in whole genomes.
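The idea of using gene interactions to restrict candidate variant combinations can be illustrated with a minimal sketch. This is not the OligoPVP algorithm itself: the function name `rank_variant_pairs`, the input layout, and the choice to combine two variant scores by simple addition are all assumptions made for illustration, and the scores and gene names are hypothetical.

```python
from itertools import combinations


def rank_variant_pairs(variant_scores, variant_gene, interactions):
    """Rank pairs of variants whose genes are known to interact.

    variant_scores: dict mapping variant id -> prioritization score
    variant_gene:   dict mapping variant id -> gene symbol
    interactions:   set of frozensets, each holding two interacting genes
    The combined score is the sum of both scores (an assumption).
    """
    pairs = []
    for a, b in combinations(sorted(variant_scores), 2):
        genes = frozenset((variant_gene[a], variant_gene[b]))
        # Keep only pairs in two different genes that are linked
        # in the background interaction knowledge.
        if len(genes) == 2 and genes in interactions:
            pairs.append(((a, b), variant_scores[a] + variant_scores[b]))
    return sorted(pairs, key=lambda p: p[1], reverse=True)


# Hypothetical input: three scored variants in three genes.
scores = {"v1": 0.9, "v2": 0.8, "v3": 0.7}
genes = {"v1": "G1", "v2": "G2", "v3": "G3"}
known = {frozenset(("G1", "G3")), frozenset(("G2", "G3"))}

ranked = rank_variant_pairs(scores, genes, known)
ranked_pairs = [pair for pair, _ in ranked]
print(ranked_pairs)
```

In this toy input the pair (v1, v2) has the highest individual scores but is dropped because G1 and G2 are not linked, which is the restricting effect the abstract describes.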

VPMBench: a test bench for variant prioritization methods

BMC Bioinformatics ◽

10.1186/s12859-021-04458-0 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Andreas Ruscheinski ◽

Anna Lena Reimler ◽

Roland Ewald ◽

Adelinde M. Uhrmacher

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Test Bench ◽

Clinical Diagnostics ◽

Tool Support ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Variant Prioritization ◽

Whole Exome

Abstract Background Clinical diagnostics of whole-exome and whole-genome sequencing data requires geneticists to consider thousands of genetic variants for each patient. Various variant prioritization methods have been developed over the last years to aid clinicians in identifying variants that are likely disease-causing. Each time a new method is developed, its effectiveness must be evaluated and compared to other approaches based on the most recently available evaluation data. Doing so in an unbiased, systematic, and replicable manner requires significant effort. Results The open-source test bench “VPMBench” automates the evaluation of variant prioritization methods. VPMBench introduces a standardized interface for prioritization methods and provides a plugin system that makes it easy to evaluate new methods. It supports different input data formats and custom output data preparation. VPMBench exploits declaratively specified information about the methods, e.g., the variants supported by the methods. Plugins may also be provided in a technology-agnostic manner via containerization. Conclusions VPMBench significantly simplifies the evaluation of both custom and published variant prioritization methods. As we expect variant prioritization methods to become ever more critical with the advent of whole-genome sequencing in clinical diagnostics, such tool support is crucial to facilitate methodological research.

The Ever-Expanding Pseudomonas Genus: Description of 43 New Species and Partition of the Pseudomonas Putida Group

10.20944/preprints202107.0335.v1 ◽

2021 ◽

Author(s):

Léa Girard ◽

Cédric Lood ◽

Monica Höfte ◽

Peter Vandamme ◽

Hassan Rokni-Zadeh ◽

...

Keyword(s):

Genetic Diversity ◽

Whole Genome ◽

Genome Sequences ◽

Taxonomic Assignment ◽

Metabolic Potential ◽

Rpod Gene ◽

Integrative Studies ◽

Pseudomonas Species ◽

Type Strains ◽

Selection Of

The genus Pseudomonas hosts an extensive genetic diversity and is one of the largest genera among Gram-negative bacteria. Type strains of Pseudomonas are well-known to represent only a small fraction of this diversity and the number of available Pseudomonas genome sequences is increasing rapidly. Consequently, new Pseudomonas species are regularly reported and the number of species within the genus is in constant evolution. In this study, whole genome sequencing enabled us to define 43 new Pseudomonas species and to provide an update of the Pseudomonas evolutionary and taxonomic relationships. Phylogenies based on the rpoD gene and whole genome sequences, including 316 and 313 type strains of Pseudomonas, respectively, revealed sixteen groups of Pseudomonas and justified the partitioning of the P. putida group into fifteen subgroups. Pairwise average nucleotide identities were calculated between type strains and a selection of 60 genomes of non-type strains of Pseudomonas. Forty-one strains were incorrectly assigned at the species level and among those, 19 strains were shown to represent an additional 13 new Pseudomonas species that remain to be formally classified. This work pinpoints the importance of correct taxonomic assignment and phylogenetic classification in order to perform integrative studies linking genetic diversity, lifestyle and metabolic potential of Pseudomonas spp.

Complete genome sequence and analysis of nine Egyptian females with clinical information from different geographic regions in Egypt

10.1101/2020.03.10.985317 ◽

2020 ◽

Author(s):

Mahmoud ElHefnawi ◽

Elsayed Hegazy ◽

Asmaa ElFiky ◽

Yeonsu Jeon ◽

Sungwon Jeon ◽

...

Keyword(s):

Middle Eastern ◽

Clinical Information ◽

Human Migration ◽

Progressive External Ophthalmoplegia ◽

Whole Genome ◽

Mtdna Mutation ◽

Genome Sequences ◽

Whole Genomes ◽

Genetic And Environmental Factors ◽

Genomic Resource

Abstract Egyptians are at a crossroads between Africa and Eurasia, providing useful genomic resources for analyzing both genetic and environmental factors for future personalized medicine. Two personal Egyptian whole genomes have been published previously, and here nine female whole genome sequences with clinical information have been added to expand the genomic resource of Egyptian personal genomes. Here we report the analysis of whole genomes of nine Egyptian females from different regions using Illumina short-read sequencers. At 30x sequencing coverage, we identified 12 obesity-associated SNPs that were shared by most of the subjects, concordant with their clinical diagnosis. We also found that the mtDNA mutation A4282G, which is associated with chronic progressive external ophthalmoplegia (CPEO), is common to all the samples. Haplogroup and Admixture analyses revealed that most Egyptian samples are close to other North Mediterranean, Middle Eastern, and European populations, possibly reflecting the into-Africa influx of human migration. In conclusion, we present whole-genome sequences of nine Egyptian females with personal clinical information that cover the diverse regions of Egypt. Although limited in sample size, the whole-genome data provide possible geno-phenotype candidate markers that are relevant to the region’s diseases.

Comparison of aggregation methods for multiphenotype exomic variant prioritization

10.1101/064899 ◽

2016 ◽

Author(s):

Alejandro Sifrim ◽

Dusan Popovic ◽

Joris R. Vermeesch ◽

Jan Aerts ◽

Bart De Moor ◽

...

Keyword(s):

Protein Function ◽

Parametric Modeling ◽

Variant Prioritization ◽

Aggregation Method ◽

Mendelian Disorders ◽

Aggregation Methods ◽

Potential Impact ◽

Available Information ◽

Population Databases ◽

Selection Of

Abstract The identification of disease-causing genes in Mendelian disorders has been facilitated by the detection of rare disease-causing variation through exome sequencing experiments. These studies rely on population databases to filter a majority of the putatively neutral variation in the genome and additional filtering steps using either cohorts of diseased individuals or familial information to narrow down the list of candidate variants. Recently, new computational methods have been proposed to prioritize variants by scoring them not only based on their potential impact on protein function but also on their relevance given the available information on the disease under study. Usually these diseases comprise several phenotypic presentations, which are separately prioritized and then aggregated into a global score. In this study we compare several simple (e.g. maximum and mean score) and more complex aggregation methods (e.g. order statistics, parametric modeling) in order to obtain the best possible prioritization performance. We show that all methods perform reasonably well (median rank below 20 out of more than 8000 variants) and that the selection of an optimal aggregation method depends strongly on the fraction of uninformative phenotypes. Finally, we propose guidelines as to how to select an appropriate aggregation method based on knowledge of the phenotype under study.
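The two simple aggregation methods named in the abstract, maximum and mean score, can be sketched as follows. This is only an illustration under stated assumptions: the function names, the variant identifiers, and the score values are hypothetical and are not taken from the study, and the more complex methods (order statistics, parametric modeling) are not shown.

```python
from statistics import mean


def aggregate_scores(per_phenotype_scores, method="max"):
    """Combine one variant's per-phenotype scores into a global score."""
    if not per_phenotype_scores:
        raise ValueError("need at least one phenotype score")
    if method == "max":
        return max(per_phenotype_scores)
    if method == "mean":
        return mean(per_phenotype_scores)
    raise ValueError(f"unknown method: {method}")


def rank_variants(scores_by_variant, method="max"):
    """Return variant ids sorted from highest to lowest global score."""
    return sorted(
        scores_by_variant,
        key=lambda v: aggregate_scores(scores_by_variant[v], method),
        reverse=True,
    )


# Hypothetical scores for three phenotypes per variant. variant_a scores
# highly on one phenotype only; the other two carry little signal for it.
example = {
    "variant_a": [0.9, 0.1, 0.1],
    "variant_b": [0.5, 0.5, 0.5],
}

by_max = rank_variants(example, "max")
by_mean = rank_variants(example, "mean")
print(by_max, by_mean)
```

With these made-up numbers the two methods disagree: the maximum ranks variant_a first, while the mean is pulled down by the two low scores and ranks variant_b first. That is one way to see why the share of uninformative phenotypes matters when choosing an aggregation method.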

Retention and Loss of Amino Acid Biosynthetic Pathways Based on Analysis of Whole-Genome Sequences

Eukaryotic Cell ◽

10.1128/ec.5.2.272-276.2006 ◽

2006 ◽

Vol 5 (2) ◽

pp. 272-276 ◽

Cited By ~ 81

Author(s):

Samuel H. Payne ◽

William F. Loomis

Keyword(s):

Amino Acids ◽

Amino Acid ◽

Minimal Medium ◽

Whole Genome ◽

Biosynthetic Pathways ◽

Genome Sequences ◽

Whole Genomes ◽

Complete Genomes ◽

Computational Analyses

ABSTRACT Plants and fungi can synthesize each of the 20 amino acids by using biosynthetic pathways inherited from their bacterial ancestors. However, the ability to synthesize nine amino acids (Phe, Trp, Ile, Leu, Val, Lys, His, Thr, and Met) was lost in a wide variety of eukaryotes that evolved the ability to feed on other organisms. Since the biosynthetic pathways and their respective enzymes are well characterized, orthologs can be recognized in whole genomes to understand when in evolution pathways were lost. The pattern of pathway loss and retention was analyzed in the complete genomes of three early-diverging protist parasites, the amoeba Dictyostelium, and six animals. The nine pathways were lost independently in animals, Dictyostelium, Leishmania, Plasmodium, and Cryptosporidium. Seven additional pathways appear to have been lost in one or another parasite, demonstrating that they are dispensable in a nutrition-rich environment. Our predictions of pathways retained and pathways lost based on computational analyses of whole genomes are validated by minimal-medium studies with mammals, fish, worms, and Dictyostelium. The apparent selective advantages of retaining biosynthetic capabilities for amino acids available in the diet are considered.

Framework for quality assessment of whole genome, cancer sequences

10.1101/140921 ◽

2017 ◽

Cited By ~ 5

Author(s):

Justin P. Whalley ◽

Ivo Buchhalter ◽

Esther Rheinbay ◽

Keiran M. Raine ◽

Kortine Kleinheinz ◽

...

Keyword(s):

Quality Measures ◽

Poor Quality ◽

Whole Genome ◽

Genome Sequences ◽

International Cancer Genome Consortium ◽

Whole Genomes ◽

Sequencing Quality ◽

Genome Consortium ◽

Pan Cancer

Abstract Working with cancer whole genomes sequenced over a period of many years in different sequencing centres requires a validated framework to compare the quality of these sequences. The Pan-Cancer Analysis of Whole Genomes (PCAWG) of the International Cancer Genome Consortium (ICGC), a project with a cohort of over 2800 donors, provided us with the challenge of assessing the quality of the genome sequences. A non-redundant set of five quality control (QC) measurements was assembled and used to establish a star rating system. These QC measures reflect known differences in sequencing protocol and provide a guide to downstream analyses of these whole genome sequences. The resulting QC measures also allowed for the exclusion of samples of poor quality, giving researchers within PCAWG, and other researchers once the data is released, a good idea of the sequencing quality. For researchers wishing to apply the QC measures to their own data, we provide a Docker container of the software used to calculate them. We believe that this is an effective framework of quality measures for whole genome, cancer sequences, which will be a useful addition to analytical pipelines, as it has been to the PCAWG project.

Verification of genetic engineering in yeasts with nanopore whole genome sequencing

10.1101/2020.05.05.079368 ◽

2020 ◽

Author(s):

Joseph H. Collins ◽

Kevin W. Keating ◽

Trent R. Jones ◽

Shravani Balaji ◽

Celeste B. Marsan ◽

...

Keyword(s):

Quality Control ◽

Synthetic Biology ◽

Genome Assembly ◽

Whole Genome ◽

Sequencing Data ◽

Genome Sequences ◽

Yeast Strains ◽

Whole Genomes ◽

Before And After ◽

Integrated Pathway

ABSTRACT Yeast genomes can be assembled from sequencing data, but genome integrations and episomal plasmids often fail to be resolved with accuracy, completeness, and contiguity. Resolution of these features is critical for many synthetic biology applications, including strain quality control and identifying engineering in unknown samples. Here, we report an integrated workflow, named Prymetime, that uses sequencing reads from inexpensive NGS platforms, assembly and error correction software, and a list of synthetic biology parts to achieve accurate whole genome sequences of yeasts with engineering annotated. To build the workflow, we first determined which sequencing methods and software packages returned an accurate, complete, and contiguous genome of an engineered S. cerevisiae strain with two similar plasmids and an integrated pathway. We then developed a sequence feature annotation step that labels synthetic biology parts from a standard list of yeast engineering sequences or from a custom sequence list. We validated the workflow by sequencing a collection of 15 engineered yeasts built from different parent S. cerevisiae and nonconventional yeast strains. We show that each integrated pathway and episomal plasmid can be correctly assembled and annotated, even in strains that have part repeats and multiple similar plasmids. Interestingly, Prymetime was able to identify deletions and unintended integrations that were subsequently confirmed by other methods. Furthermore, the whole genomes are accurate, complete, and contiguous. To illustrate this clearly, we used a publicly available S. cerevisiae CEN.PK113 reference genome and the accompanying reads to show that a Prymetime genome assembly is equivalent to the reference using several standard metrics. Finally, we used Prymetime to resequence the nonconventional yeasts Y. lipolytica Po1f and K. phaffii CBS 7435, producing an improved genome assembly for each strain. Thus, our workflow can achieve accurate, complete, and contiguous whole genome sequences of yeast strains before and after engineering. Therefore, Prymetime enables NGS-based strain quality control through assembly and identification of engineering features.
