Phenotype Analysis of Retinal Dystrophies in Light of the Underlying Genetic Defects: Application to Cone and Cone-Rod Dystrophies

Elise Boulanger-Scemama; Saddek Mohand-Saïd; Said El Shamieh; Vanessa Démontant; Christel Condroyer; Aline Antonio; Christelle Michiels; Fiona Boyard; Jean-Paul Saraiva; Mélanie Letexier; José-Alain Sahel; Christina Zeitz; Isabelle Audo

doi:10.3390/ijms20194854

Introduction to radiomics and radiogenomics in neuro-oncology: implications and challenges

Neuro-Oncology Advances ◽

10.1093/noajnl/vdaa148 ◽

2020 ◽

Vol 2 (Supplement_4) ◽

pp. iv3-iv14

Author(s):

Niha Beig ◽

Kaustav Bera ◽

Pallavi Tiwari

Keyword(s):

Treatment Response ◽

A Priori ◽

Point Mutations ◽

Region Of Interest ◽

Next Generation Sequencing Data ◽

Tumor Segmentation ◽

Sequencing Data ◽

Statistical Correlations ◽

Data Driven Approach ◽

Mri Scans

Abstract Neuro-oncology largely consists of malignancies of the brain and central nervous system including both primary as well as metastatic tumors. Currently, a significant clinical challenge in neuro-oncology is to tailor therapies for patients based on a priori knowledge of their survival outcome or treatment response to conventional or experimental therapies. Radiomics or the quantitative extraction of subvisual data from conventional radiographic imaging has recently emerged as a powerful data-driven approach to offer insights into clinically relevant questions related to diagnosis, prediction, prognosis, as well as assessing treatment response. Furthermore, radiogenomic approaches provide a mechanism to establish statistical correlations of radiomic features with point mutations and next-generation sequencing data to further leverage the potential of routine MRI scans to serve as “virtual biopsy” maps. In this review, we provide an introduction to radiomic and radiogenomic approaches in neuro-oncology, including a brief description of the workflow involving preprocessing, tumor segmentation, and extraction of “hand-crafted” features from the segmented region of interest, as well as identifying radiogenomic associations that could ultimately lead to the development of reliable prognostic and predictive models in neuro-oncology applications. Lastly, we discuss the promise of radiomics and radiogenomic approaches in personalizing treatment decisions in neuro-oncology, as well as the challenges with clinical adoption, which will rely heavily on their demonstrated resilience to nonstandardization in imaging protocols across sites and scanners, as well as in their ability to demonstrate reproducibility across large multi-institutional cohorts.

Inherited variants in CHD3 demonstrate variable expressivity in Snijders Blok-Campeau syndrome

10.1101/2021.10.04.21264162 ◽

2021 ◽

Author(s):

Jet van der Spek ◽

Joery den Hoed ◽

Lot Snijders Blok ◽

Alexander J. M. Dingemans ◽

Dick Schijven ◽

...

Keyword(s):

De Novo ◽

Neurodevelopmental Disorder ◽

Underlying Mechanism ◽

Next Generation Sequencing Data ◽

Sequencing Data ◽

Human Phenotype ◽

Variable Expressivity ◽

Pathogenic Variants ◽

Reduced Penetrance ◽

Coding Variants

Interpretation of next-generation sequencing data of individuals with an apparent sporadic neurodevelopmental disorder (NDD) often focusses on pathogenic variants in genes associated with NDD, assuming full clinical penetrance with limited variable expressivity. Consequently, inherited variants in genes associated with dominant disorders may be overlooked when the transmitting parent is clinically unaffected. While de novo variants explain a substantial proportion of cases with NDDs, a significant number remain undiagnosed, possibly explained by coding variants associated with reduced penetrance and variable expressivity. We characterized twenty families with inherited heterozygous missense or protein-truncating variants (PTVs) in CHD3, a gene in which de novo variants cause Snijders Blok-Campeau syndrome (SNIBCPS), characterized by intellectual disability, speech delay and recognizable facial features. Notably, the majority of the inherited CHD3 variants were maternally transmitted. Computational facial and human phenotype ontology-based comparisons demonstrated that the phenotypic features of probands with inherited CHD3 variants overlap with the phenotype previously associated with de novo variants in the gene, while carrier parents are mildly or not affected, suggesting variable expressivity. Additionally, similarly reduced expression levels of CHD3 protein in cells of an affected proband and of related healthy carriers with a CHD3 PTV suggested that compensation of expression from the wildtype allele is unlikely to be an underlying mechanism. Our results point to a significant role of inherited variation in SNIBCPS, a finding that is critical for correct variant interpretation and genetic counseling and warrants further investigation towards understanding the broader contributions of such variation to the landscape of human disease.

InteractomeSeq: a web server for the identification and profiling of domains and epitopes from phage display and next generation sequencing data

Nucleic Acids Research ◽

10.1093/nar/gkaa363 ◽

2020 ◽

Vol 48 (W1) ◽

pp. W200-W207

Author(s):

Simone Puccio ◽

Giorgio Grillo ◽

Arianna Consiglio ◽

Maria Felicia Soluri ◽

Daniele Sblattero ◽

...

Keyword(s):

Phage Display ◽

Large Scale ◽

High Throughput Sequencing ◽

Gene Annotation ◽

Web Server ◽

Next Generation Sequencing Data ◽

Sequencing Data ◽

Phage Display Technology ◽

Essential Information ◽

Research Fields

Abstract High-Throughput Sequencing technologies are transforming many research fields, including the analysis of phage display libraries. The phage display technology coupled with deep sequencing was introduced more than a decade ago and holds the potential to circumvent the traditional laborious picking and testing of individual phage rescued clones. However, from a bioinformatics point of view, the analysis of this kind of data was always performed by adapting tools designed for other purposes, thus not considering the noise background typical of the ‘interactome sequencing’ approach and the heterogeneity of the data. InteractomeSeq is a web server allowing data analysis of protein domains (‘domainome’) or epitopes (‘epitome’) from either Eukaryotic or Prokaryotic genomic phage libraries generated and selected by following an Interactome sequencing approach. InteractomeSeq allows users to upload raw sequencing data and to obtain an accurate characterization of domainome/epitome profiles after setting the parameters required to tune the analysis. The release of this tool is relevant for the scientific and clinical community, because InteractomeSeq will fill an existing gap in the field of large-scale biomarkers profiling, reverse vaccinology, and structural/functional studies, thus contributing essential information for gene annotation or antigen identification. InteractomeSeq is freely available at https://InteractomeSeq.ba.itb.cnr.it/

PathoQC: Computationally Efficient Read Preprocessing and Quality Control for High-Throughput Sequencing Data Sets

Cancer Informatics ◽

10.4137/cin.s13890 ◽

2014 ◽

Vol 13s1 ◽

pp. CIN.S13890 ◽

Cited By ~ 1

Author(s):

Changjin Hong ◽

Solaiappan Manimaran ◽

William Evan Johnson

Keyword(s):

Quality Control ◽

High Throughput ◽

High Performance ◽

High Throughput Sequencing ◽

Next Generation Sequencing Data ◽

Data Sets ◽

Sequencing Data ◽

Computationally Efficient ◽

High Throughput Sequencing Data ◽

Downstream Analysis

Quality control and read preprocessing are critical steps in the analysis of data sets generated from high-throughput genomic screens. In the most extreme cases, improper preprocessing can negatively affect downstream analyses and may lead to incorrect biological conclusions. Here, we present PathoQC, a streamlined toolkit that seamlessly combines the benefits of several popular quality control software approaches for preprocessing next-generation sequencing data. PathoQC provides a variety of quality control options appropriate for most high-throughput sequencing applications. PathoQC is primarily developed as a module in the PathoScope software suite for metagenomic analysis. However, PathoQC is also available as an open-source Python module that can run as a stand-alone application or can be easily integrated into any bioinformatics workflow. PathoQC achieves high performance by supporting parallel computation and is an effective tool that removes technical sequencing artifacts and facilitates robust downstream analysis. The PathoQC software package is available at http://sourceforge.net/projects/PathoScope/ .

Benchmarking Variant Identification Tools for Plant Diversity Discovery

10.21203/rs.2.9666/v2 ◽

2019 ◽

Author(s):

Xing Wu ◽

Christopher Heffelfinger ◽

Hongyu Zhao ◽

Stephen L. Dellaporta

Keyword(s):

Next Generation Sequencing ◽

High Throughput Sequencing ◽

Crop Improvement ◽

Variant Calling ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Variant Discovery ◽

Variant Filtering ◽

Generation Sequencing

Abstract Background The ability to accurately and comprehensively identify genomic variations is critical for plant studies utilizing high-throughput sequencing. Most bioinformatics tools for processing next-generation sequencing data were originally developed and tested in human studies, raising questions as to their efficacy for plant research. A detailed evaluation of the entire variant calling pipeline, including alignment, variant calling, variant filtering, and imputation, was performed on different programs using both simulated and real plant genomic datasets. Results A comparison of SOAP2, Bowtie2, and BWA-MEM found that BWA-MEM was consistently able to align the most reads with high accuracy, whereas Bowtie2 had the highest overall accuracy. Comparative results of GATK HaplotypeCaller versus SAMtools mpileup indicated that the choice of variant caller affected precision and recall differentially depending on the levels of diversity, sequence coverage and genome complexity. A cross-reference experiment of S. lycopersicum and S. pennellii reference genomes revealed the inadequacy of a single reference genome for variant discovery that includes distantly related plant individuals. A machine-learning-based variant filtering strategy outperformed the traditional hard-cutoff strategy, resulting in a higher number of true positive variants and fewer false positive variants. A 2-step imputation method, which utilized a set of high-confidence SNPs as the reference panel, showed up to 60% higher accuracy than direct LD-based imputation. Conclusions Programs in the variant discovery pipeline perform differently on plant genomic datasets. The choice of programs depends on the goal of the study and the available resources. This study provides important guidance for plant biologists utilizing next-generation sequencing data for diversity characterization and crop improvement.

PhageTerm: a Fast and User-friendly Software to Determine Bacteriophage Termini and Packaging Mode using randomly fragmented NGS data

10.1101/108100 ◽

2017 ◽

Cited By ~ 2

Author(s):

Julian Garneau ◽

Florence Depardieu ◽

Louis-Charles Fortier ◽

David Bikard ◽

Marc Monot

Keyword(s):

High Throughput Sequencing ◽

Next Generation Sequencing Data ◽

Sequencing Data ◽

Link Type ◽

Sequencing Technologies ◽

Statistical Framework ◽

Fastq Format ◽

Viral Particles ◽

User Friendly ◽

Ngs Data

Abstract Bacteriophages are the most abundant viruses on earth and display an impressive genetic as well as morphologic diversity. Among those, the most common order of phages is the Caudovirales, whose viral particles package linear double-stranded DNA (dsDNA). In this study we investigated how the information gathered by high-throughput sequencing technologies can be used to determine the DNA termini and packaging mechanisms of dsDNA phages. The wet-lab procedures traditionally used for this purpose rely on the identification and cloning of restriction fragments, which can be delicate and cumbersome. Here, we developed a theoretical and statistical framework to analyze DNA termini and phage packaging mechanisms using next-generation sequencing data. Our methods, implemented in the PhageTerm software, work with sequencing reads in fastq format and the corresponding assembled phage genome. PhageTerm was validated on a set of phages with well-established packaging mechanisms representative of the termini diversity: 5’cos (lambda), 3’cos (HK97), pac (P1), headful without a pac site (T4), DTR (T7) and host fragment (Mu). In addition, we determined the termini of 9 Clostridium difficile phages and 6 phages whose sequences were retrieved from the sequence read archive (SRA). A direct graphical interface is available as a Galaxy wrapper version at https://galaxy.pasteur.fr and a standalone version is accessible at https://sourceforge.net/projects/phageterm/.

STRipy: a graphical application for enhanced genotyping of pathogenic short tandem repeats in sequencing data

10.1101/2021.06.13.448220 ◽

2021 ◽

Author(s):

Andreas Halman ◽

Egor Dolzhenko ◽

Alicia Oshlack

Keyword(s):

Genome Sequencing ◽

Fragment Length ◽

Short Tandem Repeats ◽

Tandem Repeats ◽

High Throughput Sequencing ◽

Causal Variant ◽

Sequencing Data ◽

Pathogenic Variants ◽

Set Up ◽

Short Tandem

Abstract Short tandem repeats (STRs) are highly polymorphic with high mutation rates, and expansions of STRs have been implicated as the causal variant in diseases. The application of genome sequencing in patients has recently allowed many new discoveries, with over 50 disease-causing loci known to date. There are several tools which allow genotyping of STRs from high-throughput sequencing (HTS) data. However, running these tools out of the box only allows around half of the known disease-causing loci to be genotyped, with lengths often limited to either read or fragment length, which is less than the pathogenic cut-off for some diseases. While analysis tools can be customised to genotype extra loci, this requires proficiency in bioinformatics to set up, use, and analyse the resulting data, limiting their widespread usage by other researchers and clinicians. To address these issues, we have created new software called STRipy that has an intuitive graphical interface and requires no specific skills for usage, thus significantly simplifying detection of STR expansions from human HTS data. STRipy is able to target all known disease-causing STRs, with genotyping performed with an established tool, ExpansionHunter, that is incorporated into the software. We have added functionality to STRipy to work with long alleles exceeding the fragment length. STRipy was validated using over 60 thousand simulated samples and was shown to work on whole genome sequencing of biological samples with pathogenic variants. Finally, we have used STRipy to acquire genotypes of pathogenic loci for thousands of samples from various populations, which are provided to the user along with the data from the literature to assist with results interpretation. We believe the simplicity and breadth of STRipy will increase the testing of STR diseases in current datasets, resulting in further diagnoses of rare diseases caused by STR expansions.

ANGSD-wrapper: utilities for analyzing next generation sequencing data

10.7287/peerj.preprints.1472 ◽

2016 ◽

Author(s):

Arun Durvasula ◽

Paul J Hoffman ◽

Tyler V Kent ◽

Chaochih Liu ◽

Thomas J Y Kono ◽

...

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Molecular Ecology ◽

Principal Component ◽

Next Generation Sequencing Data ◽

Sequencing Data ◽

Genome Data ◽

High Throughput Sequencing Data ◽

Genome Wide ◽

User Friendly

High-throughput sequencing has changed many aspects of population genetics, molecular ecology, and related fields, affecting both experimental design and data analysis. The software package ANGSD allows users to perform a number of population genetic analyses on high-throughput sequencing data. ANGSD uses probabilistic approaches to calculate genome-wide descriptive statistics. The package makes use of genotype likelihood estimates rather than SNP calls and is specifically designed to produce more accurate results for samples with low sequencing depth. ANGSD makes use of full genome data while handling a wide array of sampling and experimental designs. Here we present ANGSD-wrapper, a set of wrapper scripts that provide a user-friendly interface for running ANGSD and visualizing results. ANGSD-wrapper supports multiple types of analyses, including estimates of nucleotide sequence diversity, neutrality tests, principal component analysis, estimation of admixture proportions for individual samples, and calculation of statistics that quantify recent introgression. ANGSD-wrapper also provides interactive graphing of ANGSD results to enhance data exploration. We demonstrate the usefulness of ANGSD-wrapper by analyzing resequencing data from populations of wild and domesticated Zea. ANGSD-wrapper is freely available from https://github.com/mojaveazure/angsd-wrapper.

Low-level variant calling for non-matched samples using a position-based and nucleotide-specific approach

BMC Bioinformatics ◽

10.1186/s12859-021-04090-y ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Jeffrey N. Dudley ◽

Celine S. Hong ◽

Marwan A. Hawari ◽

Jasmine Shwetar ◽

...

Keyword(s):

Next Generation Sequencing ◽

Somatic Mosaicism ◽

Variant Calling ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Low Level ◽

Pathogenic Variants ◽

Segmental Overgrowth ◽

Generation Sequencing

Abstract Background The widespread use of next-generation sequencing has identified an important role for somatic mosaicism in many diseases. However, detecting low-level mosaic variants from next-generation sequencing data remains challenging. Results Here, we present a method for Position-Based Variant Identification (PBVI) that uses empirically-derived distributions of alternate nucleotides from a control dataset. We modeled this approach on 11 segmental overgrowth genes. We show that this method improves detection of single nucleotide mosaic variants of 0.01–0.05 variant allele fraction compared to other low-level variant callers. At depths of 600× and 1200×, we observed >85% and >95% sensitivity, respectively. In a cohort of 26 individuals with somatic overgrowth disorders, PBVI showed improved signal to noise, identifying pathogenic variants in 17 individuals. Conclusion PBVI can facilitate identification of low-level mosaic variants, thus increasing the utility of next-generation sequencing data for research and diagnostic purposes.

ANGSD-wrapper: utilities for analyzing next generation sequencing data

10.7287/peerj.preprints.1472v2 ◽

2016 ◽

Cited By ~ 1

Author(s):

Arun Durvasula ◽

Paul J Hoffman ◽

Tyler V Kent ◽

Chaochih Liu ◽

Thomas J Y Kono ◽

...

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Molecular Ecology ◽

Principal Component ◽

Next Generation Sequencing Data ◽

Sequencing Data ◽

Genome Data ◽

High Throughput Sequencing Data ◽

Genome Wide ◽

User Friendly

High-throughput sequencing has changed many aspects of population genetics, molecular ecology, and related fields, affecting both experimental design and data analysis. The software package ANGSD allows users to perform a number of population genetic analyses on high-throughput sequencing data. ANGSD uses probabilistic approaches to calculate genome-wide descriptive statistics. The package makes use of genotype likelihood estimates rather than SNP calls and is specifically designed to produce more accurate results for samples with low sequencing depth. ANGSD makes use of full genome data while handling a wide array of sampling and experimental designs. Here we present ANGSD-wrapper, a set of wrapper scripts that provide a user-friendly interface for running ANGSD and visualizing results. ANGSD-wrapper supports multiple types of analyses, including estimates of nucleotide sequence diversity, neutrality tests, principal component analysis, estimation of admixture proportions for individual samples, and calculation of statistics that quantify recent introgression. ANGSD-wrapper also provides interactive graphing of ANGSD results to enhance data exploration. We demonstrate the usefulness of ANGSD-wrapper by analyzing resequencing data from populations of wild and domesticated Zea. ANGSD-wrapper is freely available from https://github.com/mojaveazure/angsd-wrapper.
