An Ensemble Approach to Predict the Pathogenicity of Synonymous Variants

Satishkumar Ranganathan Ganakammal; Emil Alexov

doi:10.3390/genes11091102

Genes, 2020, Vol 11 (9), pp. 1102

Cited By ~ 1

Keyword(s): Genetic Variation; RNA Processing; High Accuracy; Clinical Impact; Amino Acid Sequences; Single Nucleotide Variants; Single Nucleotide; Functional Studies; Or Gene

Single-nucleotide variants (SNVs) are a major form of genetic variation in the human genome that contribute to various disorders. There are two types of SNVs, namely non-synonymous (missense) variants (nsSNVs) and synonymous variants (sSNVs), the latter predominantly involved in RNA processing or gene regulation. Unlike nsSNVs, sSNVs do not alter the amino acid sequence, which makes them challenging candidates for downstream functional studies. Numerous computational methods have been developed to evaluate the clinical impact of nsSNVs, but very few are available for understanding the effects of sSNVs. For this analysis, we downloaded sSNVs from the ClinVar database together with various features such as conservation, DNA-RNA, and splicing properties. We performed feature selection and implemented an ensemble random forest (RF) classification algorithm to build a classifier that predicts the pathogenicity of sSNVs. We demonstrate that the ensemble predictor with the selected features (20 features) enhances the classification of sSNVs into two categories, pathogenic and benign, with high accuracy (87%), precision (79%), and recall (91%). Furthermore, we used this prediction model to reclassify sSNVs of unknown clinical significance. Finally, the method is robust and can be used to predict the effect of other unknown sSNVs.
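The abstract reports accuracy, precision, and recall for a two-class (pathogenic vs. benign) ensemble predictor. As a minimal illustration only, and not the authors' code, the sketch below combines hypothetical per-model labels by hard majority vote and then scores the result against known labels. The hard vote is a simplification: scikit-learn's `RandomForestClassifier`, for instance, averages per-tree class probabilities instead. All model outputs and labels are made up.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float


def majority_vote(per_model_predictions):
    """Combine per-model label lists by hard majority vote.

    Ties go to the label seen first for that variant (Counter.most_common order).
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*per_model_predictions)]


def classification_metrics(y_true, y_pred, positive="pathogenic"):
    """Accuracy, precision and recall, with `positive` as the positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return Metrics(
        accuracy=(tp + tn) / len(pairs),
        precision=tp / (tp + fp) if tp + fp else 0.0,
        recall=tp / (tp + fn) if tp + fn else 0.0,
    )


# Made-up predictions from three hypothetical models on five variants.
P, B = "pathogenic", "benign"
models = [
    [P, B, P, B, P],
    [P, P, B, B, B],
    [B, B, P, B, P],
]
truth = [P, B, B, B, P]
ensemble = majority_vote(models)
print(ensemble)
print(classification_metrics(truth, ensemble))
```

If "pathogenic" is taken as the positive class, precision is TP/(TP+FP) and recall is TP/(TP+FN), so the reported 79% precision and 91% recall would mean the predictor misses few pathogenic sSNVs but labels some benign ones as pathogenic.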

Faculty Opinions recommendation of Phylogenetic and physicochemical analyses enhance the classification of rare nonsynonymous single nucleotide variants in type 1 and 2 long-QT syndrome.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature, 2012

doi:10.3410/f.717960422.793463950

Author(s): Jeffrey Noebels; Tara Klassen

Keyword(s): Long QT Syndrome; Single Nucleotide Variants; Long QT; Single Nucleotide; QT Syndrome

Structural variant detection in cancer genomes: computational challenges and perspectives for precision oncology

npj Precision Oncology, 2021, Vol 5 (1)

doi:10.1038/s41698-021-00155-6

Author(s): Ianthe A. E. M. van Belzen; Alexander Schönhuth; Patrick Kemmeren; Jayne Y. Hehir-Kwa

Keyword(s): Intratumor Heterogeneity; Precision Oncology; Single Nucleotide Variants; Full Spectrum; Single Nucleotide; Sequencing Technologies; Cancer Genomes; Genomic Aberrations

Cancer is generally characterized by acquired genomic aberrations of a broad spectrum of types and sizes, ranging from single nucleotide variants to structural variants (SVs). At least 30% of cancers have a known pathogenic SV used in diagnosis or treatment stratification. However, research into the role of SVs in cancer has been limited by difficulties in detection. Biological and computational challenges confound SV detection in cancer samples, including intratumor heterogeneity, polyploidy, and the need to distinguish tumor-specific SVs from germline and somatic variants present in healthy cells. Classification of tumor-specific SVs is challenging due to inconsistencies in detected breakpoints and derived variant types, and to the biological complexity of some rearrangements. Full-spectrum SV detection with high recall and precision requires integrating multiple algorithms and sequencing technologies to rescue variants that are difficult to resolve with individual methods. Here, we explore current strategies for integrating SV callsets to enable the use of tumor-specific SVs in precision oncology.
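One common way to integrate SV callsets from several algorithms is to cluster calls whose breakpoints agree within a tolerance and then filter by how many callers support each cluster. The sketch below is a minimal, hypothetical version of that idea, not a method taken from the review: the `SVCall` record, the 500 bp `max_dist`, and `min_support=2` are illustrative assumptions, and dedicated merging tools handle many more cases (insertions without a meaningful end coordinate, breakend records, strand, size similarity).

```python
from dataclasses import dataclass
from statistics import median_low


@dataclass(frozen=True)
class SVCall:
    caller: str
    chrom: str
    start: int
    end: int
    svtype: str


def merge_callsets(calls, max_dist=500, min_support=2):
    """Greedily cluster calls of the same chromosome and SV type whose start and
    end breakpoints both lie within `max_dist` bp of the cluster's first call,
    then keep clusters reported by at least `min_support` distinct callers.
    The defaults are hypothetical, chosen only for illustration."""
    clusters = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.svtype, c.start)):
        for cluster in clusters:
            anchor = cluster[0]
            if (anchor.chrom == call.chrom and anchor.svtype == call.svtype
                    and abs(anchor.start - call.start) <= max_dist
                    and abs(anchor.end - call.end) <= max_dist):
                cluster.append(call)
                break
        else:  # no compatible cluster found: start a new one
            clusters.append([call])

    merged = []
    for cluster in clusters:
        callers = sorted({c.caller for c in cluster})
        if len(callers) >= min_support:
            # Report the (lower) median breakpoints of the supporting calls.
            merged.append((cluster[0].chrom,
                           median_low(c.start for c in cluster),
                           median_low(c.end for c in cluster),
                           cluster[0].svtype,
                           callers))
    return merged


# Made-up calls from three hypothetical callers.
calls = [
    SVCall("callerA", "chr1", 10_000, 15_000, "DEL"),
    SVCall("callerB", "chr1", 10_120, 14_950, "DEL"),
    SVCall("callerC", "chr1", 10_300, 15_200, "DEL"),
    SVCall("callerB", "chr1", 10_050, 15_010, "DUP"),
    SVCall("callerA", "chr2", 50_000, 50_800, "INV"),
]
print(merge_callsets(calls))                 # consensus: only the DEL survives
print(len(merge_callsets(calls, min_support=1)))  # union of all clusters
```

Raising `min_support` favours precision (only calls agreed on by several callers survive), while `min_support=1` returns the union and favours recall, the trade-off the abstract points to when it speaks of rescuing variants that individual methods miss.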

Development, technical validation, and clinical application of a multigene panel for hereditary gastrointestinal cancer and polyposis

Tumori Journal, 2019, Vol 105 (4), pp. 338-352

doi:10.1177/0300891619847085

Cited By ~ 2

Author(s): Maria Teresa Ricci; Sara Volorio; Stefano Signoroni; Paolo Mariani; Frederique Mariette; ...

Keyword(s): Gastrointestinal Cancer; Diagnostic Yield; Clinical Impact; Clinical Genetics; Single Nucleotide Variants; Single Nucleotide; Panel Testing; Technical Validation; Sensitivity Specificity; Multigene Panel

Introduction: Recent advances in technology and research are rapidly changing the diagnostic approach to hereditary gastrointestinal cancer (HGIC) syndromes. Although the practice of clinical genetics is currently transitioning from targeted criteria-based testing to multigene panels, important challenges remain to be addressed. The aim of this study was to develop a multigene panel for HGIC and technically validate its performance. Methods: CGT-colon-G14 is an amplicon-based panel designed to detect single nucleotide variants and small insertions/deletions in 14 well-established or presumed high-penetrance genes involved in HGIC. The assay parameters tested were sensitivity, specificity, accuracy, and inter-run and intra-run reproducibility. Performance and clinical impact were determined using 48 samples from patients with suspected HGIC/polyposis previously tested with the targeted approach. Results: The CGT-colon-G14 panel showed 99.99% accuracy and 100% inter- and intra-run reproducibility. Moreover, panel testing detected 1 actionable pathogenic variant and 16 variants of uncertain clinical impact that were missed by the conventional approach because they were located in genes not previously analyzed. Conclusion: Introducing the CGT-colon-G14 panel into the clinic could provide a higher diagnostic yield than a step-wise approach; however, results may not always be straightforward to interpret without new genetic counseling models.
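The validation parameters named in the Methods can be illustrated with a small sketch. Here sensitivity, specificity, and accuracy are computed by comparing panel calls with reference calls, and inter-run reproducibility is approximated as the overlap between two replicate runs. The variant keys, the number of assessed positions, and the concordance definition are hypothetical assumptions for illustration, not the study's protocol.

```python
def validation_metrics(reference, observed, assessed_positions):
    """Sensitivity, specificity and accuracy of panel calls against a reference.

    `reference` and `observed` are sets of (chrom, pos, ref, alt) keys.
    Assumption: one variant key per assessed position, so every assessed
    position without a variant in either set counts as a true negative.
    """
    tp = len(reference & observed)
    fn = len(reference - observed)
    fp = len(observed - reference)
    tn = assessed_positions - tp - fn - fp
    if tn < 0:
        raise ValueError("assessed_positions is smaller than the number of calls")
    sensitivity = tp / (tp + fn) if tp + fn else 1.0
    specificity = tn / (tn + fp) if tn + fp else 1.0
    accuracy = (tp + tn) / assessed_positions
    return sensitivity, specificity, accuracy


def run_concordance(run_a, run_b):
    """Hypothetical reproducibility measure: share of variants called in
    either run that were called in both (a Jaccard index)."""
    union = run_a | run_b
    return len(run_a & run_b) / len(union) if union else 1.0


# Made-up example: three reference variants, one missed and one false call.
reference = {("chr5", 1_000, "C", "T"), ("chr3", 2_000, "G", "A"), ("chr2", 3_000, "A", "G")}
run1 = {("chr5", 1_000, "C", "T"), ("chr3", 2_000, "G", "A"), ("chr2", 3_100, "T", "C")}
run2 = set(run1)  # an identical replicate run

sens, spec, acc = validation_metrics(reference, run1, assessed_positions=10_000)
print(f"sensitivity={sens:.3f} specificity={spec:.5f} accuracy={acc:.4f}")
print(f"inter-run concordance={run_concordance(run1, run2):.2f}")
```

Because true negatives dominate when thousands of positions are assessed, accuracy can stay close to 100% even when sensitivity is modest, which is why sensitivity and specificity are usually reported alongside it.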

Integration of single nucleotide variants and whole-genome DNA methylation profiles for classification of rheumatoid arthritis cases from controls

Heredity, 2020, Vol 124 (5), pp. 658-674

doi:10.1038/s41437-020-0301-4

Cited By ~ 1

Author(s): Mahmoud Amiri Roudbar; Mohammad Reza Mohammadabadi; Ahmad Ayatollahi Mehrgardi; Rostam Abdollahi-Arpanahi; Mehdi Momen; ...

Keyword(s): Rheumatoid Arthritis; DNA Methylation; Whole Genome; Single Nucleotide Variants; Single Nucleotide

Haplotype-aware variant calling enables high accuracy in nanopore long-reads using deep neural networks

2021

doi:10.1101/2021.03.04.433952

Author(s): Kishwar Shafin; Trevor Pesout; Pi-Chuan Chang; Maria Nattestad; Alexey Kolesnikov; ...

Keyword(s): De Novo; Sequence Data; Variant Calling; High Accuracy; Superior Performance; Read Length; Single Nucleotide Variants; Single Nucleotide; Short Read; Long Read

Long-read sequencing has the potential to transform variant detection by reaching currently difficult-to-map regions and routinely linking together adjacent variants to enable read-based phasing. Third-generation nanopore sequencing provides long reads, but current methods for interpreting its novel pore-based signal have unique error profiles, making accurate analysis challenging. Here, we introduce a haplotype-aware variant calling pipeline, PEPPER-Margin-DeepVariant, that produces state-of-the-art variant calling results with nanopore data. We show that our nanopore-based method outperforms the short-read-based single nucleotide variant identification method at whole-genome scale and produces high-quality single nucleotide variants in segmental duplications and low-mappability regions where short-read-based genotyping fails. We show that our pipeline can provide highly contiguous phase blocks across the genome with nanopore reads, contiguously spanning between 85% and 92% of annotated genes across six samples. We also extend PEPPER-Margin-DeepVariant to PacBio HiFi data, providing an efficient solution with performance superior to the current WhatsHap-DeepVariant standard. Finally, we demonstrate de novo assembly polishing methods that use nanopore and PacBio HiFi reads to produce diploid assemblies with high accuracy (Q35+ nanopore-polished and Q40+ PacBio-HiFi-polished).
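The Q35+ and Q40+ figures are Phred-scaled quality values, Q = -10 log10(e), where e is the per-base error rate: Q35 corresponds to roughly one error in 3,162 bases and Q40 to one in 10,000. The sketch below performs that conversion and also computes one plausible definition of the gene-spanning statistic, the fraction of annotated genes fully contained in a single phase block. That definition and the intervals used are assumptions for illustration, not necessarily how the authors measured it.

```python
import math


def phred_to_error(q):
    """Per-base error probability for a Phred-scaled quality value Q."""
    return 10 ** (-q / 10)


def error_to_phred(error_rate):
    """Phred-scaled quality value for a per-base error probability."""
    return -10 * math.log10(error_rate)


def fraction_genes_in_one_block(genes, blocks):
    """Fraction of genes fully contained in a single phase block.

    Assumed definition, for illustration only. `genes` and `blocks` are
    (chrom, start, end) intervals in the same coordinate system.
    """
    contained = sum(
        any(b_chrom == g_chrom and b_start <= g_start and g_end <= b_end
            for b_chrom, b_start, b_end in blocks)
        for g_chrom, g_start, g_end in genes
    )
    return contained / len(genes)


for q in (35, 40):
    err = phred_to_error(q)
    print(f"Q{q}: error rate {err:.2e}, about 1 error per {1 / err:,.0f} bases")

# Made-up phase blocks and genes; the second gene straddles a gap between blocks.
blocks = [("chr1", 0, 50_000), ("chr1", 60_000, 200_000)]
genes = [("chr1", 1_000, 20_000), ("chr1", 45_000, 70_000),
         ("chr1", 100_000, 150_000), ("chr2", 0, 10_000)]
print(fraction_genes_in_one_block(genes, blocks))
```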

Single nucleotide variants in Pseudomonas aeruginosa populations from sputum correlate with baseline lung function and predict disease progression in individuals with cystic fibrosis

2021

doi:10.1101/2021.10.04.21264421

Author(s): Morteza M Saber; Jannik Donner; Inès Levade; Nicole Acosta; Michael D Perkins; ...

Keyword(s): Cystic Fibrosis; Pseudomonas aeruginosa; Genetic Variation; Lung Function; Lung Disease; Disease Severity; Single Nucleotide Variants; Lung Function Decline; Single Nucleotide; Lung Disease Severity

Complex polymicrobial communities inhabit the lungs of individuals with cystic fibrosis (CF) and contribute to the decline in lung function. However, the severity of lung disease and its progression in CF patients are highly variable and imperfectly predicted by host clinical factors at baseline, CFTR mutations in the host genome, or sputum polymicrobial community variation. The opportunistic pathogen Pseudomonas aeruginosa (Pa) dominates airway infections in the majority of CF adults. Here we hypothesized that genetic variation within Pa populations would be predictive of lung disease severity. To quantify Pa genetic variation within whole CF sputum samples, we used deep amplicon sequencing on a newly developed custom Ion AmpliSeq panel of 209 Pa genes previously associated with the host pathoadaptation and pathogenesis of CF infection. We trained machine learning models using Pa single nucleotide variants (SNVs), clinical and microbiome diversity data to classify lung disease severity at the time of sputum sampling, and to predict future lung function decline over five years in a cohort of 54 adult CF patients with chronic Pa infection. The models using Pa SNVs alone classified baseline lung disease with good sensitivity and specificity, with an area under the receiver operating characteristic curve (AUROC) of 0.87. While the models were less predictive of future lung function decline, they still achieved an AUROC of 0.74. The addition of clinical data to the models, but not microbiome community data, yielded modest improvements (baseline lung function: AUROC=0.92; lung function decline: AUROC=0.79), highlighting the predictive value of the AmpliSeq data. Together, our work provides a proof-of-principle that Pa genetic variation in sputum is strongly associated with baseline lung disease, moderately predicts future lung function decline, and provides insight into the pathobiology of Pa's effect on CF.
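AUROC, the metric reported for these models, equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. A minimal pairwise sketch of that (Mann-Whitney) formulation follows. The scores and the label encoding (1 = severe, 0 = mild) are hypothetical, and the quadratic loop is only meant for small cohorts such as the 54 patients described here.

```python
def auroc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney formulation.

    Equals the probability that a randomly chosen positive (label 1) receives a
    higher score than a randomly chosen negative (label 0); ties count as 1/2.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up model scores for five hypothetical patients (1 = severe disease).
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 0, 1, 0, 0]
print(round(auroc(scores, labels), 3))
```

An AUROC of 0.5 corresponds to ranking patients at random and 1.0 to perfect separation, which is the scale on which the reported values of 0.74 to 0.92 should be read.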

Enhanced Classification of Non-synonymous Single Nucleotide Variants in the SCN5A-Encoded Nav1.5 Cardiac Sodium Channel

Heart Rhythm, 2012, Vol 9 (11), pp. 1912

doi:10.1016/j.hrthm.2012.09.098

Cited By ~ 4

Author(s): J.D. Kapplinger; J.R. Giudicessi; D.J. Tester; T.E. Callis; M.J. Ackerman

Keyword(s): Sodium Channel; Single Nucleotide Variants; Single Nucleotide; Cardiac Sodium Channel

Targeted RNA-Sequencing Enables Detection of Relevant Translocations and Single Nucleotide Variants and Provides a Method for Classification of Hematological Malignancies–RANKING

Clinical Chemistry, 2020, Vol 66 (12), pp. 1521-1530

doi:10.1093/clinchem/hvaa221

Author(s): Kim de Lange; Eddy N de Boer; Anneke Bosga; Mohamed Z Alimohamed; Lennart F Johansson; ...

Keyword(s): RNA Sequencing; Hematological Malignancies; Lymphoblastic Leukemia; Expression Patterns; Primary Diagnosis; Single Nucleotide Variants; Single Nucleotide; Molecular Abnormalities; Wide Range

Background: Patients with hematological malignancies (HMs) carry a wide range of chromosomal and molecular abnormalities that affect their prognosis and treatment. Because no current technique can detect all relevant abnormalities, technique(s) are chosen depending on the reason for referral, and abnormalities can be missed. We tested targeted transcriptome sequencing as a single platform to detect all relevant abnormalities and compared it with current techniques. Material and Methods: We performed RNA-sequencing of 1385 genes (TruSight RNA Pan-Cancer, Illumina) on bone marrow from 136 patients with a primary diagnosis of HM. We then applied machine learning to expression profile data to perform leukemia classification, a method we named RANKING. Gene fusions were detected for all genes in the panel, and overexpression of EVI1, CCND1, and BCL2 was quantified. Single nucleotide variants/indels were analyzed in patients with acute myeloid leukemia (AML), myelodysplastic syndrome, or acute lymphoblastic leukemia (ALL) using a virtual myeloid (54 genes) or lymphoid (72 genes) panel. Results: RANKING correctly predicted the leukemia classification of all AML and ALL samples and improved classification in 3 patients. Compared with current methods, only one variant was missed, c.2447A>T in KIT (RT-PCR at 10⁻⁴), and BCL2 overexpression caused by a t(14;18)(q32;q21) present in 2% of the cells was not detected. Our RNA-sequencing method also identified 6 additional fusion genes and overexpression of CCND1 due to a t(11;14)(q13;q32) in 2 samples. Conclusions: Our combination of targeted RNA-sequencing and data analysis workflow can improve the detection of relevant variants, and expression patterns can assist in establishing HM classification.
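The abstract does not describe the machine-learning model behind RANKING, so the following is explicitly a stand-in rather than the RANKING method: a nearest-centroid classifier, one of the simplest ways to assign a sample to a leukemia class from its expression profile. The marker genes and expression values are made up for illustration.

```python
import math


def train_centroids(profiles, labels):
    """Average the expression profiles of each class into a centroid."""
    sums, counts = {}, {}
    for profile, label in zip(profiles, labels):
        if label not in sums:
            sums[label] = [0.0] * len(profile)
            counts[label] = 0
        sums[label] = [a + b for a, b in zip(sums[label], profile)]
        counts[label] += 1
    return {label: [v / counts[label] for v in total] for label, total in sums.items()}


def predict(centroids, profile):
    """Assign the class whose centroid is nearest in Euclidean distance.

    A hypothetical stand-in classifier, not the RANKING model."""
    return min(centroids, key=lambda label: math.dist(centroids[label], profile))


# Made-up log-expression values for three hypothetical marker genes.
train = [[5.0, 1.0, 0.5], [4.6, 1.2, 0.7], [1.0, 4.8, 3.9], [1.4, 5.2, 4.1]]
classes = ["AML", "AML", "ALL", "ALL"]
centroids = train_centroids(train, classes)
print(predict(centroids, [4.9, 0.9, 0.4]))
print(predict(centroids, [0.8, 4.5, 4.2]))
```

Real expression data would normally be normalized before such a comparison, and any classifier would be evaluated on samples it was not trained on.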

Phylogenetic and Physicochemical Analyses Enhance the Classification of Rare Nonsynonymous Single Nucleotide Variants in Type 1 and 2 Long-QT Syndrome

Circulation: Cardiovascular Genetics, 2012, Vol 5 (5), pp. 519-528

doi:10.1161/circgenetics.112.963785

Cited By ~ 49

Author(s): John R. Giudicessi; Jamie D. Kapplinger; David J. Tester; Marielle Alders; Benjamin A. Salisbury; ...

Keyword(s): Long QT Syndrome; Single Nucleotide Variants; Long QT; Single Nucleotide; QT Syndrome

Schizophrenia: The Search for Genetic Risk Factors

Psychology and Personality, 2019, pp. 241-252

doi:10.33989/2226-4078.2019.1.164024

Author(s): V. Pomohaibo; O. Berezan; A. Petrushov

Keyword(s): Genetic Risk; Clinical Effect; De Novo; Genetic Model; Copy Number Variations; Nucleotide Polymorphisms; Single Nucleotide Variants; Single Nucleotide; Risk Alleles

The risk of schizophrenia is caused by mutations in brain-expressed genes. Four groups of mutations are distinguished: single-nucleotide polymorphisms (SNPs), single-nucleotide variants (SNVs), small insertions/deletions (indels), and copy number variations (CNVs). Each individual disruptive allele has a weak clinical effect, but particular combinations of them produce the hereditary liability to schizophrenia. Almost 30 SNP risk alleles have been identified so far, but there may be several thousand. It has been shown that 2,546 genes carrying SNVs and indels have a higher probability of being associated with schizophrenia. More than 20 schizophrenia risk loci involving CNVs have been identified, distributed across the genome. The genetic mechanism of schizophrenia is extremely complex and far from understood, and no satisfactory genetic model of the disease currently exists. A classification of schizophrenia risk alleles by frequency is proposed: common, rare, and de novo.
