An Ensemble Approach to Predict the Pathogenicity of Synonymous Variants

Genes ◽  
2020 ◽  
Vol 11 (9) ◽  
pp. 1102 ◽  
Author(s):  
Satishkumar Ranganathan Ganakammal ◽  
Emil Alexov

Single-nucleotide variants (SNVs) are a major form of genetic variation in the human genome and contribute to various disorders. There are two types of SNVs: non-synonymous (missense) variants (nsSNVs) and synonymous variants (sSNVs), the latter predominantly involved in RNA processing or gene regulation. Unlike nsSNVs, sSNVs do not alter the amino acid sequence, which makes them challenging candidates for downstream functional studies. Numerous computational methods have been developed to evaluate the clinical impact of nsSNVs, but very few methods are available for understanding the effects of sSNVs. For this analysis, we downloaded sSNVs from the ClinVar database together with various features such as conservation, DNA-RNA, and splicing properties. We performed feature selection and implemented an ensemble random forest (RF) classification algorithm to build a classifier that predicts the pathogenicity of sSNVs. We demonstrate that the ensemble predictor with 20 selected features enhances the classification of sSNVs into two categories, pathogenic and benign, with high accuracy (87%), precision (79%), and recall (91%). Furthermore, we used this prediction model to reclassify sSNVs with unknown clinical significance. Finally, the method is very robust and can be used to predict the effect of other unknown sSNVs.
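
The accuracy, precision, and recall figures quoted above are standard confusion-matrix ratios. The following is a minimal sketch of how such metrics are computed for a two-class pathogenic/benign problem; it is not the authors' code, and the names `ConfusionCounts`, `confusion_from_labels`, and the label strings are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary classifier (hypothetical helper type)."""
    tp: int  # pathogenic variants predicted pathogenic
    fp: int  # benign variants predicted pathogenic
    tn: int  # benign variants predicted benign
    fn: int  # pathogenic variants predicted benign


def confusion_from_labels(y_true: Sequence[str], y_pred: Sequence[str],
                          positive: str = "pathogenic") -> ConfusionCounts:
    """Tally the four confusion-matrix cells, treating `positive` as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all variants classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def precision(c: ConfusionCounts) -> float:
    """Fraction of predicted-pathogenic variants that are truly pathogenic."""
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c: ConfusionCounts) -> float:
    """Fraction of truly pathogenic variants that were predicted pathogenic."""
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
```

A combination of high recall and lower precision, as reported in the abstract, means the classifier misses few pathogenic variants at the cost of flagging some benign ones.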

2021 ◽  
Vol 5 (1) ◽  
Author(s):  
Ianthe A. E. M. van Belzen ◽  
Alexander Schönhuth ◽  
Patrick Kemmeren ◽  
Jayne Y. Hehir-Kwa

Cancer is generally characterized by acquired genomic aberrations in a broad spectrum of types and sizes, ranging from single nucleotide variants to structural variants (SVs). At least 30% of cancers have a known pathogenic SV used in diagnosis or treatment stratification. However, research into the role of SVs in cancer has been limited due to difficulties in detection. Biological and computational challenges confound SV detection in cancer samples, including intratumor heterogeneity, polyploidy, and distinguishing tumor-specific SVs from germline and somatic variants present in healthy cells. Classification of tumor-specific SVs is challenging due to inconsistencies in detected breakpoints, derived variant types, and the biological complexity of some rearrangements. Full-spectrum SV detection with high recall and precision requires integration of multiple algorithms and sequencing technologies to rescue variants that are difficult to resolve through individual methods. Here, we explore current strategies for integrating SV callsets and for enabling the use of tumor-specific SVs in precision oncology.
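
One simple way to integrate SV callsets, shown here only as an illustrative sketch and not as a strategy taken from the review, is to cluster calls of the same type whose breakpoints agree within a tolerance and to keep clusters supported by several callers. The `SVCall` type, the 100 bp tolerance, and the two-caller threshold are all assumptions.

```python
from typing import Iterable, List, NamedTuple


class SVCall(NamedTuple):
    """A structural variant call (hypothetical minimal record)."""
    caller: str   # name of the algorithm that produced the call
    chrom: str
    start: int    # left breakpoint
    end: int      # right breakpoint
    svtype: str   # e.g. "DEL", "DUP", "INV"


def same_event(a: SVCall, b: SVCall, tolerance: int) -> bool:
    """Two calls match if type and chromosome agree and both breakpoints are close."""
    return (a.chrom == b.chrom
            and a.svtype == b.svtype
            and abs(a.start - b.start) <= tolerance
            and abs(a.end - b.end) <= tolerance)


def merge_callsets(calls: Iterable[SVCall], tolerance: int = 100,
                   min_callers: int = 2) -> List[SVCall]:
    """Greedily cluster calls and keep clusters seen by at least `min_callers` callers."""
    clusters: List[List[SVCall]] = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.svtype, c.start, c.end)):
        for cluster in clusters:
            # Compare against the first member, which acts as the cluster representative.
            if same_event(cluster[0], call, tolerance):
                cluster.append(call)
                break
        else:
            clusters.append([call])

    merged: List[SVCall] = []
    for cluster in clusters:
        callers = sorted({c.caller for c in cluster})
        if len(callers) < min_callers:
            continue
        starts = sorted(c.start for c in cluster)
        ends = sorted(c.end for c in cluster)
        # Report the (upper) median breakpoints as the consensus coordinates.
        merged.append(SVCall("+".join(callers), cluster[0].chrom,
                             starts[len(starts) // 2], ends[len(ends) // 2],
                             cluster[0].svtype))
    return merged
```

Requiring agreement between callers favors precision, while lowering `min_callers` to 1 keeps every call and favors recall; the inconsistencies in breakpoints and variant types that the abstract mentions are exactly what makes the matching rule hard to choose in practice.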


2019 ◽  
Vol 105 (4) ◽  
pp. 338-352 ◽  
Author(s):  
Maria Teresa Ricci ◽  
Sara Volorio ◽  
Stefano Signoroni ◽  
Paolo Mariani ◽  
Frederique Mariette ◽  
...  

Introduction: Recent advances in technology and research are rapidly changing the diagnostic approach to hereditary gastrointestinal cancer (HGIC) syndromes. Although the practice of clinical genetics is currently transitioning from targeted criteria-based testing to multigene panels, important challenges remain to be addressed. The aim of this study was to develop and technically validate the performance of a multigene panel for HGIC. Methods: CGT-colon-G14 is an amplicon-based panel designed to detect single nucleotide variants and small insertions/deletions in 14 well-established or presumed high-penetrance genes involved in HGIC. The assay parameters tested were sensitivity, specificity, accuracy, and inter-run and intra-run reproducibility. Performance and clinical impact were determined using 48 samples of patients with suspected HGIC/polyposis previously tested with the targeted approach. Results: The CGT-colon-G14 panel showed 99.99% accuracy and 100% inter- and intra-run reproducibility. Moreover, panel testing detected 1 actionable pathogenic variant and 16 variants with uncertain clinical impact that were missed by the conventional approach because they were located in genes not previously analyzed. Conclusion: Introduction of the CGT-colon-G14 panel into the clinic could provide a higher diagnostic yield than a step-wise approach; however, results may not always be straightforward without the implementation of new genetic counseling models.


Heredity ◽  
2020 ◽  
Vol 124 (5) ◽  
pp. 658-674 ◽  
Author(s):  
Mahmoud Amiri Roudbar ◽  
Mohammad Reza Mohammadabadi ◽  
Ahmad Ayatollahi Mehrgardi ◽  
Rostam Abdollahi-Arpanahi ◽  
Mehdi Momen ◽  
...  

2021 ◽  
Author(s):  
Kishwar Shafin ◽  
Trevor Pesout ◽  
Pi-Chuan Chang ◽  
Maria Nattestad ◽  
Alexey Kolesnikov ◽  
...  

Long-read sequencing has the potential to transform variant detection by reaching currently difficult-to-map regions and routinely linking together adjacent variations to enable read-based phasing. Third-generation nanopore sequence data has demonstrated a long read length, but current interpretation methods for its novel pore-based signal have unique error profiles, making accurate analysis challenging. Here, we introduce a haplotype-aware variant calling pipeline, PEPPER-Margin-DeepVariant, that produces state-of-the-art variant calling results with nanopore data. We show that our nanopore-based method outperforms the short-read-based single nucleotide variant identification method at the whole-genome scale and produces high-quality single nucleotide variants in segmental duplications and low-mappability regions where short-read-based genotyping fails. We show that our pipeline can provide highly contiguous phase blocks across the genome with nanopore reads, contiguously spanning between 85% and 92% of annotated genes across six samples. We also extend PEPPER-Margin-DeepVariant to PacBio HiFi data, providing an efficient solution with performance superior to the current WhatsHap-DeepVariant standard. Finally, we demonstrate de novo assembly polishing methods that use nanopore and PacBio HiFi reads to produce diploid assemblies with high accuracy (Q35+ nanopore-polished and Q40+ PacBio-HiFi-polished).
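
Assuming the Q35 and Q40 values above are Phred-scaled, as is conventional for assembly quality, they translate into per-base error rates through Q = -10 log10(p). The helper names below are hypothetical; the sketch only illustrates the conversion.

```python
import math


def phred_to_error_rate(q: float) -> float:
    """Convert a Phred-scaled quality value to an error probability: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_rate_to_phred(p: float) -> float:
    """Convert an error probability in (0, 1] to a Phred-scaled quality: Q = -10 * log10(p)."""
    if not 0 < p <= 1:
        raise ValueError("error rate must be in (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (35, 40):
        p = phred_to_error_rate(q)
        print(f"Q{q}: about one error per {round(1 / p):,} bases")
```

Under this convention, Q40 corresponds to roughly one error per 10,000 bases and Q35 to roughly one error per 3,000 bases.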


2021 ◽  
Author(s):  
Morteza M Saber ◽  
Jannik Donner ◽  
Inès Levade ◽  
Nicole Acosta ◽  
Michael D Perkins ◽  
...  

Complex polymicrobial communities inhabit the lungs of individuals with cystic fibrosis (CF) and contribute to the decline in lung function. However, the severity of lung disease and its progression in CF patients are highly variable and imperfectly predicted by host clinical factors at baseline, CFTR mutations in the host genome, or sputum polymicrobial community variation. The opportunistic pathogen Pseudomonas aeruginosa (Pa) dominates airway infections in the majority of CF adults. Here we hypothesized that genetic variation within Pa populations would be predictive of lung disease severity. To quantify Pa genetic variation within whole CF sputum samples, we used deep amplicon sequencing on a newly developed custom Ion AmpliSeq panel of 209 Pa genes previously associated with the host pathoadaptation and pathogenesis of CF infection. We trained machine learning models using Pa single nucleotide variants (SNVs), clinical and microbiome diversity data to classify lung disease severity at the time of sputum sampling, and to predict future lung function decline over five years in a cohort of 54 adult CF patients with chronic Pa infection. The models using Pa SNVs alone classified baseline lung disease with good sensitivity and specificity, with an area under the receiver operating characteristic curve (AUROC) of 0.87. While the models were less predictive of future lung function decline, they still achieved an AUROC of 0.74. The addition of clinical data to the models, but not microbiome community data, yielded modest improvements (baseline lung function: AUROC=0.92; lung function decline: AUROC=0.79), highlighting the predictive value of the AmpliSeq data. Together, our work provides a proof-of-principle that Pa genetic variation in sputum is strongly associated with baseline lung disease, moderately predicts future lung function decline, and provides insight into the pathobiology of Pa's effect on CF.
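
The AUROC values reported above can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. The sketch below computes AUROC directly from that pairwise definition; it is an illustration under that definition, not the authors' pipeline, and the function name `auroc` and the 0/1 label encoding are assumptions.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positive and negative scores.

    `labels` holds 1 for the positive class (e.g. severe disease) and 0 otherwise.
    Ties between a positive and a negative score count as half a win.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of samples, which is fine for a cohort of this size (54 patients); a rank-based formulation would be preferable for large datasets.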


Heart Rhythm ◽  
2012 ◽  
Vol 9 (11) ◽  
pp. 1912 ◽  
Author(s):  
J.D. Kapplinger ◽  
J.R. Giudicessi ◽  
D.J. Tester ◽  
T.E. Callis ◽  
M.J. Ackerman

2020 ◽  
Vol 66 (12) ◽  
pp. 1521-1530
Author(s):  
Kim de Lange ◽  
Eddy N de Boer ◽  
Anneke Bosga ◽  
Mohamed Z Alimohamed ◽  
Lennart F Johansson ◽  
...  

Background: Patients with hematological malignancies (HMs) carry a wide range of chromosomal and molecular abnormalities that impact their prognosis and treatment. Since no current technique can detect all relevant abnormalities, technique(s) are chosen depending on the reason for referral, and abnormalities can be missed. We tested targeted transcriptome sequencing as a single platform to detect all relevant abnormalities and compared it to current techniques. Material and Methods: We performed RNA-sequencing of 1385 genes (TruSight RNA Pan-Cancer, Illumina) in bone marrow from 136 patients with a primary diagnosis of HM. We then applied machine learning to expression profile data to perform leukemia classification, a method we named RANKING. Gene fusions for all the genes in the panel were detected, and overexpression of the genes EVI1, CCND1, and BCL2 was quantified. Single nucleotide variants/indels were analyzed in patients with acute myeloid leukemia (AML), myelodysplastic syndrome, and acute lymphoblastic leukemia (ALL) using a virtual myeloid (54 genes) or lymphoid panel (72 genes). Results: RANKING correctly predicted the leukemia classification of all AML and ALL samples and improved classification in 3 patients. Compared to current methods, only one variant was missed, c.2447A>T in KIT (RT-PCR at 10⁻⁴), and BCL2 overexpression was not seen due to a t(14;18)(q32;q21) in 2% of the cells. Our RNA-sequencing method also identified 6 additional fusion genes and overexpression of CCND1 due to a t(11;14)(q13;q32) in 2 samples. Conclusions: Our combination of targeted RNA-sequencing and data analysis workflow can improve the detection of relevant variants, and expression patterns can assist in establishing HM classification.


2012 ◽  
Vol 5 (5) ◽  
pp. 519-528 ◽  
Author(s):  
John R. Giudicessi ◽  
Jamie D. Kapplinger ◽  
David J. Tester ◽  
Marielle Alders ◽  
Benjamin A. Salisbury ◽  
...  

2019 ◽  
pp. 241-252
Author(s):  
V. Pomohaibo ◽  
O. Berezan ◽  
A. Petrushov

The risk of schizophrenia arises from mutations in brain-expressed genes. Four groups of mutations are distinguished: single-nucleotide polymorphisms (SNPs), single-nucleotide variants (SNVs), small insertions/deletions (InDels), and copy number variations (CNVs). Each individual disruptive allele has a weak clinical effect, but certain combinations of them produce hereditary liability to schizophrenia. Almost 30 alleles with SNPs have been identified so far, but there may be several thousand. It has been shown that 2546 genes with SNVs and InDels have a higher probability of being associated with schizophrenia. More than 20 schizophrenia risk loci with CNVs, distributed across the genome, have been identified. The genetic mechanism of schizophrenia is extremely complex and far from understood, and no satisfactory genetic model of the disease currently exists. A classification of schizophrenia risk alleles according to their frequency is proposed: common, rare, and de novo.

