SF3B1ness score: screening SF3B1 mutation status from over 60,000 transcriptomes based on a machine learning approach

2019 ◽  
Author(s):  
Yuichi Shiraishi ◽  
Kenichi Chiba ◽  
Ai Okada

Abstract In precision oncology, genomic evidence is used to determine the optimal treatment for each patient. However, identifying somatic mutations from genome sequencing data is often technically difficult, and the functional significance of somatic mutations is inconclusive in many cases. In this paper, to seek an alternative approach, we tackle the problem of predicting functional mutations from transcriptome sequencing data. Focusing on SF3B1, a key splicing factor gene, we develop the SF3B1ness score for classifying functional mutation status using a combination of a Naive Bayes classifier and zero-inflated beta-binomial modeling (an R package is available at https://github.com/friend1WS/SF3B1ness). Using 8,992 TCGA exome and RNA sequencing data for evaluation, we show that the classifier based on the SF3B1ness score is able to (1) attain very high precision (>93%) and sensitivity (>95%), (2) rescue several somatic mutations not identified by exome sequence analysis, especially due to low variant allele frequencies, and (3) successfully measure functional importance for somatic mutations whose significance has been unknown. Furthermore, to demonstrate that the SF3B1ness score is highly robust and can be extended to cohorts outside the training data, we performed a functional SF3B1 mutation screening on 51,577 additional transcriptome sequencing data. We detected 135 samples with putative SF3B1 functional mutations, including those that are rarely registered in the somatic mutation database (e.g., G664C, L747W, and R775G). Moreover, we identified two cases with SF3B1 mutations from normal tissues, implying that the SF3B1ness score can be used for detecting clonal hematopoiesis.
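The combination of a Naive Bayes classifier with zero-inflated beta-binomial modeling named in the abstract above can be illustrated with a minimal sketch. This is not the SF3B1ness package's implementation: the function names, the per-junction parameterization `(alpha, beta, pi)`, and all numeric values are assumptions made for illustration. The sketch scores a sample by summing, over splice junctions treated as independent, the log-likelihood ratio of the abnormal-read count under a "mutant" versus a "wild-type" zero-inflated beta-binomial model.

```python
import math

def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_choose(n, k):
    """Log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def zibb_logpmf(k, n, alpha, beta, pi):
    """Log-probability of k abnormal reads out of n under a zero-inflated
    beta-binomial: with probability pi the count is a structural zero,
    otherwise it follows BetaBinomial(n, alpha, beta). Requires 0 < pi < 1."""
    log_bb = (log_choose(n, k)
              + log_beta(k + alpha, n - k + beta)
              - log_beta(alpha, beta))
    if k == 0:
        return math.log(pi + (1.0 - pi) * math.exp(log_bb))
    return math.log(1.0 - pi) + log_bb

def naive_bayes_score(counts, params_mut, params_wt):
    """Sum of per-junction log-likelihood ratios (mutant vs wild type).
    Junctions are treated as independent, which is the Naive Bayes assumption.
    counts: list of (k, n); params_*: list of (alpha, beta, pi) per junction."""
    score = 0.0
    for (k, n), pm, pw in zip(counts, params_mut, params_wt):
        score += zibb_logpmf(k, n, *pm) - zibb_logpmf(k, n, *pw)
    return score

# Hypothetical parameters for two junctions (illustrative values only).
PARAMS_MUT = [(5.0, 5.0, 0.05), (4.0, 6.0, 0.05)]
PARAMS_WT = [(0.5, 20.0, 0.6), (0.5, 20.0, 0.6)]

if __name__ == "__main__":
    print(naive_bayes_score([(8, 20), (6, 15)], PARAMS_MUT, PARAMS_WT))
    print(naive_bayes_score([(0, 20), (0, 15)], PARAMS_MUT, PARAMS_WT))
```

A positive score favours the mutant model and a negative score favours the wild-type model; in practice the per-junction parameters would be estimated from training data rather than set by hand.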

2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Haifeng Wang ◽  
Chengche Wang ◽  
Hongchun Qu

The detection and characterization of somatic mutations have become important means of analyzing the occurrence and development of cancer and, ultimately, will help to select effective and precise treatment for specific cancer patients. It is very difficult to detect somatic mutations accurately from massive sequencing data. In this paper, a forest-graph-embedded deep feed-forward network (forgeNet) is utilized to detect somatic mutations from sequencing data. In forgeNet, a random forest (RF) or Gradient Boosting Machine (GBM) is used to extract features, and a graph-embedded deep feed-forward network (GEDFN) is used to implement classification. Three real somatic mutation datasets collected from 48 triple-negative breast cancers are utilized to test the somatic mutation detection performance of forgeNet. The detection results show that forgeNet improves the area under the curve (AUC) by 0.05%–0.424% compared with support vector machines and random forest.


2020 ◽  
Vol 96 (8) ◽  
Author(s):  
D R Finn ◽  
J Yu ◽  
Z E Ilhan ◽  
V M C Fernandes ◽  
C R Penton ◽  
...  

ABSTRACT Niche is a fundamental concept in ecology. It integrates the sum of biotic and abiotic environmental requirements that determines a taxon's distribution. Microbiologists currently lack quantitative approaches to address niche-related hypotheses. We tested four approaches for the quantification of niche breadth and overlap of taxa in amplicon sequencing datasets, with the goal of determining generalists, specialists and environmental-dependent distributions of community members. We applied these indices to in silico training datasets first, and then to real human gut and desert biological soil crust (biocrust) case studies, assessing the agreement of the indices with previous findings. Implementation of each approach successfully identified a priori conditions within in silico training data, and we found that by including a limit of quantification based on species rank, one could identify taxa falsely classified as specialists because of their low, sparse counts. Analysis of the human gut study offered quantitative support for Bacilli, Gammaproteobacteria and Fusobacteria specialists enriched after bariatric surgery. We could quantitatively characterise differential niche distributions of cyanobacterial taxa with respect to precipitation gradients in biocrusts. We conclude that these approaches, made publicly available as an R package (MicroNiche), represent useful tools to assess microbial environment-taxon and taxon-taxon relationships in a quantitative manner.
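The abstract above does not name the four indices it tests, so the following sketch uses Levins' standardized niche breadth purely as an assumed, classical example of such an index; it is not taken from the MicroNiche package. The `min_total` threshold is a hypothetical stand-in for a limit of quantification (the abstract describes one based on species rank, which is not reproduced here).

```python
def levins_breadth(counts, min_total=0):
    """Levins' standardized niche breadth B_A for one taxon.

    counts: abundance of the taxon in each environment (non-negative numbers).
    min_total: hypothetical limit of quantification; taxa whose total count
               falls below it return None instead of an index.
    Returns a value in [0, 1]: 0 = found in one environment only (specialist),
    1 = evenly spread over all environments (generalist)."""
    total = sum(counts)
    n = len(counts)
    if n < 2 or total <= 0 or total < min_total:
        return None
    props = [c / total for c in counts]
    b = 1.0 / sum(p * p for p in props)   # Levins' B = 1 / sum(p_i^2)
    return (b - 1.0) / (n - 1)            # standardized to [0, 1]

if __name__ == "__main__":
    print(levins_breadth([10, 10, 10, 10]))          # even distribution
    print(levins_breadth([40, 0, 0, 0]))             # single environment
    print(levins_breadth([1, 0, 0, 0], min_total=5)) # below the threshold
```

The threshold case shows why a limit of quantification matters: a taxon seen once in one sample would otherwise score as a perfect specialist simply because its counts are low and sparse.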


2018 ◽  
Author(s):  
Keren Yizhak ◽  
Francois Aguet ◽  
Jaegil Kim ◽  
Julian Hess ◽  
Kirsten Kubler ◽  
...  

Abstract Cancer genome studies have significantly advanced our knowledge of somatic mutations. However, how these mutations accumulate in normal cells and whether they promote pre-cancerous lesions remains poorly understood. Here we perform a comprehensive analysis of normal tissues by utilizing RNA sequencing data from ∼6,700 samples across 29 normal tissues collected as part of the Genotype-Tissue Expression (GTEx) project. We identify somatic mutations using a newly developed pipeline, RNA-MuTect, for calling somatic mutations directly from RNA-seq samples and their matched-normal DNA. When applied to the GTEx dataset, we detect multiple variants across different tissues and find that mutation burden is associated with both the age of the individual and tissue proliferation rate. We also detect hotspot cancer mutations that share tissue specificity with their matched cancer type. This study is the first to analyze a large number of samples across multiple normal tissues, identifying clones with genomic aberrations observed in cancer.


Author(s):  
Firda Aminy Maruf ◽  
Rian Pratama ◽  
Giltae Song

Detection of somatic mutation in whole-exome sequencing data can help elucidate the mechanism of tumor progression. Most computational approaches require exome sequencing for both tumor and normal samples. However, it is more common to sequence exomes for tumor samples only without the paired normal samples. To include these types of data for extensive studies on the process of tumorigenesis, it is necessary to develop an approach for identifying somatic mutations using tumor exome sequencing data only. In this study, we designed a machine learning approach using Deep Neural Network (DNN) and XGBoost to identify somatic mutations in tumor-only exome sequencing data and we integrated this into a pipeline called DNN-Boost. The XGBoost algorithm is used to extract the features from the results of variant callers and these features are then fed into the DNN model as input. The XGBoost algorithm resolves issues of missing values and overfitting. We evaluated our proposed model and compared its performance with other existing benchmark methods. We noted that the DNN-Boost classification model outperformed the benchmark method in classifying somatic mutations from paired tumor-normal exome data and tumor-only exome data.


2020 ◽  
pp. jmedgenet-2020-106905
Author(s):  
Ji-Hye Oh ◽  
Chang Ohk Sung

Background Somatic mutations are a major driver of cancer development and many have now been identified in various cancer types, but the comprehensive somatic mutation status of the normal tissues matched to tumours has not been revealed. Method We analysed the somatic mutations of whole exome sequencing data in 392 patient tumour and normal tissue pairs based on the corresponding blood samples across 10 tumour types. Results Many of the mutations involved in oncogenic pathways, such as PI3K, NOTCH and TP53, were identified in the normal tissues. The ageing-related mutational signature was the most prominent contributing signature found, and the mutations in the normal tissues were frequently in genes involved in late replication time (p<0.0001). Variants rarely overlapped across tissue types, but shared variants between normal and matched tumour tissue were present. These shared variants were frequently pathogenic when compared with non-shared variants (p=0.001) and showed a higher variant-allele-fraction (p<0.0001). Normal tissue-specific mutated genes were frequently non-cancer-associated (p=0.009). PIK3CA mutations were identified in 6 normal tissues and were harboured by all of the matched cancer tissues. Multiple types of PIK3CA mutations were found in normal breast and matched cancer tissues. The PIK3CA mutations exclusively present in normal tissue may indicate clonal expansions unrelated to the tumour. In addition, the PIK3CA mutations appeared to have arisen before the occurrence of the allelic imbalance. Conclusion Our current results suggest that somatic mutant clones exist in normal tissues and that their clonal expansion could be linked to cancer development.


Blood ◽  
2014 ◽  
Vol 124 (21) ◽  
pp. 840-840
Author(s):  
Siddhartha Jaiswal ◽  
Pierre Fontanillas ◽  
Jason Flannick ◽  
Alisa Manning ◽  
Peter Grauman ◽  
...  

Abstract Hematological malignancies are associated with recurrent somatic mutations in specific genes, and may be preceded by a pre-malignant state in which only the initial driver mutation(s) are present. For example, monoclonal gammopathy of unknown significance often precedes multiple myeloma and monoclonal B-lymphocytosis can precede chronic lymphocytic leukemia. Recent sequencing studies have identified genes that are recurrently mutated in acute myeloid leukemia, myelodysplastic syndromes, myeloproliferative neoplasms, acute lymphoblastic leukemia, and other hematological neoplasms. We hypothesized that a pre-malignant state comprised of a clonal expansion of cells harboring some of these recurrent mutations would be detectable in the blood of elderly individuals not known to have hematological disorders. To address this question, we analyzed whole exome sequencing data from peripheral blood cell DNA of 17,182 individuals. Most of these were sequenced for type 2 diabetes genetic association studies and were therefore unselected for hematological phenotypes. We looked for candidate somatic variants by identifying previously characterized single nucleotide variants (SNVs) and small insertions/deletions (indels) in 160 genes recurrently mutated in hematological malignancies. The presence of these variants was analyzed for association to hematological phenotypes, survival, and cardiovascular events. We identified a total of 805 candidate somatic variants (hereafter referred to as mutations) from 746 individuals in 73 genes. Somatic mutations were rarely detected in individuals younger than 40, but rose appreciably with age (Figure 1). At ages 70-79, 80-89, and 90-108 these clonal mutations were observed in 9.6% (220 out of 2299), 11.7% (37 out of 317), and 18.4% (19 out of 103) of individuals, respectively. The majority of the variants occurred in 3 genes: DNMT3A (403 variants), TET2 (72 variants), and ASXL1 (62 variants). 
The median variant allele fraction for the detected somatic mutations was 0.09, from which we infer that the pathologic clone represents on average 18% of circulating white blood cells. Clinical outcome data was available on a subset of subjects, with a median follow-up period of 8 years. Carrying a somatic mutation was associated with increased risk of developing a hematological malignancy (hazard ratio [HR] 11, 95% confidence interval [95% CI] 3.9-33 by competing risks regression). Harboring a mutation was also associated with an increase in all-cause mortality that could not be explained by death due to hematological malignancies alone (HR 1.4, 95% CI 1.1-1.8 by Cox proportional hazards model). We further found that these mutations were associated with type 2 diabetes (odds ratio 1.3, 95% CI 1.1-1.5) and increased risk of incident coronary heart disease (HR 2.0, 95% CI 1.2-3.4) and ischemic stroke (HR 2.6, 95% CI 1.4-4.8) in multivariable regression models. We conclude that clonal hematopoiesis associated with a somatic mutation in a known cancer-causing gene is a common pre-malignant condition in the elderly. This entity is associated with increased risk of transformation to hematological malignancy, as well as increased all-cause mortality, possibly due to increased cardio-metabolic disease. While the link between somatic mutations and cancer is well established, the relationship between clonal hematopoiesis and cardio-metabolic disease requires further study.
Figure 1. Prevalence of somatic mutation by age. Colored bands represent 50, 75, and 95 percent confidence intervals.
Disclosures Getz: The Broad Institute, Inc.: PCT/US2013/057128 (Detecting Variants in Sequencing Data and Benchmarking Methods and Apparatus for Analyzing and Quantifying DNA Alterations in Cancer) Patent pending Patents & Royalties; Appistry: Certain NGS analysis tools of Broad Institute are made available for commercial use via Appistry Other. Ebert: Genoptix: Consultancy, Patents & Royalties; Celgene: Consultancy, Research Funding.
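The clone-size inference in the abstract above (median variant allele fraction 0.09, hence roughly 18% of circulating white blood cells) follows if each mutant cell is assumed to carry the variant on one of two copies of the locus. That heterozygous, copy-neutral assumption is an interpretation, not something the abstract states explicitly. The sketch below works through that arithmetic and recomputes the age-band prevalences from the counts given; the function and parameter names are illustrative.

```python
def clone_fraction(vaf, mutant_copies=1, total_copies=2):
    """Fraction of cells carrying a mutation, given its variant allele fraction.

    Assumes every cell has `total_copies` copies of the locus and each mutant
    cell carries the variant on `mutant_copies` of them (default: heterozygous
    mutation in a diploid region). The result is capped at 1.0."""
    return min(vaf * total_copies / mutant_copies, 1.0)

def prevalence_percent(carriers, individuals):
    """Percentage of individuals carrying a mutation, rounded to 0.1%."""
    return round(100.0 * carriers / individuals, 1)

if __name__ == "__main__":
    # Median VAF reported in the abstract.
    print(clone_fraction(0.09))
    # Carrier counts per age band as reported in the abstract.
    for band, carriers, total in [("70-79", 220, 2299),
                                  ("80-89", 37, 317),
                                  ("90-108", 19, 103)]:
        print(band, prevalence_percent(carriers, total))
```

If a mutation were homozygous or sat in a region with a copy-number change, the same VAF would correspond to a different clone size, which is why the copy numbers are exposed as parameters.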


2020 ◽  
Author(s):  
Sara Akhavanfard ◽  
Lamis Yehia ◽  
Roshan Padmanabhan ◽  
Jordan P Reynolds ◽  
Ying Ni ◽  
...  

Abstract Adrenocortical Carcinoma (ACC) is a rare endocrine tumor with poor overall prognosis and 1.5-fold overrepresentation in females. In children, ACC is associated with inherited cancer syndromes with 50–80% of childhood-ACC associated with TP53 germline variants. ACC in adolescents and young adults (AYA) is rarely due to germline TP53, IGF2, PRKAR1A and MEN1 variants. We analyzed exome sequencing data from 21 children (<15y), 32 AYA (15-39y), and 60 adults (>39y) with ACC, and retained all pathogenic, likely pathogenic, and highly prioritized variants of uncertain significance. We engineered a stable lentiviral-mutant ACC cell line, harboring an EGFR variant (p.Asp1080Asn) from a 21-year-old female without germline-TP53-variant and with aggressive ACC. We found that 4.8% of the children (P = 0.004) and 6.2% of AYA (P < 0.0001), all-female participants, harbored germline EGFR variants, compared to only 0.3% of the control group. Expanding our analysis to the RTK-RAS-MAPK pathway, we found that the RTK genes have the highest number of highly prioritized germline variants in these individuals amongst all three arms of this pathway. We showed EGFR mutant cells migrate faster and are characterized by a stem-like phenotype compared to wild type cells. While EGFR inhibitors did not affect the stemness of mutant cells, Sunitinib, a multireceptor tyrosine kinase inhibitor, significantly reduced their stem-like behavior. Our data suggest that EGFR could be a novel underlying germline predisposition factor for ACC, especially in the Childhood-AYA (C-AYA) population. Further clinical validation can improve precision oncology management of this disease, which is known to have limited therapeutic options.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Yuting Wang ◽  
Qin Zou ◽  
Fajin Li ◽  
Wenwei Zhao ◽  
Hui Xu ◽  
...  

Abstract A major part of the transcriptome complexity is attributed to multiple types of DNA or RNA fusion events, which take place within a gene, such as alternative splicing, or between different genes, such as DNA rearrangement and trans-splicing. In the present study, using RNA deep sequencing data, we systematically survey a type of non-canonical fusion between the RNA transcripts from the two opposite DNA strands. We name the products of such fusion events cross-strand chimeric RNA (cscRNA). Hundreds to thousands of cscRNAs can be found in human normal tissues, primary cells, and cancerous cells, and in other species as well. Although cscRNAs exhibit strong tissue-specificity, our analysis identifies thousands of recurrent cscRNAs found in multiple different samples. cscRNAs mostly originate from convergent transcription of the annotated genes and their anti-sense DNA. The machinery of cscRNA biogenesis is unclear, but the cross-strand junction events show some features related to RNA splicing. The present study is a comprehensive survey of the non-canonical cross-strand RNA junction events and a resource for further characterization of the origins and functions of the cscRNAs.


2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Yixin Kong ◽  
Ariangela Kozik ◽  
Cindy H. Nakatsu ◽  
Yava L. Jones-Hall ◽  
Hyonho Chun

Abstract A latent factor model for count data is popularly applied in deconvoluting mixed signals in biological data as exemplified by sequencing data for transcriptome or microbiome studies. Due to the availability of pure samples such as single-cell transcriptome data, the accuracy of the estimates could be much improved. However, the advantage quickly disappears in the presence of excessive zeros. To correctly account for this phenomenon in both mixed and pure samples, we propose a zero-inflated non-negative matrix factorization and derive an effective multiplicative parameter updating rule. In simulation studies, our method yielded the smallest bias. We applied our approach to brain gene expression as well as fecal microbiome datasets, illustrating the superior performance of the approach. Our method is implemented as a publicly available R-package, iNMF.
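The multiplicative parameter updating rule mentioned in the abstract above builds on the general idea of multiplicative updates for non-negative matrix factorization. The sketch below shows only the standard Lee-Seung updates for the squared-error objective, without any zero-inflation component; it is not the iNMF package's rule, and the function names and toy matrix are assumptions for illustration. It is written in plain Python so that it runs without third-party libraries.

```python
import random

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))

def nmf(v, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (n x m) as W (n x rank) times H (rank x m)
    using standard multiplicative updates:
        H <- H * (W^T V) / (W^T W H),   W <- W * (V H^T) / (W H H^T)
    Entries stay non-negative because each update multiplies by a
    non-negative ratio. eps guards against division by zero."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h

# Toy count matrix with several zeros (illustrative data only).
V = [[5, 3, 0, 1], [4, 0, 0, 1], [1, 1, 0, 5], [0, 0, 4, 4]]

if __name__ == "__main__":
    w, h = nmf(V, rank=2)
    print(frobenius_error(V, w, h))
```

This plain version treats every zero as an ordinary observation; the abstract's point is that excessive zeros degrade such estimates, which is what motivates a zero-inflated formulation.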


2021 ◽  
Vol 49 (6) ◽  
pp. 030006052110210
Author(s):  
Hui Sun ◽  
Li Ma ◽  
Jie Chen

Objective Uterine carcinosarcoma (UCS) is a rare, aggressive tumour with a high metastasis rate and poor prognosis. This study aimed to explore potential key genes associated with the prognosis of UCS. Methods Transcriptional expression data were downloaded from the Gene Expression Profiling Interactive Analysis database and differentially expressed genes (DEGs) were subjected to Gene Ontology and Kyoto Encyclopedia of Genes and Genomes analyses using Metascape. A protein–protein interaction network was constructed using the STRING website and Cytoscape software, and the top 30 genes obtained through the Maximal Clique Centrality algorithm were selected as hub genes. These hub genes were validated by clinicopathological and sequencing data for 56 patients with UCS from The Cancer Genome Atlas database. Results A total of 1894 DEGs were identified, and the top 30 genes were considered as hub genes. Hyaluronan-mediated motility receptor (HMMR) expression was significantly higher in UCS tissues compared with normal tissues, and elevated expression of HMMR was identified as an independent prognostic factor for shorter survival in patients with UCS. Conclusions These results suggest that HMMR may be a potential biomarker for predicting the prognosis of patients with UCS.

