CNAPE: A Machine Learning Method for Copy Number Alteration Prediction from Gene Expression

10.1101/704486 ◽

2019 ◽

Author(s):

Quanhua Mu ◽

Jiguang Wang

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Copy Number ◽

Copy Number Alteration ◽

The Cancer Genome Atlas ◽

Detection Methods ◽

DNA Arrays ◽

Machine Learning Model ◽

Cancer Genome Atlas ◽

Genomic Regions

Abstract Copy number alteration (CNA), an abnormal number of copies of a genomic region, plays a key role in cancer initiation and progression. Current high-throughput CNA detection methods, including DNA arrays and genomic sequencing, are relatively expensive and require microgram-level DNA samples, which are not obtainable in certain settings such as clinical biopsies or single-cell genomics. Here we propose an alternative method, CNAPE, to computationally infer CNA from gene expression data. A prior knowledge-aided machine learning model was trained and tested on the transcriptomic profiles with matched CNA data of 9,740 cancers from The Cancer Genome Atlas. Using brain tumors as a proof-of-concept study, CNAPE achieved over 90% accuracy in the prediction of arm-level CNAs. Prediction performance for 12 gene-level CNAs (commonly altered genes in glioma) was also evaluated, and CNAPE achieved reasonable accuracy. CNAPE is available as an easy-to-use tool at http://wang-lab.ust.hk/software/Software.html.
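To make the idea of inferring arm-level CNA from expression concrete, here is a minimal toy baseline. It is not CNAPE's model, whose details the abstract does not give. It assumes that genes on a gained arm tend to be over-expressed on average, standardizes each gene across samples, averages the z-scores per chromosome arm, and thresholds the result. The function names, gene names, arm assignments, and threshold are all hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize one gene's expression values across samples."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def arm_level_calls(expression, gene_to_arm, threshold=1.0):
    """Toy arm-level caller (illustrative only, not the CNAPE model).

    expression:  {gene: [expression value per sample]}
    gene_to_arm: {gene: chromosome arm label}
    Returns {arm: ["gain" | "loss" | "neutral" per sample]}.
    """
    rows_by_arm = {}
    for gene, values in expression.items():
        arm = gene_to_arm.get(gene)
        if arm is None:
            continue  # skip genes without an arm annotation
        rows_by_arm.setdefault(arm, []).append(zscores(values))

    calls = {}
    for arm, rows in rows_by_arm.items():
        n_samples = len(rows[0])
        # Average the per-gene z-scores of each sample over the arm.
        scores = [mean(row[i] for row in rows) for i in range(n_samples)]
        calls[arm] = [
            "gain" if s > threshold else "loss" if s < -threshold else "neutral"
            for s in scores
        ]
    return calls


# Hypothetical data: three samples, two genes on arm "7p", one on "10q".
toy_expression = {"G1": [1, 1, 4], "G2": [2, 2, 8], "G3": [5, 5, 1]}
toy_arms = {"G1": "7p", "G2": "7p", "G3": "10q"}
toy_calls = arm_level_calls(toy_expression, toy_arms, threshold=1.0)
```

A trained classifier with matched CNA labels, as the abstract describes, would replace the fixed threshold here. The sketch only shows the shape of the input and output.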


Risk stratification of cervical lesions using capture sequencing and machine learning method based on HPV and human integrated genomic profiles

Carcinogenesis ◽

10.1093/carcin/bgz094 ◽

2019 ◽

Vol 40 (10) ◽

pp. 1220-1228 ◽

Cited By ~ 3

Author(s):

Rui Tian ◽

Zifeng Cui ◽

Dan He ◽

Xun Tian ◽

Qinglei Gao ◽

...

Keyword(s):

Machine Learning ◽

Cervical Cancer ◽

Risk Stratification ◽

Copy Number ◽

HPV Infection ◽

Genomic Variation ◽

Machine Learning Method ◽

Learning Method ◽

Cervical Lesions ◽

Capture Sequencing

Abstract From initial human papillomavirus (HPV) infection and precursor stages, the development of cervical cancer takes decades. High-sensitivity HPV DNA testing is currently recommended as the primary screening method for cervical cancer, whereas better triage methodologies are encouraged to provide accurate risk management for HPV-positive women. Given that virus-driven genomic variation accumulates during cervical carcinogenesis, we designed a 39 Mb custom capture panel targeting 17 HPV types and 522 mutant genes related to cervical cancer. Using capture-based next-generation sequencing, HPV integration status, somatic mutation and copy number variation were analyzed on 34 paired samples, including 10 cases of HPV infection (HPV+), 10 cases of cervical intraepithelial neoplasia grade 1 (CIN1) and 14 cases of CIN2+ (CIN2: n = 1; CIN2-3: n = 3; CIN3: n = 9; squamous cell carcinoma: n = 1). Finally, a machine learning algorithm (Random Forest) was applied to build the risk stratification model for cervical precursor lesions based on CIN2+ enriched biomarkers. Generally, HPV integration events (11 in HPV+, 25 in CIN1 and 56 in CIN2+), non-synonymous mutations (2 in CIN1, 12 in CIN2+) and copy number variations (19.1 in HPV+, 29.4 in CIN1 and 127 in CIN2+) increased from HPV+ to CIN2+. Interestingly, the ‘common’ deletion of the mitochondrial chromosome was significantly enriched in CIN2+ (P = 0.009). Together, CIN2+ enriched biomarkers, classified as HPV information, mutation, amplification, deletion and mitochondrial change, successfully predicted CIN2+ with an average accuracy probability score of 0.814, and amplification and deletion ranked as the most important features. Our custom capture sequencing combined with a machine learning method effectively stratified the risk of cervical lesions and provided valuable integrated triage strategies.


New Ensemble Machine Learning Method for Classification and Prediction on Gene Expression Data

2006 International Conference of the IEEE Engineering in Medicine and Biology Society ◽

10.1109/iembs.2006.4398195 ◽

2006 ◽

Author(s):

Ching Wei Wang

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Gene Expression Data ◽

Machine Learning Method ◽

Learning Method ◽

Expression Data ◽

Ensemble Machine Learning


Predicting candidate genes from phenotypes, functions and anatomical site of expression

Bioinformatics ◽

10.1093/bioinformatics/btaa879 ◽

2020 ◽

Author(s):

Jun Chen ◽

Azza Althagafi ◽

Robert Hoehndorf

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Gene Prioritization ◽

Supplementary Information ◽

Model Organisms ◽

Anatomical Site ◽

Machine Learning Method ◽

Gene Products ◽

Learning Method ◽

Biomedical Ontologies

Abstract Motivation: Over the past years, many computational methods have been developed to incorporate information about phenotypes for the disease–gene prioritization task. These methods generally compute the similarity between a patient’s phenotypes and a database of gene–phenotype associations to find the most phenotypically similar match. The main limitation of these methods is their reliance on knowledge about phenotypes associated with particular genes, which is incomplete in humans as well as in many model organisms, such as the mouse and fish. Information about functions of gene products and anatomical site of gene expression is available for more genes and can also be related to phenotypes through ontologies and machine-learning models. Results: We developed a novel graph-based machine-learning method for biomedical ontologies, which is able to exploit axioms in ontologies and other graph-structured data. Using our machine-learning method, we embed genes based on their associated phenotypes, functions of the gene products and anatomical location of gene expression. We then develop a machine-learning model to predict gene–disease associations based on the associations between genes and multiple biomedical ontologies, and this model significantly improves over state-of-the-art methods. Furthermore, we extend phenotype-based gene prioritization methods significantly to all genes that are associated with phenotypes, functions or site of expression. Availability and implementation: Software and data are available at https://github.com/bio-ontology-research-group/DL2Vec. Supplementary information: Supplementary data are available at Bioinformatics online.


Explainable t-SNE for single-cell RNA-seq data analysis

10.1101/2022.01.12.476084 ◽

2022 ◽

Author(s):

Henry Han ◽

Tianyu Zhang ◽

Mary Lauren Benton ◽

Chun Li ◽

Juan Wang ◽

...

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Data Analysis ◽

Dimension Reduction ◽

Single Cell ◽

Method Development ◽

Robustness Analysis ◽

High Dimensional ◽

Machine Learning Method ◽

Learning Method

Single-cell RNA sequencing (scRNA-seq) technologies enable the study of gene expression in individual cells and reveal the diversity within cell populations. To measure cell-to-cell similarity based on transcription and gene expression, many dimension reduction methods are employed to retrieve low-dimensional embeddings of input scRNA-seq data for clustering. However, these methods lack explainability and may not perform well with scRNA-seq data because they are often migrated from other fields and not customized for high-dimensional sparse scRNA-seq data. In this study, we propose an explainable t-SNE: cell-driven t-SNE (c-TSNE), which fuses the cell differences reflected by biologically meaningful distance metrics for input scRNA-seq data. Our study shows that the proposed method not only enhances the interpretation of the original t-SNE visualization for scRNA-seq data but also demonstrates favorable single-cell segregation performance on benchmark datasets compared to state-of-the-art peers. The robustness analysis shows that the proposed cell-driven t-SNE is robust to dropout and noise in dimension reduction and clustering. It provides a novel and practical way to investigate the interpretability of t-SNE in scRNA-seq data analysis. Contrary to the general assumption that the explainability of a machine learning method must be traded off against learning efficiency, the proposed explainable t-SNE improves both clustering efficiency and explainability in scRNA-seq analysis. More importantly, our work suggests that the widely used t-SNE can be easily misused in existing scRNA-seq analysis, because its default Euclidean distance can introduce biases or meaningless results when evaluating cell differences in high-dimensional sparse scRNA-seq data. To the best of our knowledge, it is the first explainable t-SNE proposed for scRNA-seq analysis, and it will inspire the development of other explainable machine learning methods in the field.
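The concern about Euclidean distance on sparse count data can be illustrated with a small sketch. It is not the c-TSNE method: the correlation distance used here is a swapped-in stand-in for the "biologically meaningful distance metrics" the abstract mentions without specifying, and the three cell profiles are invented. The resulting pairwise matrix is the kind of input one could pass to a t-SNE implementation that accepts precomputed distances (for example, scikit-learn's `TSNE` with `metric="precomputed"`).

```python
from math import sqrt


def euclidean(a, b):
    """Plain Euclidean distance between two expression profiles."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def correlation_distance(a, b):
    """1 - Pearson correlation; near 0 for proportional profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 1.0  # correlation undefined for a constant profile
    return 1.0 - cov / sqrt(var_a * var_b)


def pairwise(cells, metric):
    """Full pairwise distance matrix under the given metric."""
    return [[metric(a, b) for b in cells] for a in cells]


# Hypothetical sparse count profiles over six genes.
cell_a = [0, 0, 2, 0, 4, 0]    # expresses genes 3 and 5
cell_b = [0, 0, 20, 0, 40, 0]  # same program as cell_a at 10x depth
cell_c = [3, 0, 0, 5, 0, 0]    # a different program

cells = [cell_a, cell_b, cell_c]
euclid = pairwise(cells, euclidean)
corr = pairwise(cells, correlation_distance)
```

In this toy case Euclidean distance places `cell_a` closer to the unrelated `cell_c` than to `cell_b`, purely because of sequencing depth, while the correlation distance treats `cell_a` and `cell_b` as nearly identical. Depth normalization would also reduce this particular effect; the sketch only shows why the choice of metric matters.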


Predicting candidate genes from phenotypes, functions, and anatomical site of expression

10.1101/2020.03.30.015594 ◽

2020 ◽

Cited By ~ 2

Author(s):

Jun Chen ◽

Azza Althagafi ◽

Robert Hoehndorf

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Research Group ◽

Gene Prioritization ◽

Model Organisms ◽

Anatomical Site ◽

Machine Learning Method ◽

Gene Products ◽

Learning Method ◽

Biomedical Ontologies

Abstract Motivation: Over the past years, many computational methods have been developed to incorporate information about phenotypes for disease gene prioritization task. These methods generally compute the similarity between a patient’s phenotypes and a database of gene-phenotype to find the most phenotypically similar match. The main limitation in these methods is their reliance on knowledge about phenotypes associated with particular genes, which is not complete in humans as well as in many model organisms such as the mouse and fish. Information about functions of gene products and anatomical site of gene expression is available for more genes and can also be related to phenotypes through ontologies and machine learning models. Results: We developed a novel graph-based machine learning method for biomedical ontologies which is able to exploit axioms in ontologies and other graph-structured data. Using our machine learning method, we embed genes based on their associated phenotypes, functions of the gene products, and anatomical location of gene expression. We then develop a machine learning model to predict gene–disease associations based on the associations between genes and multiple biomedical ontologies, and this model significantly improves over state of the art methods. Furthermore, we extend phenotype-based gene prioritization methods significantly to all genes which are associated with phenotypes, functions, or site of expression. Availability: Software and data are available at https://github.com/bio-ontology-research-group/[email protected]


New Ensemble Machine Learning Method for Classification and Prediction on Gene Expression Data

2006 International Conference of the IEEE Engineering in Medicine and Biology Society ◽

10.1109/iembs.2006.259893 ◽

2006 ◽

Cited By ~ 17

Author(s):

Ching Wei Wang

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Gene Expression Data ◽

Machine Learning Method ◽

Learning Method ◽

Expression Data ◽

Ensemble Machine Learning


Speech Organ Contour Extraction Using Real-Time MRI and Machine Learning Method

10.21437/interspeech.2019-1593 ◽

2019 ◽

Author(s):

Hironori Takemoto ◽

Tsubasa Goto ◽

Yuya Hagihara ◽

Sayaka Hamanaka ◽

Tatsuya Kitamura ◽

...

Keyword(s):

Machine Learning ◽

Real Time ◽

Machine Learning Method ◽

Learning Method ◽

Contour Extraction


A Novel Ensemble Machine Learning Method to Detect Phishing Attack

2020 IEEE 23rd International Multitopic Conference (INMIC) ◽

10.1109/inmic50486.2020.9318210 ◽

2020 ◽

Author(s):

Abdul Basit ◽

Maham Zafar ◽

Abdul Rehman Javed ◽

Zunera Jalil

Keyword(s):

Machine Learning ◽

Machine Learning Method ◽

Learning Method ◽

Ensemble Machine Learning


A machine learning method for the evaluation of hydrodynamic performance of floating breakwaters in waves

Ships and Offshore Structures ◽

10.1080/17445302.2021.1927358 ◽

2021 ◽

pp. 1-15

Author(s):

Hassan Saghi ◽

Tommi Mikkola ◽

Spyros Hirdaris

Keyword(s):

Machine Learning ◽

Hydrodynamic Performance ◽

Machine Learning Method ◽

Learning Method ◽

Floating Breakwaters
