Comparative machine learning approach for biomarker identification using multiomics data from patients with endometriosis
[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT REQUEST OF AUTHOR.] Endometriosis is a common, complex, yet poorly understood gynecological disorder that affects about 176 million women worldwide, significantly impairing their quality of life and imposing a substantial economic burden. Neither a definitive clinical symptom nor a minimally invasive diagnostic method is available, which leads to an average diagnostic latency of 10 years. Over the last several decades, the discovery of relevant biological patterns in microarray expression and next-generation sequencing (NGS) data has been advanced by the application of various machine learning tools. The overall objective of this project was to identify diagnostic molecular mechanisms and biomarkers of endometriosis using a multi-omics approach and various machine learning classifiers. This objective was pursued through three related but independent aims: (1) mining RNA-seq data to discover molecular mechanisms of endometriosis, (2) discovering diagnostic features of endometriosis in the DNA methylation profile of the endometrium, and (3) developing innovative machine learning-based differential classification models using whole-genome, high-throughput next-generation sequencing data. We evaluated how well various supervised machine learning methods, including decision tree, partial least squares-discriminant analysis, support vector machine, random forest, and a newly developed method called GenomeForest, classify endometriosis samples against control samples when trained on both transcriptomics and methylomics data.