Analysis of molecular profile data using generative and discriminative methods

E. J. Moler; M. L. Chow; I. S. Mian

doi:10.1152/physiolgenomics.2000.4.2.109

Analysis of molecular profile data using generative and discriminative methods

Physiological Genomics ◽

10.1152/physiolgenomics.2000.4.2.109 ◽

2000 ◽

Vol 4 (2) ◽

pp. 109-126 ◽

Cited By ~ 47

Author(s):

E. J. Moler ◽

M. L. Chow ◽

I. S. Mian

Keyword(s):

Graphical Models ◽

Biological Networks ◽

Domain Knowledge ◽

Colon Adenocarcinoma ◽

Comparative Sequence Analysis ◽

Diagnostic Tools ◽

Molecular Profile ◽

Support Vector ◽

Marker Genes ◽

Profile Data

A modular framework is proposed for modeling and understanding the relationships between molecular profile data and other domain knowledge using a combination of generative (here, graphical models) and discriminative [Support Vector Machines (SVMs)] methods. As an illustration, naive Bayes models, simple graphical models, and SVMs were applied to published transcription profile data for 1,988 genes in 62 colon adenocarcinoma tissue specimens labeled as tumor or nontumor. These unsupervised and supervised learning methods identified three classes or subtypes of specimens, assigned tumor or nontumor labels to new specimens, and detected six potentially mislabeled specimens. The probability parameters of the three classes were utilized to develop a novel gene relevance, ranking, and selection method. SVMs trained to discriminate nontumor from tumor specimens using only the 50–200 top-ranked genes generalized as well as or better than SVMs trained on the full repertoire of 1,988 genes. Approximately 90 marker genes were pinpointed for use in understanding the basic biology of colon adenocarcinoma, defining targets for therapeutic intervention, and developing diagnostic tools. These potential markers highlight the importance of tissue biology in the etiology of cancer. Comparative analysis of molecular profile data is proposed as a mechanism for predicting the physiological function of genes in instances when comparative sequence analysis proves uninformative, such as with human and yeast translationally controlled tumour protein. Graphical models and SVMs hold promise as the foundations for developing decision support systems for diagnosis, prognosis, and monitoring as well as inferring biological networks.

Identifying marker genes in transcription profiling data using a mixture of feature relevance experts

Physiological Genomics ◽

10.1152/physiolgenomics.2001.5.2.99 ◽

2001 ◽

Vol 5 (2) ◽

pp. 99-111 ◽

Cited By ~ 65

Author(s):

M. L. Chow ◽

E. J. Moler ◽

I. S. Mian

Keyword(s):

Decision Support ◽

Experimental Studies ◽

Colon Adenocarcinoma ◽

Support Vector ◽

Marker Genes ◽

Published Data ◽

Transcription Profiling ◽

General Utility ◽

Feature Relevance ◽

The Impact

Transcription profiling experiments permit the expression levels of many genes to be measured simultaneously. Given profiling data from two types of samples, genes that most distinguish the samples (marker genes) are good candidates for subsequent in-depth experimental studies and for developing decision support systems for diagnosis, prognosis, and monitoring. This work proposes a mixture of feature relevance experts as a method for identifying marker genes and illustrates the idea using published data from samples labeled as acute lymphoblastic and myeloid leukemia (ALL, AML). A feature relevance expert implements an algorithm that calculates how well a gene distinguishes samples, reorders genes according to this relevance measure, and uses a supervised learning method [here, support vector machines (SVMs)] to determine the generalization performances of different nested gene subsets. The three feature relevance experts examined implement two existing feature relevance measures and one novel measure. For each expert, a gene subset consisting of the top 50 genes distinguished ALL from AML samples as completely as all 7,070 genes. The 125 genes at the union of the top 50s are plausible markers for a prototype decision support system. Chromosomal aberration and other data support the prediction that the three genes at the intersection of the top 50s, cystatin C, azurocidin, and adipsin, are good targets for investigating the basic biology of ALL/AML. The same data were employed to identify markers that distinguish samples based on their labels of T cell/B cell, peripheral blood/bone marrow, and male/female. Selenoprotein W may discriminate T cells from B cells. Results from analysis of transcription profiling data from tumor/nontumor colon adenocarcinoma samples support the general utility of the aforementioned approach. Theoretical issues such as choosing SVM kernels and their parameters, training and evaluating feature relevance experts, and the impact of potentially mislabeled samples on marker identification (feature selection) are discussed.
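
The ranking and nested-subset steps of a feature relevance expert can be sketched as follows. This is only an illustration under stated assumptions: the signal-to-noise score used here is a stand-in and is not claimed to be one of the three relevance measures in the paper, the gene names and expression values are invented, and the SVM evaluation of each subset is omitted.

```python
from statistics import mean, stdev


def signal_to_noise(values_a, values_b):
    """Assumed relevance score: |mean difference| / (sum of standard deviations)."""
    spread = stdev(values_a) + stdev(values_b) + 1e-12  # guard against zero spread
    return abs(mean(values_a) - mean(values_b)) / spread


def rank_genes(profiles, labels):
    """Return gene names ordered from most to least relevant for a two-class problem."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("exactly two sample classes are expected")
    scores = {}
    for gene, values in profiles.items():
        group_a = [v for v, lab in zip(values, labels) if lab == classes[0]]
        group_b = [v for v, lab in zip(values, labels) if lab == classes[1]]
        scores[gene] = signal_to_noise(group_a, group_b)
    return sorted(scores, key=scores.get, reverse=True)


def nested_subsets(ranked_genes, sizes):
    """Top-k gene subsets; each would then be evaluated by a supervised learner."""
    return [ranked_genes[:k] for k in sizes]


# Hypothetical toy data: three genes measured in six samples.
toy_profiles = {
    "g1": [1.0, 1.2, 0.9, 5.0, 5.1, 4.8],
    "g2": [2.0, 3.0, 2.5, 2.2, 2.9, 2.6],
    "g3": [1.0, 2.0, 1.5, 3.0, 4.0, 3.5],
}
toy_labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]
toy_ranking = rank_genes(toy_profiles, toy_labels)
```

In a fuller version, each subset returned by `nested_subsets` would be passed to a classifier and scored by cross-validation, which is how the abstract describes comparing the top 50 genes against all 7,070.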

Algorithmic and Stochastic Representations of Gene Regulatory Networks and Protein-Protein Interactions

Current Topics in Medicinal Chemistry ◽

10.2174/1568026619666190311125256 ◽

2019 ◽

Vol 19 (6) ◽

pp. 413-425 ◽

Cited By ~ 3

Author(s):

Athanasios Alexiou ◽

Stylianos Chatzichronis ◽

Asma Perveen ◽

Abdul Hafeez ◽

Ghulam Md. Ashraf

Keyword(s):

Protein Interactions ◽

Biological Networks ◽

Regulatory Networks ◽

Review Paper ◽

Boolean Networks ◽

Diagnostic Tools ◽

Complex Nature ◽

Cellular Interactions ◽

Protein Protein Interactions ◽

Deterministic Models

Background: Recent studies reveal the importance of protein-protein interactions (PPI) for physiological functions and biological structures. Several stochastic and algorithmic methods have been published for modeling the complex nature of biological systems. Objective: Computational modeling of biological networks is still a challenging task, and the formulation of complex cellular interactions is a research field of great interest. In this review paper, several computational methods for modeling gene regulatory networks (GRN) and PPI are presented analytically. Methods: Several well-known GRN and PPI models are presented and discussed in this review, such as: graph representations, Boolean Networks, Generalized Logical Networks, Bayesian Networks, Relevance Networks, Graphical Gaussian models, Weight Matrices, the Reverse Engineering Approach, Evolutionary Algorithms, the Forward Modeling Approach, Deterministic models, Static models, Hybrid models, Stochastic models, Petri Nets, BioAmbients calculus and Differential Equations. Results: GRN and PPI methods have already been applied in various clinical processes with potentially positive results, establishing promising diagnostic tools. Conclusion: Many stochastic algorithms in the literature focus on the simulation, analysis and visualization of biological networks and their dynamic interactions; these are referenced and described in depth in this review paper.

Use of Machine Learning to Investigate the Quantitative Checklist for Autism in Toddlers (Q-CHAT) towards Early Autism Screening

Diagnostics ◽

10.3390/diagnostics11030574 ◽

2021 ◽

Vol 11 (3) ◽

pp. 574

Author(s):

Gennaro Tartarisco ◽

Giovanni Cicceri ◽

Davide Di Pietro ◽

Elisa Leonardi ◽

Stefania Aiello ◽

...

Keyword(s):

Machine Learning ◽

High Performance ◽

Behavioral Science ◽

Autistic Traits ◽

Classification Performance ◽

Recursive Feature Elimination ◽

Diagnostic Tools ◽

Support Vector ◽

K Nearest Neighbors ◽

Autism Screening

In the past two decades, several screening instruments were developed to detect toddlers who may be autistic both in clinical and unselected samples. Among others, the Quantitative CHecklist for Autism in Toddlers (Q-CHAT) is a quantitative and normally distributed measure of autistic traits that demonstrates good psychometric properties in different settings and cultures. Recently, machine learning (ML) has been applied to behavioral science to improve the classification performance of autism screening and diagnostic tools, but mainly in children, adolescents, and adults. In this study, we used ML to investigate the accuracy and reliability of the Q-CHAT in discriminating young autistic children from those without. Five different ML algorithms (random forest (RF), naïve Bayes (NB), support vector machine (SVM), logistic regression (LR), and K-nearest neighbors (KNN)) were applied to investigate the complete set of Q-CHAT items. Our results showed that ML achieved an overall accuracy of 90%, and the SVM was the most effective, being able to classify autism with 95% accuracy. Furthermore, using the SVM–recursive feature elimination (RFE) approach, we selected a subset of 14 items ensuring 91% accuracy, while 83% accuracy was obtained from the 3 best discriminating items in common to ours and the previously reported Q-CHAT-10. This evidence confirms the high performance and cross-cultural validity of the Q-CHAT, and supports the application of ML to create shorter and faster versions of the instrument, maintaining high classification accuracy, to be used as a quick, easy, and high-performance tool in primary-care settings.
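
The SVM-recursive feature elimination idea mentioned in the abstract can be sketched roughly as below. This is a minimal illustration, not the study's pipeline: it assumes a linear SVM without a bias term trained by plain full-batch subgradient descent, and the hyperparameters and the tiny dataset are invented for the example.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the L2-regularised hinge loss (no bias)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [lam * wj for wj in w]
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1:  # this sample violates the margin
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def recursive_feature_elimination(X, y, n_keep):
    """Repeatedly retrain and drop the feature with the smallest squared weight."""
    active = list(range(len(X[0])))  # indices of the features still in play
    while len(active) > n_keep:
        X_active = [[row[j] for j in active] for row in X]
        w = train_linear_svm(X_active, y)
        worst = min(range(len(active)), key=lambda j: w[j] ** 2)
        del active[worst]
    return active


# Hypothetical toy data: feature 0 separates the classes strongly,
# feature 1 weakly, and feature 2 carries no class information.
toy_X = [
    [2.0, 0.5, 1.0],
    [2.0, 0.5, -1.0],
    [-2.0, -0.5, 1.0],
    [-2.0, -0.5, -1.0],
]
toy_y = [1, 1, -1, -1]
kept = recursive_feature_elimination(toy_X, toy_y, n_keep=1)
```

Ranking by squared weight is the usual criterion for linear SVM-RFE; with questionnaire data, each feature index would correspond to one Q-CHAT item.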

Support-vector-machine tree-based domain knowledge learning toward automated sports video classification

Optical Engineering ◽

10.1117/1.3518080 ◽

2010 ◽

Vol 49 (12) ◽

pp. 127003

Author(s):

Yang Jiang

Keyword(s):

Support Vector Machine ◽

Domain Knowledge ◽

Support Vector ◽

Video Classification ◽

Sports Video ◽

Knowledge Learning

Experience of using the drug Glutoxim in patients with benign and borderline epithelial ovarian tumors after performing conservative surgical treatment

HEALTH OF WOMAN ◽

10.15574/hw.2018.134.79 ◽

2018 ◽

pp. 79-86

Author(s):

A.A. Sukhanova ◽

M.Yu. Yegorov

Keyword(s):

Surgical Treatment ◽

High Risk ◽

Ovarian Tumors ◽

Molecular Profile ◽

Control Group ◽

Profile Data ◽

Ki 67 ◽

Risk Of Recurrence ◽

Relapse Therapy ◽

E Cadherin

The objective: to increase the effectiveness of treatment of patients with benign and borderline epithelial ovarian tumors (EOT) after conservative surgery by identifying a group at high risk of recurrence and malignancy from the molecular expression profile of the markers p53, Ki-67, estrogen receptors (ER), CD34 and E-cadherin, and by including the immunomodulating drug Glutoxim in complex anti-relapse therapy. Materials and methods. A clinical examination was performed of 60 patients of reproductive age with EOT who underwent organ-sparing surgical treatment (main group). Of these 60 patients, 30 women (subgroup I) were diagnosed with benign EOT (BEOT) and the remaining 30 women (subgroup II) with borderline EOT (BoEOT) of FIGO stages Ia and Ib. After routine histopathological examination of the removed tumors, the molecular profile was determined immunohistochemically: the apoptosis regulator protein p53, the proliferation index (PI) by Ki-67 expression, estrogen receptors (ER), microvessel density by CD34 expression, and the intercellular adhesion protein E-cadherin. Based on the molecular profile data, each removed tumor was ranked as high or low risk of recurrence and malignancy. Patients in the high-risk group according to the molecular profile data received the immunomodulating drug Glutoxim as part of complex anti-relapse therapy: 10 mg intramuscularly daily for 2 weeks, with the course repeated every six months for 3 years. The control group consisted of 64 patients with BEOT and BoEOT who underwent conservative surgical treatment without further anti-relapse treatment. Results. The molecular profile study found that a high risk of recurrence and malignancy was associated with EOT showing p53 expression (LI ≥15%), high proliferative activity of cells by Ki-67 expression (PI ≥10%), low estrogen receptor expression (LI ER <49.5%), high microvessel density by CD34 expression (IM ≥40 mv/mm²), and a low level of intercellular adhesion by E-cadherin expression (LI <59%). A molecular profile characterizing a high risk of recurrence and malignancy was in most cases found in BoEOT. Comprehensive anti-relapse treatment including the immunomodulatory drug Glutoxim (10 mg intramuscularly daily for 2 weeks, with the course repeated every six months) after sparing conservative surgical treatment in patients at high risk of relapse and malignancy according to molecular profile data reduced the relapse rate of EOT to 6.7% in the main group compared with 20.3% in the control group during three years of follow-up. The difference is statistically significant (p <0.05). Conclusion. To prevent recurrence and malignancy in patients with EOT at high risk according to molecular profile data after sparing surgical treatment that preserves their reproductive function, it is recommended that Glutoxim be administered in complex anti-relapse therapy at 10 mg intramuscularly daily for 2 weeks, with the course repeated every six months for 3 years. Key words: benign epithelial ovarian tumors, borderline epithelial ovarian tumors, high risks of recurrence and malignancy, anti-relapse therapy, reproductive function, Glutoxim.

WDNfinder: A method for minimum driver node set detection and analysis in directed and weighted biological network

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720017500214 ◽

2017 ◽

Vol 15 (05) ◽

pp. 1750021 ◽

Cited By ~ 4

Author(s):

Yanshuo Chu ◽

Zhenxing Wang ◽

Rongjie Wang ◽

Ningyi Zhang ◽

Jie Li ◽

...

Keyword(s):

Dna Damage ◽

Dna Damage Response ◽

Biological Networks ◽

Domain Knowledge ◽

Human Cancer ◽

Accurate Analysis ◽

Response Network ◽

Public Data ◽

Damage Response ◽

Structural Controllability

Structural controllability is the generalization of traditional controllability for dynamical systems. During the last decade, interesting biological discoveries have been inferred by applying structural controllability analysis to biological networks. However, false positive/negative information (i.e. nodes and edges) widely exists in biological networks that are documented in public data sources, which can hinder accurate analysis of structural controllability. In this study, we propose WDNfinder, a comprehensive analysis package that provides structural controllability analysis with consideration of node connection strength in biological networks. When applied to the human cancer signaling network and the p53-mediated DNA damage response network, WDNfinder shows high accuracy in predicting essential nodes in these networks. Compared to existing methods, WDNfinder can significantly narrow down the set of minimum driver node sets (MDSs) under the restriction of domain knowledge. Using the p53-mediated DNA damage response network as an illustration, we find more meaningful MDSs with WDNfinder. The source code is implemented in Python and publicly available together with relevant data on GitHub: https://github.com/dustincys/WDNfinder .
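
For orientation, the standard unweighted approach to finding a minimum driver node set can be sketched with a maximum matching: nodes left without a matched incoming edge are driver nodes. This is not WDNfinder's algorithm, which additionally accounts for edge weights and domain knowledge; the function name and the toy networks are assumptions made for the example.

```python
def minimum_driver_nodes(nodes, edges):
    """Return one minimum driver node set of a directed, unweighted network.

    A maximum matching is built with augmenting paths; every node that is not
    the target of a matched edge must be driven directly.
    """
    successors = {u: [] for u in nodes}
    for u, v in edges:
        successors[u].append(v)
    matched_from = {}  # target node -> source node of its matched incoming edge

    def try_augment(u, seen):
        for v in successors[u]:
            if v in seen:
                continue
            seen.add(v)
            # Take v if it is free, or if its current source can be re-routed.
            if v not in matched_from or try_augment(matched_from[v], seen):
                matched_from[v] = u
                return True
        return False

    for u in nodes:
        try_augment(u, set())
    drivers = [v for v in nodes if v not in matched_from]
    # With a perfect matching, one (arbitrary) driver node is still required.
    return drivers or [nodes[0]]


# Hypothetical toy networks.
chain_drivers = minimum_driver_nodes(["a", "b", "c"], [("a", "b"), ("b", "c")])
star_drivers = minimum_driver_nodes(["a", "b", "c"], [("a", "b"), ("a", "c")])
```

Maximum matchings are generally not unique, so a network can have many minimum driver node sets of the same size; this is the ambiguity that the abstract says WDNfinder narrows down.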

Permutation Entropy-Based Interpretability of Convolutional Neural Network Models for Interictal EEG Discrimination of Subjects with Epileptic Seizures vs. Psychogenic Non-Epileptic Seizures

Entropy ◽

10.3390/e24010102 ◽

2022 ◽

Vol 24 (1) ◽

pp. 102

Author(s):

Michele Lo Giudice ◽

Giuseppe Varone ◽

Cosimo Ieracitano ◽

Nadia Mammone ◽

Giovanbattista Gaspare Tripodi ◽

...

Keyword(s):

Neural Network ◽

Discriminant Analysis ◽

Convolutional Neural Network ◽

Epileptic Seizures ◽

Permutation Entropy ◽

Diagnostic Tools ◽

Support Vector ◽

Feature Maps ◽

Time Frequency ◽

Interictal Eeg

The differential diagnosis of epileptic seizures (ES) and psychogenic non-epileptic seizures (PNES) may be difficult, due to the lack of distinctive clinical features. The interictal electroencephalographic (EEG) signal may also be normal in patients with ES. Innovative diagnostic tools that exploit non-linear EEG analysis and deep learning (DL) could provide important support to physicians for clinical diagnosis. In this work, 18 patients with new-onset ES (12 males, 6 females) and 18 patients with video-recorded PNES (2 males, 16 females) with normal interictal EEG at visual inspection were enrolled. None of them was taking psychotropic drugs. A convolutional neural network (CNN) scheme using DL classification was designed to classify the two categories of subjects (ES vs. PNES). The proposed architecture performs an EEG time-frequency transformation and a classification step with a CNN. The CNN was able to classify the EEG recordings of subjects with ES vs. subjects with PNES with 94.4% accuracy. CNN provided high performance in the assigned binary classification when compared to standard learning algorithms (multi-layer perceptron, support vector machine, linear discriminant analysis and quadratic discriminant analysis). In order to interpret how the CNN achieved this performance, information theoretical analysis was carried out. Specifically, the permutation entropy (PE) of the feature maps was evaluated and compared in the two classes. The achieved results, although preliminary, encourage the use of these innovative techniques to support neurologists in early diagnoses.
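
As a rough illustration of the permutation entropy measure named in the abstract, the sketch below computes it for a one-dimensional sequence using ordinal patterns. The embedding order and delay defaults are assumptions, and how the authors applied the measure to CNN feature maps is not specified here.

```python
import math
from collections import Counter


def permutation_entropy(signal, order=3, delay=1, normalize=True):
    """Shannon entropy (in nats) of the ordinal patterns found in a 1-D sequence."""
    n_windows = len(signal) - (order - 1) * delay
    if n_windows <= 0:
        raise ValueError("signal is too short for this order and delay")
    patterns = Counter()
    for start in range(n_windows):
        window = [signal[start + k * delay] for k in range(order)]
        # The ordinal pattern is the index order that sorts the window.
        pattern = tuple(sorted(range(order), key=lambda k: window[k]))
        patterns[pattern] += 1
    entropy = -sum((c / n_windows) * math.log(c / n_windows) for c in patterns.values())
    if normalize:
        entropy /= math.log(math.factorial(order))  # scale to the range [0, 1]
    return entropy


# Hypothetical short series used only to exercise the function.
example_series = [4, 7, 9, 10, 6, 11, 3]
example_pe = permutation_entropy(example_series, order=3, delay=1, normalize=False)
```

A strictly monotone sequence contains a single ordinal pattern and therefore has zero permutation entropy, while a sequence whose patterns are all equally frequent reaches the normalized maximum of 1.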

Automatic Detection of Voltage Notches using Support Vector Machine

Renewable Energy and Power Quality Journal ◽

10.24084/repqj19.337 ◽

2021 ◽

Vol 19 ◽

pp. 528-533

Author(s):

Rongzhen Qi ◽

Olga Zyabkina ◽

Daniel Agudelo Martinez ◽

Jan Meyer

Keyword(s):

Support Vector Machine ◽

Domain Knowledge ◽

Detection Efficiency ◽

Support Vector ◽

Svm Classifier ◽

Automatic Method ◽

Power Quality Disturbances ◽

Characteristic Features ◽

Comprehensive Framework ◽

Nonlinear Support

This paper presents a comprehensive framework for voltage notch analysis and an automatic method for notch detection using a nonlinear support vector machine (SVM) classifier. A comprehensive simulation of the notch disturbance was conducted to generate a diverse database. Based on domain knowledge and properties of power quality disturbances (PQDs), a set of characteristic features is extracted. After feature extraction, the most descriptive features were selected with a decision tree (DT) algorithm, and a nonlinear SVM classifier was trained. Finally, the detection efficiency of the trained model is presented and discussed.

MZmine 2: Modular framework for processing, visualizing, and analyzing mass spectrometry-based molecular profile data

BMC Bioinformatics ◽

10.1186/1471-2105-11-395 ◽

2010 ◽

Vol 11 (1) ◽

Cited By ~ 1489

Author(s):

Tomáš Pluskal ◽

Sandra Castillo ◽

Alejandro Villar-Briones ◽

Matej Orešič

Keyword(s):

Mass Spectrometry ◽

Molecular Profile ◽

Profile Data ◽

Modular Framework

Identification of Marker Genes Discriminating the Pathological Stages in Ovarian Carcinoma by Using Support Vector Machine and Systems Biology

Progress in Artificial Life - Lecture Notes in Computer Science ◽

10.1007/978-3-540-76931-6_33 ◽

2007 ◽

pp. 381-389

Author(s):

Meng-Hsiun Tsai ◽

Jun-Dong Chang ◽

Sheng-Hsiung Chiu ◽

Ching-Hao Lai

Keyword(s):

Support Vector Machine ◽

Systems Biology ◽

Ovarian Carcinoma ◽

Support Vector ◽

Marker Genes ◽

Pathological Stages
