Simpler Evaluation of Predictions and Signature Stability for Gene Expression Data

Journal of Biomedicine and Biotechnology ◽

10.1155/2009/587405 ◽

2009 ◽

Vol 2009 ◽

pp. 1-7 ◽

Cited By ~ 2

Author(s):

Yvonne E. Pittelkow ◽

Susan R. Wilson

Keyword(s):

Gene Expression ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Gene Expression Signature ◽

Gene Signature ◽

Forward Selection ◽

Data Set ◽

Cancer Data ◽

Large Gene ◽

Selection Of

Scientific advances are raising expectations that patient-tailored treatment will soon be available. The development of the resulting clinical approaches needs to be based on well-designed experimental and observational procedures that provide data to which proper biostatistical analyses are applied. Gene expression microarray and related technologies are rapidly evolving and now provide extremely large gene expression profiles containing many thousands of measurements. Choosing a subset of these gene expression measurements to include in a gene expression signature is one of the many challenges that need to be met. The choice of signature depends on many factors, including the selection of patients in the training set. The reliability and reproducibility of the resulting prognostic gene signature therefore need to be evaluated in a way that is relevant to the clinical setting. A relatively straightforward approach is based on cross-validation, with separate selection of genes at each iteration to avoid selection bias. Within this approach we developed two different methods, one based on forward selection, the other on genes that were statistically significant in all training blocks of data. We demonstrate our approach to gene signature evaluation with a well-known breast cancer data set.
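The idea of repeating gene selection inside every cross-validation iteration can be sketched as follows. This is a minimal illustration, not the authors' method: the mean-difference ranking, the nearest-centroid classifier, and all function names are assumptions chosen to keep the example short, and it assumes binary labels (0/1) with both classes present in every training fold.

```python
import random
from collections import Counter


def mean(values):
    return sum(values) / len(values)


def select_genes(X, y, n_genes):
    """Rank genes by absolute difference in class means, using training data only."""
    scores = []
    for g in range(len(X[0])):
        pos = [row[g] for row, label in zip(X, y) if label == 1]
        neg = [row[g] for row, label in zip(X, y) if label == 0]
        scores.append((abs(mean(pos) - mean(neg)), g))
    scores.sort(reverse=True)
    return [g for _, g in scores[:n_genes]]


def centroid_predict(X_train, y_train, x, genes):
    """Nearest-centroid prediction restricted to the selected genes."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        rows = [r for r, l in zip(X_train, y_train) if l == label]
        dist = sum((x[g] - mean([r[g] for r in rows])) ** 2 for g in genes)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def cross_validate(X, y, n_folds=5, n_genes=3, seed=0):
    """Return (accuracy, stability), where stability maps each selected gene
    to the fraction of folds in which it was chosen."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    correct, selected = 0, Counter()
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        # Selection is redone on each training block, so the held-out
        # samples never influence which genes enter the signature.
        genes = select_genes(X_tr, y_tr, n_genes)
        selected.update(genes)
        for i in test_idx:
            correct += centroid_predict(X_tr, y_tr, X[i], genes) == y[i]
    stability = {g: count / n_folds for g, count in selected.items()}
    return correct / len(X), stability
```

A gene with stability 1.0 was selected in every training block, which loosely mirrors the abstract's notion of genes that are significant in all training blocks. Genes with low stability indicate a signature that depends heavily on which patients happen to be in the training set.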

Identification of synthetic lethal interactions with the BRAF oncogene in colorectal cancer.

Journal of Clinical Oncology ◽

10.1200/jco.2013.31.4_suppl.403 ◽

2013 ◽

Vol 31 (4_suppl) ◽

pp. 403-403

Author(s):

Loredana Vecchione ◽

Valentina Gambino ◽

Giovanni d'Ario ◽

Sun Tian ◽

Iris Simon ◽

...

Keyword(s):

Gene Expression ◽

Cell Lines ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Gene Expression Signature ◽

Gene Signature ◽

Braf V600e ◽

P Value ◽

Synthetic Lethal ◽

Braf Mutant

Background: Approximately 8-15% of colorectal cancer (CRC) patients carry an activating mutation in BRAF. This CRC subtype is associated with poor outcome and with resistance both to chemotherapeutic treatments and to tailored drugs. We recently showed that BRAF (V600E) colon cancers (CCs) have a characteristic gene expression signature (1, 2), which is also found in subsets of KRAS mutant and KRAS-BRAF wild type (WT2) tumors. Tumors having this gene signature, referred to as “BRAF-like”, have a similarly poor prognosis irrespective of the presence of the BRAF (V600E) mutation. Using a shRNA-based genetic screen in BRAF mutant CC cell lines, we aimed to identify genes and pathways necessary for survival and growth of BRAF mutant CC. Such studies may reveal additional targets for therapy and potentially provide new biomarkers for patient stratification. Methods: We identified 363 genes that are selectively overexpressed in BRAF mutant tumors as compared to WT2 type tumors, based on gene expression profiles of the PETACC3 (1) and Agendia (2) datasets. The TRC human genome-wide shRNA collection (TRC-Hs1.0) was used to generate a sub-library of 1815 hairpins targeting those identified genes (BRAF library). BRAF(V600E) CC cell lines were infected with the BRAF library and screened for shRNAs that cause lethality. The LIM1215 CC cell line (WT2) was used as a control. Cells stably expressing the shRNA library were cultured for 13 days, after which shRNAs were recovered by PCR. Deep sequencing was applied to determine the specific depletion of shRNA in BRAF(V600E) cells as compared to LIM1215 cells. Results: Candidate genes were identified using the following filtering criteria: depletion in BRAF(V600E) cells by at least 50%, and depletion in BRAF(V600E) cells 1.5-fold higher than in control cells, with a corresponding p-value ≤ 0.1. A total of 34 genes met our criteria, of which 6 genes were represented by more than one hairpin and were concordant across the cell lines selected for validation. Conclusions: We identified candidate synthetic lethal genes in BRAF mutant CC cell lines. Functional analysis is ongoing. Data will be presented. References 1. J Clin Oncol 2012 Apr 20;30(12):1288-9 2. Gut (2012). doi:10.1136/gutjnl-2012-302423
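The filtering step in the Results can be illustrated with a short sketch. It rests on several assumptions not stated in the abstract: depletion is expressed as a fraction between 0 and 1, the fold criterion is read as a ratio of mutant to control depletion of at least 1.5, and the function names and gene labels are hypothetical.

```python
from collections import Counter


def is_candidate(depletion_mutant, depletion_control, p_value,
                 min_depletion=0.5, min_ratio=1.5, max_p=0.1):
    """Apply the three filters to one hairpin (thresholds as read from the abstract)."""
    if depletion_mutant < min_depletion or p_value > max_p:
        return False
    if depletion_control <= 0:
        # Assumption: no depletion in control cells trivially satisfies the ratio.
        return True
    return depletion_mutant / depletion_control >= min_ratio


def candidate_genes(hairpins):
    """hairpins: iterable of (gene, depletion_mutant, depletion_control, p_value).
    Returns a dict mapping each gene to its number of passing hairpins."""
    counts = Counter()
    for gene, dep_mut, dep_ctrl, p in hairpins:
        if is_candidate(dep_mut, dep_ctrl, p):
            counts[gene] += 1
    return dict(counts)
```

Genes whose count exceeds one correspond to the abstract's notion of candidates supported by more than one hairpin, which makes an off-target effect of a single shRNA a less likely explanation.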

Identification of a Gene Expression Signature Common to Distinct Cancer Pathways

Cancer Informatics ◽

10.4137/cin.s9542 ◽

2012 ◽

Vol 11 ◽

pp. CIN.S9542 ◽

Cited By ~ 7

Author(s):

Niklaus Fankhauser ◽

Igor Cima ◽

Peter Wild ◽

Wilhelm Krek

Keyword(s):

Gene Expression ◽

Web Application ◽

Cell Transformation ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Gene Expression Signature ◽

Malignant Cell ◽

Cancer Genes ◽

Data Set ◽

Cancer Pathways

Mutations in cancer-causing genes induce changes in gene expression programs critical for malignant cell transformation. Publicly available gene expression profiles produced by modulating the expression of distinct cancer genes may therefore represent a rich resource for the identification of gene signatures common to seemingly unrelated cancer genes. We combined automatic retrieval with manual validation to obtain a data set of high-quality gene microarray profiles. This data set was used to create logical models of the signaling events underlying the observed expression changes produced by various cancer genes and allowed us to uncover previously unknown, verifiable interactions. Data clustering revealed novel sets of gene expression profiles commonly regulated by distinct cancer genes. Our method allows retrieval of significant new information and testable hypotheses from a pool of deposited cancer gene expression experiments that are otherwise not apparent or appear insignificant from single measurements. The complete results are available through a web application at http://biodata.ethz.ch/cgi-bin/geologic .

Transcriptome profiling reveals novel BMI- and sex-specific gene expression signatures for human cardiac hypertrophy

Physiological Genomics ◽

10.1152/physiolgenomics.00122.2016 ◽

2017 ◽

Vol 49 (7) ◽

pp. 355-367 ◽

Cited By ~ 5

Author(s):

Mackenzie S. Newman ◽

Tina Nguyen ◽

Michael J. Watson ◽

Robert W. Hull ◽

Han-Gang Yu

Keyword(s):

Gene Expression ◽

Cardiac Hypertrophy ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Gene Expression Signature ◽

Left Ventricular ◽

P Value ◽

Specific Gene ◽

Published Data ◽

Data Set

How obesity or sex may affect the gene expression profiles of human cardiac hypertrophy is unknown. We hypothesized that body-mass index (BMI) and sex can affect gene expression profiles of cardiac hypertrophy. Human heart tissues were grouped according to sex (male, female), BMI (lean < 25 kg/m², obese > 30 kg/m²), or left ventricular hypertrophy (LVH) and non-LVH nonfailed controls (NF). We identified 24 differentially expressed (DE) genes comparing female with male samples. In the obese subgroup, there were 236 DE genes comparing LVH with NF; in the lean subgroup, there were seven DE genes comparing LVH with NF. In the female subgroup, we identified 1,320 significant genes comparing LVH with NF; in the male subgroup, there were 1,383 significant genes comparing LVH with NF. There were seven significant genes comparing obese LVH with lean NF; comparing male obese LVH with male lean NF samples, we found 106 significant genes; comparing female obese LVH with male lean NF, we found no significant genes. Using an absolute value of log2 fold-change > 2 or an extremely small P value (10⁻²⁰) as a criterion, we identified nine significant genes (HBA1, HBB, HIST1H2AC, GSTT1, MYL7, NPPA, NPPB, PDK4, PLA2G2A) in LVH that were also found in a published data set for ischemic and dilated cardiomyopathy in heart failure. We identified a potential gene expression signature that distinguishes between patients with high BMI or between men and women with cardiac hypertrophy. Expression of established biomarkers atrial natriuretic peptide A (NPPA) and B (NPPB) was already significantly increased in hypertrophy compared with controls.
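The selection criterion described above (a large absolute log2 fold-change or an extremely small P value) can be written as a small filter. This is a sketch under assumptions: the P value threshold is read as "at most 1e-20", fold-change is computed from group means of strictly positive expression values, and the function names are hypothetical.

```python
import math


def log2_fold_change(mean_case, mean_control):
    """log2 ratio of mean expression; both means must be strictly positive."""
    return math.log2(mean_case / mean_control)


def passes_criterion(log2_fc, p_value, fc_cutoff=2.0, p_cutoff=1e-20):
    """True if |log2 fold-change| exceeds the cutoff OR the P value is extremely small."""
    return abs(log2_fc) > fc_cutoff or p_value <= p_cutoff
```

Note that an absolute log2 fold-change above 2 corresponds to more than a fourfold change in either direction, so the criterion captures both strongly upregulated and strongly downregulated genes.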

Analysis of blood-based gene expression in idiopathic Parkinson disease

Neurology ◽

10.1212/wnl.0000000000004516 ◽

2017 ◽

Vol 89 (16) ◽

pp. 1676-1683 ◽

Cited By ~ 36

Author(s):

Ron Shamir ◽

Christine Klein ◽

David Amar ◽

Eva-Juliane Vollstedt ◽

Michael Bonin ◽

...

Keyword(s):

Gene Expression ◽

Parkinson Disease ◽

Gene Networks ◽

Large Scale ◽

Expression Profiles ◽

Area Under The Curve ◽

Gene Expression Profiles ◽

Gene Signature ◽

Gene Profiles ◽

Independent Test

Objective: To examine whether gene expression analysis of a large-scale Parkinson disease (PD) patient cohort produces a robust blood-based PD gene signature compared to previous studies that have used relatively small cohorts (≤220 samples). Methods: Whole-blood gene expression profiles were collected from a total of 523 individuals. After preprocessing, the data contained 486 gene profiles (n = 205 PD, n = 233 controls, n = 48 other neurodegenerative diseases) that were partitioned into training, validation, and independent test cohorts to identify and validate a gene signature. Batch-effect reduction and cross-validation were performed to ensure signature reliability. Finally, functional and pathway enrichment analyses were applied to the signature to identify PD-associated gene networks. Results: A gene signature of 100 probes that mapped to 87 genes, corresponding to 64 upregulated and 23 downregulated genes differentiating between patients with idiopathic PD and controls, was identified with the training cohort and successfully replicated in both an independent validation cohort (area under the curve [AUC] = 0.79, p = 7.13E–6) and a subsequent independent test cohort (AUC = 0.74, p = 4.2E–4). Network analysis of the signature revealed gene enrichment in pathways, including metabolism, oxidation, and ubiquitination/proteasomal activity, and misregulation of mitochondria-localized genes, including downregulation of COX4I1, ATP5A1, and VDAC3. Conclusions: We present a large-scale study of PD gene expression profiling. This work identifies a reliable blood-based PD signature and highlights the importance of large-scale patient cohorts in developing potential PD biomarkers.
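For readers unfamiliar with the AUC values quoted above, the following sketch shows one standard way to compute an area under the ROC curve from signature scores: the fraction of patient/control pairs in which the patient scores higher, with ties counted as one half. The function name and the toy scores in the usage note are illustrative assumptions, not data from the study.

```python
def auc(scores_cases, scores_controls):
    """Area under the ROC curve as the probability that a randomly chosen
    case scores higher than a randomly chosen control (ties count 0.5)."""
    wins = 0.0
    for case in scores_cases:
        for control in scores_controls:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(scores_cases) * len(scores_controls))
```

With hypothetical scores `auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])`, eight of the nine case/control pairs are ordered correctly, giving 8/9. A value of 0.5 corresponds to a signature that separates the groups no better than chance.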

Integrative Genomics with Mediation Analysis in a Survival Context

Computational and Mathematical Methods in Medicine ◽

10.1155/2013/413783 ◽

2013 ◽

Vol 2013 ◽

pp. 1-8 ◽

Cited By ~ 2

Author(s):

Szilárd Nemes ◽

Toshima Z. Parris ◽

Anna Danielsson ◽

Zakaria Einbeigi ◽

Gunnar Steineck ◽

...

Keyword(s):

Expression Profiles ◽

Gene Expression Profiles ◽

Genomic Analysis ◽

Mrna Levels ◽

Integrative Genomics ◽

Asymptotic Results ◽

Data Set ◽

Cancer Data ◽

Dna And Rna ◽

Altered Gene

DNA copy number aberrations (DCNA) and subsequent altered gene expression profiles may have a major impact on tumor initiation, on development, and eventually on recurrence and cancer-specific mortality. However, most methods employed in integrative genomic analysis of the two biological levels, DNA and RNA, do not consider survival time. In the present note, we propose the adoption of a survival analysis-based framework for the integrative analysis of DCNA and mRNA levels to reveal their implications for patient clinical outcome, with the prerequisite that the effect of DCNA on survival is mediated by mRNA levels. The specific aim of the paper is to offer a feasible framework to test the DCNA-mRNA-survival pathway. We provide statistical inference algorithms for mediation based on asymptotic results. Furthermore, we illustrate the applicability of the method in an integrative genomic analysis setting by using a breast cancer data set consisting of 141 invasive breast tumors. In addition, we provide an implementation in R.

Improved Feature Selection by Incorporating Gene Similarity into the LASSO

International Journal of Knowledge Discovery in Bioinformatics ◽

10.4018/jkdb.2012010101 ◽

2012 ◽

Vol 3 (1) ◽

pp. 1-22 ◽

Cited By ~ 1

Author(s):

Christopher E. Gillies ◽

Xiaoli Gao ◽

Nilesh V. Patel ◽

Mohammad-Reza Siadat ◽

George D. Wilson

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Personalized Medicine ◽

Objective Function ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Genetic Profile ◽

Data Set ◽

Coordinate Descent Algorithm ◽

Gene Similarity

Personalized medicine is customizing treatments to a patient’s genetic profile and has the potential to revolutionize medical practice. An important process used in personalized medicine is gene expression profiling. Analyzing gene expression profiles is difficult, because there are usually few patients and thousands of genes, leading to the curse of dimensionality. To combat this problem, researchers suggest using prior knowledge to enhance feature selection for supervised learning algorithms. The authors propose an enhancement to the LASSO, a shrinkage and selection technique that induces parameter sparsity by penalizing a model’s objective function. Their enhancement gives preference to the selection of genes that are involved in similar biological processes. The authors’ modified LASSO selects similar genes by penalizing interaction terms between genes. They devise a coordinate descent algorithm to minimize the corresponding objective function. To evaluate their method, the authors created simulation data where they compared their model to the standard LASSO model and an interaction LASSO model. The authors’ model outperformed both the standard and interaction LASSO models in terms of detecting important genes and gene interactions for a reasonable number of training samples. They also demonstrated the performance of their method on a real gene expression data set from lung cancer cell lines.
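To make the coordinate descent idea concrete, here is a minimal sketch for the standard LASSO only. It does not implement the authors' gene-similarity penalty or interaction terms. It assumes centered data with no intercept and the objective (1/(2n))·||y − Xβ||² + λ·||β||₁; the function names are illustrative.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; this is what produces exact zeros."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for the standard LASSO (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # an all-zero column cannot contribute
            # Correlation of column j with the residual that excludes gene j.
            rho = sum(v * (r + v * beta[j]) for v, r in zip(col, residual)) / n
            new_beta = soft_threshold(rho, lam) / z
            delta = new_beta - beta[j]
            if delta != 0.0:
                residual = [r - v * delta for v, r in zip(col, residual)]
                beta[j] = new_beta
    return beta
```

Each coefficient is updated in turn while the others are held fixed, and the soft-thresholding step sets weakly supported coefficients exactly to zero. That sparsity is what makes the LASSO usable as a gene selection method when genes far outnumber patients.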

PCA-based unsupervised feature extraction for gene expression analysis of COVID-19 patients

Scientific Reports ◽

10.1038/s41598-021-95698-w ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Kota Fujisawa ◽

Mamoru Shimo ◽

Y.-H. Taguchi ◽

Shinya Ikematsu ◽

Ryota Miyata

Keyword(s):

Gene Expression ◽

Feature Extraction ◽

Target Genes ◽

Gene Selection ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Principal Component ◽

Data Set ◽

Immune Related Genes ◽

Unsupervised Feature Extraction

Coronavirus disease 2019 (COVID-19) is raging worldwide. This potentially fatal infectious disease is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). However, the complete mechanism of COVID-19 is not well understood. Therefore, we analyzed gene expression profiles of COVID-19 patients to identify disease-related genes through an innovative machine learning method that enables a data-driven strategy for gene selection from a data set with a small number of samples and many candidates. Principal-component-analysis-based unsupervised feature extraction (PCAUFE) was applied to the RNA expression profiles of 16 COVID-19 patients and 18 healthy control subjects. The results identified 123 genes as critical for COVID-19 progression from 60,683 candidate probes, including immune-related genes. The 123 genes were enriched in binding sites for transcription factors NFKB1 and RELA, which are involved in various biological phenomena such as immune response and cell survival: the primary mediator of canonical nuclear factor-kappa B (NF-κB) activity is the heterodimer RelA-p50. The genes were also enriched in histone modification H3K36me3, and they largely overlapped the target genes of NFKB1 and RELA. We found that the overlapping genes were downregulated in COVID-19 patients. These results suggest that canonical NF-κB activity was suppressed by H3K36me3 in COVID-19 patient blood.

The effects of a globin blocker on the resolution of 3’mRNA sequencing data in porcine blood

BMC Genomics ◽

10.1186/s12864-019-6122-2 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 1

Author(s):

Kyu-Sang Lim ◽

Qian Dong ◽

Pamela Moll ◽

Jana Vitkovska ◽

Gregor Wiktorin ◽

...

Keyword(s):

Gene Expression ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Data Sets ◽

Globin Genes ◽

Sequencing Data ◽

Globin Mrna ◽

Data Set ◽

Mrna Sequencing ◽

Porcine Blood

Background: Gene expression profiling in blood is a potential source of biomarkers to evaluate or predict phenotypic differences between pigs but is expensive and inefficient because of the high abundance of globin mRNA in porcine blood. These limitations can be overcome by the use of QuantSeq 3’mRNA sequencing (QuantSeq) combined with a method to deplete or block the processing of globin mRNA prior to or during library construction. Here, we validated the effectiveness of QuantSeq using a novel specific globin blocker (GB) that is included in the library preparation step of QuantSeq. Results: In data set 1, four concentrations of the GB were applied to RNA samples from two pigs. The GB significantly reduced the proportion of globin reads compared to non-GB (NGB) samples (P = 0.005) and increased the number of detectable non-globin genes. The highest evaluated concentration (C1) of the GB resulted in the largest reduction of globin reads compared to the NGB (from 56.4 to 10.1%). The second highest concentration C2, which showed very similar globin depletion rates (12%) as C1 but a better correlation of the expression of non-globin genes between NGB and GB (r = 0.98), allowed the expression of an additional 1295 non-globin genes to be detected, although 40 genes that were detected in the NGB sample (at a low level) were not present in the GB library. Concentration C2 was applied in the rest of the study. In data set 2, the distribution of the percentage of globin reads for NGB (n = 184) and GB (n = 189) samples clearly showed the effects of the GB on reducing globin reads, in particular for HBB, similar to results from data set 1. Data set 3 (n = 84) revealed that the proportion of globin reads that remained in GB samples was significantly and positively correlated with the reticulocyte count in the original blood sample (P < 0.001). Conclusions: The effect of the GB on reducing the proportion of globin reads in porcine blood QuantSeq was demonstrated in three data sets. In addition to increasing the efficiency of sequencing non-globin mRNA, the GB for QuantSeq has an advantage that it does not require an additional step prior to or during library creation. Therefore, the GB is a useful tool in the quantification of whole gene expression profiles in porcine blood.

Impact of 70-Gene Signature Use on Adjuvant Chemotherapy Decisions in Patients With Estrogen Receptor–Positive Early Breast Cancer: Results of a Prospective Cohort Study

Journal of Clinical Oncology ◽

10.1200/jco.2016.70.3959 ◽

2017 ◽

Vol 35 (24) ◽

pp. 2814-2819 ◽

Cited By ~ 21

Author(s):

Anne Kuijer ◽

Marieke Straver ◽

Bianca den Dekker ◽

Annelotte C.M. van Bommel ◽

Sjoerd G. Elias ◽

...

Keyword(s):

Breast Cancer ◽

Gene Expression ◽

Estrogen Receptor ◽

Early Stage ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Gene Signature ◽

Early Stage Breast Cancer ◽

Er Positive ◽

Test Result

Purpose: Gene-expression profiles are increasingly used in addition to conventional prognostic factors to guide adjuvant chemotherapy (CT) decisions. The Dutch guideline suggests use of validated gene-expression profiles in patients with estrogen receptor (ER)-positive, early-stage breast cancer without overt lymph node metastases. We aimed to assess the impact of a 70-gene signature (70-GS) test on CT decisions in patients with ER-positive, early-stage breast cancer. Patients and Methods: In a prospective, observational, multicenter study in patients younger than 70 years old who had undergone surgery for ER-positive, early-stage breast cancer, physicians were asked whether they intended to administer adjuvant CT before deployment of the 70-GS test and after the test result was available. Results: Between October 1, 2013, and December 31, 2015, 660 patients, treated in 33 hospitals, were enrolled. Fifty-one percent of patients had pT1cN0, BRII, HER2-Neu-negative breast cancer. On the basis of conventional clinicopathological characteristics, physicians recommended CT in 270 (41%) of the 660 patients and recommended withholding CT in 107 (16%) of the 660 patients. For the remaining 43% of patients, the physicians were unsure and unable to give advice before 70-GS testing. In patients for whom CT was initially recommended or not recommended, 56% and 59%, respectively, were assigned to a low-risk profile by the 70-GS (κ, 0.02; 95% CI, -0.08 to 0.11). After disclosure of the 70-GS test result, the preliminary advice was changed in 51% of patients who received a recommendation before testing; the definitive CT recommendation of the physician was in line with the 70-GS result in 96% of patients. Conclusion: In this prospective, multicenter study in a selection of patients with ER-positive, early-stage breast cancer, 70-GS use changed the physician-intended recommendation to administer CT in half of the patients.
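The κ statistic quoted in the Results measures agreement beyond chance between the physicians' initial recommendation and the 70-GS risk class. The sketch below computes Cohen's kappa from a square agreement table. The function name is illustrative, and the cell counts in the usage note are an assumption: they are reconstructed by applying the rounded percentages (56% and 59% low risk) to the 270 and 107 patients, so they only approximate the study data.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square table where table[i][j] counts items
    placed in category i by rater A and category j by rater B."""
    k = len(table)
    total = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / total
    expected = sum(
        sum(table[i]) * sum(row[i] for row in table) for i in range(k)
    ) / total ** 2
    return (observed - expected) / (1 - expected)


# Rows: physician advice (CT, no CT); columns: 70-GS result (high risk, low risk).
# Counts are approximate reconstructions from rounded percentages, not study data.
approx_table = [[119, 151], [44, 63]]
kappa = cohen_kappa(approx_table)
```

With these approximate counts, kappa comes out close to the reported 0.02, meaning that the clinical recommendation and the 70-GS result agreed barely more often than expected by chance.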

Genomics and Systems Biology.

Blood ◽

10.1182/blood.v112.11.sci-51.sci-51 ◽

2008 ◽

Vol 112 (11) ◽

pp. sci-51-sci-51

Author(s):

Todd R. Golub

Keyword(s):

Gene Expression ◽

High Throughput ◽

Biological Networks ◽

High Throughput Screening ◽

Expression Profiles ◽

Low Cost ◽

Gene Expression Profiles ◽

Gene Expression Signature ◽

Small Molecule Libraries ◽

Approved Drugs

Genomics holds particular potential for the elucidation of biological networks that underlie disease. For example, gene expression profiles have been used to classify human cancers, and have more recently been used to predict graft rejection following organ transplantation. Such signatures thus hold promise both as diagnostic approaches and as tools with which to dissect biological mechanism. Such systems-based approaches are also beginning to impact the drug discovery process. For example, it is now feasible to measure gene expression signatures at low cost and high throughput, thereby allowing for the screening of small-molecule libraries in order to identify compounds capable of perturbing a signature of interest (even if the critical drivers of that signature are not yet known). This approach, known as Gene Expression-Based High Throughput Screening (GE-HTS), has been shown to identify candidate therapeutic approaches in AML, Ewing sarcoma, and neuroblastoma, and has identified tool compounds capable of inhibiting PDGF receptor signaling. A related approach, known as the Connectivity Map (www.broad.mit.edu/cmap), attempts to use gene expression profiles as a universal language with which to connect cellular states, gene product function, and drug action. In this manner, a gene expression signature of interest is used to computationally query a database of gene expression profiles of cells systematically treated with a large number of compounds (e.g., all off-patent FDA-approved drugs), thereby identifying potential new applications for existing drugs. Such systems-level approaches thus seek chemical modulators of cellular states, even when the molecular basis of such altered states is unknown.
