Logistic regression error‐in‐covariate models for longitudinal high‐dimensional covariates

Hyung Park; Seonjoo Lee

doi:10.1002/sta4.246

Cancer classification and biomarker selection via a penalized logsum network-based logistic regression model

Technology and Health Care ◽

10.3233/thc-218026 ◽

2021 ◽

Vol 29 ◽

pp. 287-295

Author(s):

Zhiming Zhou ◽

Haihui Huang ◽

Yong Liang

Keyword(s):

Logistic Regression ◽

Regression Model ◽

Logistic Regression Model ◽

Gene Selection ◽

Simulated Data ◽

Biological Data ◽

Cancer Classification ◽

High Dimensional ◽

Data Set ◽

Biomarker Selection

BACKGROUND: In genome research, it is particularly important to identify molecular biomarkers or signaling pathways related to phenotypes. The logistic regression model is a powerful discrimination method that offers a clear statistical interpretation and yields the probability of each class label. However, it cannot perform biomarker selection. OBJECTIVE: The aim of this paper is to give the model an efficient gene selection capability. METHODS: In this paper, we propose a new penalized logsum network-based regularized logistic regression model for gene selection and cancer classification. RESULTS: Experimental results on simulated data sets show that our method is effective in the analysis of high-dimensional data. On a large data set, the proposed method achieved AUC values of 89.66% (training) and 90.02% (testing), which are on average 5.17% (training) and 4.49% (testing) better than mainstream methods. CONCLUSIONS: The proposed method can be considered a promising tool for gene selection and cancer classification of high-dimensional biological data.
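
The abstract describes adding a penalty to logistic regression so that the coefficients of irrelevant genes are driven exactly to zero. As a minimal sketch of that general idea, the Python below fits an L1 (lasso) penalized logistic regression by proximal gradient descent (ISTA). This is a swapped-in stand-in, not the authors' logsum network penalty, which uses a nonconvex log penalty together with gene-network information; the helper names, step size, penalty level, and simulated data are illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    # Proximal operator of t*|v|: shrink toward zero, exactly zero when |v| <= t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_l1_logistic(X, y, lam, step=0.5, n_iter=300):
    """Hypothetical helper: ISTA for L1-penalized logistic regression.

    Minimizes mean negative log-likelihood + lam * sum(|beta_j|).
    X is a list of n rows with p features, y holds 0/1 labels,
    and the intercept b0 is not penalized.
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad0, grad = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            # Residual between predicted probability and observed label.
            r = sigmoid(b0 + sum(b * x for b, x in zip(beta, xi))) - yi
            grad0 += r / n
            for j in range(p):
                grad[j] += r * xi[j] / n
        b0 -= step * grad0
        # Gradient step, then soft-thresholding (the L1 proximal step).
        beta = [soft_threshold(beta[j] - step * grad[j], step * lam) for j in range(p)]
    return b0, beta


# Simulated example: 10 candidate "genes", only the first two drive the label.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(150)]
y = [1 if rng.random() < sigmoid(2 * x[0] - 2 * x[1]) else 0 for x in X]
b0, beta = fit_l1_logistic(X, y, lam=0.05)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("selected features:", selected)
```

Raising `lam` zeroes out more coefficients; in practice the penalty level would typically be chosen by cross-validation.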

High-Dimensional Classification by Sparse Logistic Regression

IEEE Transactions on Information Theory ◽

10.1109/tit.2018.2884963 ◽

2019 ◽

Vol 65 (5) ◽

pp. 3068-3079 ◽

Cited By ~ 5

Author(s):

Felix Abramovich ◽

Vadim Grinshtein

Keyword(s):

Logistic Regression ◽

High Dimensional ◽

Sparse Logistic Regression ◽

Dimensional Classification

Logistic Regression Ensemble (LORENS) Applied to Drug Discovery

MATEMATIKA ◽

10.11113/matematika.v36.n1.1197 ◽

2020 ◽

Vol 36 (1) ◽

pp. 43-49

Author(s):

T Dwi Ary Widhianingsih ◽

Heri Kuswanto ◽

Dedy Dwi Prastyo

Keyword(s):

Logistic Regression ◽

Drug Discovery ◽

Objective Function ◽

Classification Performance ◽

High Dimensionality ◽

High Dimensional ◽

Classification Methods ◽

Data Set ◽

Computational Burden ◽

Cancerous Cells

Logistic regression is one of the most commonly used classification methods. It has some advantages, particularly related to hypothesis testing and its objective function. However, it also has disadvantages for high-dimensional data, such as multicollinearity, over-fitting, and a high computational burden. Ensemble-based classification methods have been proposed to overcome these problems. The logistic regression ensemble (LORENS) method is expected to improve the classification performance of basic logistic regression. In this paper, we apply it to drug discovery, with the objective of obtaining candidate compounds that protect normal, non-cancerous cells, a problem involving a high-dimensional data set. The experimental results show that it performs well, with an accuracy of 69% and an AUC of 0.7306.
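
The abstract does not spell out how LORENS builds its ensemble. As we understand the original LORENS proposal, the predictors are randomly partitioned into mutually exclusive subsets, a logistic regression is fitted on each subset, and the members' predicted probabilities are averaged; the sketch below follows that reading with a single random partition (the original may repeat the partitioning several times), so treat it as an assumption rather than the paper's exact procedure. The helper names, the number of subsets, the 0.5 threshold, and the simulated data are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, cols, step=0.5, n_iter=300):
    # Plain gradient-descent logistic regression restricted to feature indices `cols`.
    n = len(X)
    w = [0.0] * (len(cols) + 1)  # w[0] is the intercept
    for _ in range(n_iter):
        g = [0.0] * len(w)
        for xi, yi in zip(X, y):
            r = sigmoid(w[0] + sum(wk * xi[c] for wk, c in zip(w[1:], cols))) - yi
            g[0] += r / n
            for k, c in enumerate(cols):
                g[k + 1] += r * xi[c] / n
        w = [wk - step * gk for wk, gk in zip(w, g)]
    return w


def fit_ensemble(X, y, n_subsets, seed=0):
    # Randomly split the feature indices into n_subsets disjoint groups,
    # then fit one logistic regression per group.
    idx = list(range(len(X[0])))
    random.Random(seed).shuffle(idx)
    groups = [idx[k::n_subsets] for k in range(n_subsets)]
    return [(cols, fit_logistic(X, y, cols)) for cols in groups]


def predict_proba(models, x):
    # Ensemble probability: the average of the member models' probabilities.
    probs = [sigmoid(w[0] + sum(wk * x[c] for wk, c in zip(w[1:], cols)))
             for cols, w in models]
    return sum(probs) / len(probs)


# Simulated example: 12 features, 3 informative, split into 3 subsets.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(12)] for _ in range(120)]
y = [1 if x[0] + x[1] + x[2] + 0.5 * rng.gauss(0, 1) > 0 else 0 for x in X]
models = fit_ensemble(X, y, n_subsets=3)
acc = sum((predict_proba(models, x) >= 0.5) == (yi == 1) for x, yi in zip(X, y)) / len(y)
print("training accuracy:", round(acc, 3))
```

Each member model sees only a fraction of the predictors, which keeps every individual fit low-dimensional; averaging the probabilities then pools the information spread across the subsets.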

Fully Bayesian logistic regression with hyper-LASSO priors for high-dimensional feature selection

Journal of Statistical Computation and Simulation ◽

10.1080/00949655.2018.1490418 ◽

2018 ◽

Vol 88 (14) ◽

pp. 2827-2851 ◽

Cited By ~ 2

Author(s):

Longhai Li ◽

Weixin Yao

Keyword(s):

Logistic Regression ◽

Feature Selection ◽

High Dimensional ◽

Fully Bayesian ◽

Bayesian Logistic Regression

Automated NLP Extraction of Clinical Rationale for Treatment Discontinuation in Breast Cancer

JCO Clinical Cancer Informatics ◽

10.1200/cci.20.00139 ◽

2021 ◽

pp. 550-560

Author(s):

Matthew S. Alkaitis ◽

Monica N. Agrawal ◽

Gregory J. Riely ◽

Pedram Razavi ◽

David Sontag

Keyword(s):

Breast Cancer ◽

Epidermal Growth Factor Receptor ◽

Logistic Regression ◽

Standard Deviation ◽

Early Stage ◽

Area Under The Curve ◽

Growth Factor Receptor ◽

Treatment Discontinuation ◽

High Dimensional ◽

Epidermal Growth

PURPOSE Key oncology end points are not routinely encoded into electronic medical records (EMRs). We assessed whether natural language processing (NLP) can abstract treatment discontinuation rationale from unstructured EMR notes to estimate toxicity incidence and progression-free survival (PFS). METHODS We constructed a retrospective cohort of 6,115 patients with early-stage and 701 patients with metastatic breast cancer initiating care at Memorial Sloan Kettering Cancer Center from 2008 to 2019. Each cohort was divided into training (70%), validation (15%), and test (15%) subsets. Human abstractors identified the clinical rationale associated with treatment discontinuation events. Concatenated EMR notes were used to train high-dimensional logistic regression and convolutional neural network models. Kaplan-Meier analyses were used to compare toxicity incidence and PFS estimated by our NLP models to estimates generated by manual labeling and time-to-treatment discontinuation (TTD). RESULTS Our best high-dimensional logistic regression models identified toxicity events in early-stage patients with an area under the receiver operating characteristic curve of 0.857 ± 0.014 (standard deviation) and progression events in metastatic patients with an area under the curve of 0.752 ± 0.027 (standard deviation). NLP-extracted toxicity incidence and PFS curves were not significantly different from manually extracted curves (P = .95 and P = .67, respectively). By contrast, TTD overestimated toxicity in early-stage patients (P < .001) and underestimated PFS in metastatic patients (P < .001). Additionally, we tested an extrapolation approach in which 20% of the metastatic cohort was labeled manually and NLP algorithms were used to abstract the remaining 80%. This extrapolated outcomes approach resolved PFS differences between receptor subtypes (P < .001 for hormone receptor+/human epidermal growth factor receptor 2− v human epidermal growth factor receptor 2+ v triple-negative) that could not be resolved with TTD. CONCLUSION NLP models are capable of abstracting treatment discontinuation rationale with minimal manual labeling.
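
The abstract reports discrimination as the area under the receiver operating characteristic curve (AUC). As a hedged illustration of what that number measures, the sketch below computes the AUC in its rank (Mann-Whitney) form: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name and the toy scores are our own; this is not the authors' evaluation code.

```python
def auc(labels, scores):
    """AUC via pairwise comparison (Mann-Whitney form); labels are 0/1."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive/negative pairs are ranked correctly.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop costs O(n_pos × n_neg); for large test sets, a sort-based rank computation gives the same value more efficiently.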

Penalized logistic regression with low prevalence exposures beyond high dimensional settings

PLoS ONE ◽

10.1371/journal.pone.0217057 ◽

2019 ◽

Vol 14 (5) ◽

pp. e0217057 ◽

Cited By ~ 10

Author(s):

Sam Doerken ◽

Marta Avalos ◽

Emmanuel Lagarde ◽

Martin Schumacher

Keyword(s):

Logistic Regression ◽

High Dimensional ◽

Penalized Logistic Regression ◽

Low Prevalence

Prediction and Variable Selection in High-Dimensional Misspecified Binary Classification

Entropy ◽

10.3390/e22050543 ◽

2020 ◽

Vol 22 (5) ◽

pp. 543 ◽

Cited By ~ 2

Author(s):

Konrad Furmańczyk ◽

Wojciech Rejchel

Keyword(s):

Logistic Regression ◽

Variable Selection ◽

Logistic Model ◽

Binary Classification ◽

Model Misspecification ◽

High Dimensional ◽

Classification Models ◽

Computationally Efficient ◽

Class Labels ◽

Penalized Logistic Regression

In this paper, we consider prediction and variable selection in misspecified binary classification models in the high-dimensional setting. We focus on two approaches to classification that are computationally efficient but lead to model misspecification. The first applies penalized logistic regression to classification data that may not follow the logistic model. The second is even more radical: we simply treat the class labels of objects as if they were numbers and apply penalized linear regression. We investigate both approaches thoroughly and provide conditions that guarantee their success in prediction and variable selection. Our results hold even when the number of predictors is much larger than the sample size. The paper concludes with experimental results.
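
The second approach in this abstract, treating class labels as numbers and fitting penalized linear regression, can be sketched in a few lines. The Python below is a minimal illustration under our own assumptions: labels are coded as -1/+1, the penalty is the lasso fitted by cyclic coordinate descent, and a point is classified by the sign of its fitted value. The helper name, label coding, penalty level, and simulated data are hypothetical and not taken from the paper.

```python
import random


def soft_threshold(v, t):
    # Shrink v toward zero by t; exactly zero when |v| <= t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=100):
    """Hypothetical helper: cyclic coordinate descent for
    (1 / (2n)) * ||y - b0 - X beta||^2 + lam * sum(|beta_j|)."""
    n, p = len(X), len(X[0])
    # Center features and response so the intercept can be recovered at the end.
    xm = [sum(row[j] for row in X) / n for j in range(p)]
    ym = sum(y) / n
    Xc = [[row[j] - xm[j] for j in range(p)] for row in X]
    r = [yi - ym for yi in y]  # residual for beta = 0
    col_sq = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant feature: leave its coefficient at zero
            # Correlation of feature j with the partial residual (feature j added back).
            rho = sum(Xc[i][j] * (r[i] + Xc[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= Xc[i][j] * delta
                beta[j] = new
    b0 = ym - sum(m * b for m, b in zip(xm, beta))
    return b0, beta


# Simulated example: labels coded -1/+1 and treated as a numeric response.
rng = random.Random(2)
X = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(150)]
labels = [1 if x[0] - x[1] + 0.5 * rng.gauss(0, 1) > 0 else -1 for x in X]
b0, beta = lasso_cd(X, labels, lam=0.05)
pred = [1 if b0 + sum(b * v for b, v in zip(beta, x)) > 0 else -1 for x in X]
acc = sum(pr == t for pr, t in zip(pred, labels)) / len(labels)
print("nonzero coefficients:", [j for j, b in enumerate(beta) if b != 0.0])
print("training accuracy:", round(acc, 3))
```

The data here are generated from a threshold model rather than a linear one, so the linear fit is deliberately misspecified; this toy run only shows the mechanics, not the conditions the paper derives.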

Using principal components for estimating logistic regression with high-dimensional multicollinear data

Computational Statistics & Data Analysis ◽

10.1016/j.csda.2005.03.011 ◽

2006 ◽

Vol 50 (8) ◽

pp. 1905-1924 ◽

Cited By ~ 90

Author(s):

Ana M. Aguilera ◽

Manuel Escabias ◽

Mariano J. Valderrama

Keyword(s):

Logistic Regression ◽

Principal Components ◽

High Dimensional

The cross-validated AUC for MCP-logistic regression with high-dimensional data

Statistical Methods in Medical Research ◽

10.1177/0962280211428385 ◽

2011 ◽

Vol 22 (5) ◽

pp. 505-518 ◽

Cited By ~ 7

Author(s):

Dingfeng Jiang ◽

Jian Huang ◽

Ying Zhang

Keyword(s):

Logistic Regression ◽

High Dimensional Data ◽

High Dimensional ◽

The Cross

Robust and sparse estimation methods for high-dimensional linear and logistic regression

Chemometrics and Intelligent Laboratory Systems ◽

10.1016/j.chemolab.2017.11.017 ◽

2018 ◽

Vol 172 ◽

pp. 211-222 ◽

Cited By ~ 11

Author(s):

Fatma Sevinç Kurnaz ◽

Irene Hoffmann ◽

Peter Filzmoser

Keyword(s):

Logistic Regression ◽

Estimation Methods ◽

High Dimensional ◽

Sparse Estimation
