Column Subset Selection
Recently Published Documents

TOTAL DOCUMENTS: 41 (five years: 6)
H-INDEX: 9 (five years: 0)

Cancers, 2021, Vol. 13 (17), pp. 4297
Author(s): Pratip Rana, Phuc Thai, Thang Dinh, Preetam Ghosh

Biologists seek to identify a small number of significant features that are important, non-redundant, and relevant from diverse omics data. For example, statistical methods such as LIMMA and DESeq distinguish differentially expressed genes between case and control groups from the transcript profile. Researchers also apply various column subset selection algorithms to genomics datasets for a similar purpose. Unfortunately, genes selected by such statistical or machine learning methods are often highly co-regulated, making their performance inconsistent. Here, we introduce a novel feature selection algorithm that selects highly disease-related and non-redundant features from a diverse set of omics datasets. We successfully applied this algorithm to three different biological problems: (a) disease-to-normal sample classification; (b) multiclass classification of different disease samples; and (c) disease subtype detection. In terms of classification ROC-AUC, false-positive rate, and false-negative rate, our algorithm outperformed other gene selection and differential expression (DE) methods on all six TCGA cancer datasets considered here, for both binary and multiclass classification problems. Moreover, genes picked by our algorithm improved disease-subtyping accuracy for four different cancer types over state-of-the-art methods. Hence, we posit that our proposed feature reduction method can help the community solve various problems, including the selection of disease-specific biomarkers, precision medicine design, and disease subtype detection.
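The general column-subset-selection idea behind such feature pickers can be sketched with a simple greedy, pivoted-QR-style procedure. The toy below is plain NumPy with random data standing in for a gene-expression matrix; it is an illustration of the generic technique, not the authors' algorithm. Orthogonalizing the remaining columns against each chosen one is what discourages picking co-regulated (near-duplicate) features:

```python
import numpy as np

def select_columns(X, k):
    """Greedily pick k column indices by residual norm (pivoted-QR style)."""
    R = X.astype(float).copy()
    chosen = []
    for _ in range(k):
        norms = np.linalg.norm(R, axis=0)
        norms[chosen] = -1.0          # never re-pick an already-chosen column
        j = int(np.argmax(norms))
        chosen.append(j)
        q = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(q, q @ R)       # remove the chosen direction from all columns
    return chosen

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 200))    # e.g. 50 samples x 200 genes
idx = select_columns(X, 10)
print(len(idx), len(set(idx)))        # 10 10: ten distinct features selected
```

Each pass costs O(mn), so selecting k features is O(mnk); a correlated gene contributes little residual norm once its co-regulated partner is chosen, so it tends not to be picked.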


Author(s): Matthias Ryser, Felix M. Neuhauser, Christoph Hein, Pavel Hora, Markus Bambach

In this paper, we propose a new approach for the simulation-based support of tryout operations in deep drawing, which can be schematically classified as automatic knowledge acquisition. The central idea is to identify information-maximising positions for draw-in and local blank holder force sensors by solving the column subset selection problem with respect to the sensor sensitivities. Inverse surrogate models are then trained using the selected sensor signals as predictors and the material and process parameters as targets. The final models are able to observe the drawing process by estimating the current material and process parameters, which can then be compared to the target values to identify process corrections. The methodology is examined on an Audi A8L side panel frame using a set of 635 simulations, where 20 out of 21 material and process parameters can be estimated with an R2 value greater than 0.9. The result shows that the observational models are not only capable of estimating all but one process parameter with high accuracy, but also allow the determination of material parameters at the same time. Since no assumptions are made about the type of process, sensors, material, or process parameters, the proposed methodology can also be applied to other manufacturing processes and use cases.
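The two-stage pipeline described above can be sketched on synthetic data. Everything here is an illustrative assumption, not the paper's actual models: a random matrix stands in for the sensor sensitivities, a greedy selector stands in for the column subset selection step, and a plain least-squares fit stands in for the inverse surrogate:

```python
import numpy as np

def select_columns(A, k):
    """Greedy column subset selection by residual norm (pivoted-QR style)."""
    R = A.astype(float).copy()
    chosen = []
    for _ in range(k):
        norms = np.linalg.norm(R, axis=0)
        norms[chosen] = -1.0
        j = int(np.argmax(norms))
        chosen.append(j)
        q = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(q, q @ R)
    return chosen

rng = np.random.default_rng(1)
n_params, n_sensors_all, n_sims = 5, 40, 300
S = rng.standard_normal((n_params, n_sensors_all))   # toy sensitivity matrix

# Step 1: pick sensors whose sensitivity columns add the most new directions.
sensors = select_columns(S, k=8)

# Simulated "experiments": parameters -> sensor readings (+ small noise).
P = rng.standard_normal((n_sims, n_params))          # material/process parameters
Y = P @ S + 0.01 * rng.standard_normal((n_sims, n_sensors_all))

# Step 2: inverse surrogate from the selected sensor signals to the parameters.
W, *_ = np.linalg.lstsq(Y[:, sensors], P, rcond=None)
P_hat = Y[:, sensors] @ W
r2 = 1 - np.sum((P - P_hat) ** 2) / np.sum((P - P.mean(0)) ** 2)
print(round(r2, 3))   # close to 1.0 on this easy linear toy setup
```

The paper's surrogates are trained on FEM simulations of a nonlinear process, so a linear least-squares fit is only a stand-in; the structure (select informative sensors, then regress parameters from their signals) is the point of the sketch.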


Author(s): Michał Dereziński, Rajiv Khanna, Michael W. Mahoney

The Column Subset Selection Problem (CSSP) and the Nyström method are among the leading tools for constructing interpretable low-rank approximations of large datasets by selecting a small but representative set of features or instances. A fundamental question in this area is: what is the cost of this interpretability, i.e., how well can a data subset of size k compete with the best rank-k approximation? We develop techniques which exploit spectral properties of the data matrix to obtain improved approximation guarantees that go beyond the standard worst-case analysis. Our approach leads to significantly better bounds for datasets with known rates of singular value decay, e.g., polynomial or exponential decay. Our analysis also reveals an intriguing phenomenon: the cost of interpretability as a function of k may exhibit multiple peaks and valleys, which we call a multiple-descent curve. A lower bound we establish shows that this behavior is not an artifact of our analysis, but rather an inherent property of the CSSP and Nyström tasks. Finally, using the example of a radial basis function (RBF) kernel, we show that both our improved bounds and the multiple-descent curve can be observed on real datasets simply by varying the RBF parameter.
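The "cost of interpretability" can be measured numerically: build a matrix with an exponentially decaying spectrum, pick k columns with a simple greedy selector (a stand-in, not the paper's analysis), and compare the column-subset error to the Eckart–Young optimum. Since a k-column projection has rank at most k, the ratio is always at least 1:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 60
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = 2.0 ** -np.arange(n)                  # exponentially decaying singular values
A = U @ np.diag(s) @ V.T

# Greedy column choice (pivoted-QR style), purely as a stand-in selector.
k = 5
R, chosen = A.copy(), []
for _ in range(k):
    norms = np.linalg.norm(R, axis=0)
    norms[chosen] = -1.0
    j = int(np.argmax(norms))
    chosen.append(j)
    q = R[:, j] / np.linalg.norm(R[:, j])
    R -= np.outer(q, q @ R)

C = A[:, chosen]
P = C @ np.linalg.pinv(C)                 # projector onto span of chosen columns
css_err = np.linalg.norm(A - P @ A)       # Frobenius error of the column subset
best_err = np.sqrt(np.sum(s[k:] ** 2))    # best rank-k error (Eckart-Young)
print(css_err / best_err >= 1 - 1e-9)     # True: interpretability has a cost
```

Sweeping k and plotting `css_err / best_err` is the kind of curve on which the paper's multiple peaks and valleys appear; this snippet only verifies the ratio is well defined and at least 1 at a single k.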


2020
Author(s): Mohsen Joneidi, Saeed Vahidian, Ashkan Esmaeili, Siavash Khodadadeh

We propose a novel technique for finding representatives in a large, unsupervised dataset. The approach is based on the concept of self-rank, defined as the minimum number of samples needed to reconstruct all samples with an accuracy proportional to that of the rank-K approximation. Our proposed algorithm has linear complexity in the size of the original dataset and simultaneously provides an adaptive upper bound on the approximation ratio. These favorable characteristics help close a long-standing gap between practical and theoretical methods for finding representatives.
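The self-rank idea can be mimicked with a simple greedy sketch (not the authors' linear-time algorithm; the function name and tolerance are assumptions for illustration): keep adding the least-explained sample until every sample is reconstructed within a tolerance of the best rank-K error:

```python
import numpy as np

def representatives(X, K, tol=1.1):
    """Greedily add samples until all rows of X are reconstructed within
    tol times the best rank-K error (Frobenius norm)."""
    s = np.linalg.svd(X, compute_uv=False)
    budget = tol * np.sqrt(np.sum(s[K:] ** 2))   # rank-K error target
    R = X.astype(float).copy()                   # per-sample residuals
    chosen = []
    while np.linalg.norm(R) > budget and len(chosen) < X.shape[0]:
        j = int(np.argmax(np.linalg.norm(R, axis=1)))  # least-explained sample
        chosen.append(j)
        q = R[j] / np.linalg.norm(R[j])
        R -= np.outer(R @ q, q)                  # project all residuals off q
    return chosen

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 4)) @ rng.standard_normal((4, 30))
X += 0.01 * rng.standard_normal(X.shape)         # near-rank-4 data
reps = representatives(X, K=4)
print(len(reps))                                 # a small number of samples
```

The number of samples the loop ends up choosing plays the role the abstract calls self-rank: for near-rank-4 data it stays close to 4, and the loop always terminates because each pick zeroes one sample's residual.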



