Block HSIC Lasso: model-free biomarker detection for ultra-high dimensional data
Abstract

Motivation
Finding nonlinear relationships between biomolecules and a biological outcome is computationally expensive and statistically challenging. Existing methods have crucial drawbacks, including lack of parsimony, non-convexity, and computational overhead. Here we present the block HSIC Lasso, a nonlinear feature selector that avoids these drawbacks.

Results
We compare the block HSIC Lasso to other state-of-the-art feature selection techniques on synthetic and real data, including experiments over three common types of genomic data: gene-expression microarrays, single-cell RNA-seq, and GWAS. In all cases, we observe that the features selected by block HSIC Lasso retain more information about the underlying biology than the features selected by other techniques. As a proof of concept, we applied the block HSIC Lasso to a single-cell RNA-seq experiment on mouse hippocampus. We discovered that many genes previously linked to brain development and function are involved in the biological differences between the types of neurons.

Availability
Block HSIC Lasso is implemented in the Python 2/3 package pyHSICLasso, available on GitHub (https://github.com/riken-aip/pyHSICLasso) and PyPI (https://pypi.org/project/pyHSICLasso).

Contact
[email protected]

Supplementary information
Supplementary data are available at Bioinformatics online.