Scrublet: computational identification of cell doublets in single-cell transcriptomic data

10.1101/357368 ◽

2018 ◽

Cited By ~ 23

Author(s):

Samuel L. Wolock ◽

Romain Lopez ◽

Allon M. Klein

Keyword(s):

Single Cell ◽

Nearest Neighbor ◽

Expert Knowledge ◽

Transcriptomic Data ◽

Nearest Neighbor Classifier ◽

Cell Clustering ◽

Powerful Approach ◽

Single Cell Rna Sequencing ◽

The Impact ◽

Neighbor Classifier

Abstract Single-cell RNA-sequencing has become a widely used, powerful approach for studying cell populations. However, these methods often generate multiplet artifacts, where two or more cells receive the same barcode, resulting in a hybrid transcriptome. In most experiments, multiplets account for several percent of transcriptomes and can confound downstream data analysis. Here, we present Scrublet (Single-Cell Remover of Doublets), a framework for predicting the impact of multiplets in a given analysis and identifying problematic multiplets. Scrublet avoids the need for expert knowledge or cell clustering by simulating multiplets from the data and building a nearest neighbor classifier. To demonstrate the utility of this approach, we test Scrublet on several datasets that include independent knowledge of cell multiplets.
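The idea in the abstract above (simulate multiplets from the observed data, then score each cell with a nearest neighbor classifier) can be illustrated with a toy sketch. This is not the Scrublet implementation or its API: the function names, the total-count normalization, the Euclidean distance, the choice of k, and the default of two simulated doublets per observed cell are all assumptions made for illustration.

```python
import random
from math import dist


def simulate_doublets(counts, n_doublets, rng):
    """Build synthetic doublets by summing the counts of two randomly chosen cells."""
    doublets = []
    for _ in range(n_doublets):
        a, b = rng.sample(range(len(counts)), 2)
        doublets.append([x + y for x, y in zip(counts[a], counts[b])])
    return doublets


def normalize(profile):
    """Scale a count vector to fractions of its total (assumed normalization)."""
    total = sum(profile) or 1
    return [x / total for x in profile]


def doublet_scores(counts, n_doublets=None, k=5, seed=0):
    """Score each observed cell by the fraction of simulated doublets among its k nearest neighbors."""
    rng = random.Random(seed)
    n_doublets = n_doublets or 2 * len(counts)  # assumed ratio, not taken from the abstract
    simulated = simulate_doublets(counts, n_doublets, rng)
    # Label 0 = observed cell, label 1 = simulated doublet.
    points = [(normalize(c), 0) for c in counts] + [(normalize(d), 1) for d in simulated]
    scores = []
    for i in range(len(counts)):
        query = points[i][0]
        neighbors = sorted(
            (dist(query, p), label) for j, (p, label) in enumerate(points) if j != i
        )[:k]
        scores.append(sum(label for _, label in neighbors) / k)
    return scores


# Toy data: two cell types with disjoint marker genes, plus one hybrid profile.
cells = [[10, 10, 0, 0]] * 20 + [[0, 0, 10, 10]] * 20 + [[10, 10, 10, 10]]
scores = doublet_scores(cells)
```

In this toy data the last cell mixes both marker sets, so its neighborhood is dominated by simulated doublets and it receives the highest score; no clustering or prior annotation is needed to flag it.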

A High-Voltage Electric Switch Classification System Based on K-Nearest Neighbor Classifier

2020 IEEE 6th International Conference on Computer and Communications (ICCC) ◽

10.1109/iccc51575.2020.9344925 ◽

2020 ◽

Author(s):

Haien Wang ◽

Jing Zhang ◽

Yang Zhao ◽

Jun Wang ◽

Xiaorong Du

Keyword(s):

High Voltage ◽

Classification System ◽

Nearest Neighbor ◽

K Nearest Neighbor ◽

Nearest Neighbor Classifier ◽

Neighbor Classifier

A pattern synthesis technique with an efficient nearest neighbor classifier for binary pattern recognition

Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004. ◽

10.1109/icpr.2004.1333791 ◽

2004 ◽

Cited By ~ 2

Author(s):

R. Viswanath ◽

M. Narasimha Murty ◽

S. Bhatnagar

Keyword(s):

Pattern Recognition ◽

Nearest Neighbor ◽

Pattern Synthesis ◽

Nearest Neighbor Classifier ◽

Synthesis Technique ◽

Neighbor Classifier

MBRS-46. CHARTING NEOPLASTIC AND IMMUNE CELL HETEROGENEITY IN HUMAN AND GEM MODELS OF MEDULLOBLASTOMA USING scRNAseq

Neuro-Oncology ◽

10.1093/neuonc/noaa222.555 ◽

2020 ◽

Vol 22 (Supplement_3) ◽

pp. iii406-iii406

Author(s):

Andrew Donson ◽

Kent Riemondy ◽

Sujatha Venkataraman ◽

Ahmed Gilani ◽

Bridget Sanford ◽

...

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Immune Cell ◽

Genetically Engineered ◽

Cellular Heterogeneity ◽

Cell Heterogeneity ◽

Transcriptomic Data ◽

Single Cell Rna Sequencing ◽

Transcript Profiles

Abstract We explored cellular heterogeneity in medulloblastoma using single-cell RNA sequencing (scRNAseq), immunohistochemistry and deconvolution of bulk transcriptomic data. Over 45,000 cells from 31 patients from all main subgroups of medulloblastoma (2 WNT, 10 SHH, 9 GP3, 11 GP4 and 1 GP3/4) were clustered using Harmony alignment to identify conserved subpopulations. Each subgroup contained subpopulations exhibiting mitotic, undifferentiated and neuronal differentiated transcript profiles, corroborating other recent medulloblastoma scRNAseq studies. The magnitude of our present study builds on the findings of existing studies, providing further characterization of conserved neoplastic subpopulations, including identification of a photoreceptor-differentiated subpopulation that was predominantly, but not exclusively, found in GP3 medulloblastoma. Deconvolution of MAGIC transcriptomic cohort data showed that neoplastic subpopulations are associated with major and minor subgroup subdivisions; for example, photoreceptor subpopulation cells are more abundant in GP3-alpha. In both GP3 and GP4, higher proportions of undifferentiated subpopulations are associated with shorter survival and, conversely, differentiated subpopulations are associated with longer survival. This scRNAseq dataset also afforded unique insights into the immune landscape of medulloblastoma, and revealed an M2-polarized myeloid subpopulation that was restricted to SHH medulloblastoma. Additionally, we performed scRNAseq on 16,000 cells from genetically engineered mouse (GEM) models of GP3 and SHH medulloblastoma. These models showed a level of fidelity with corresponding human subgroup-specific neoplastic and immune subpopulations. Collectively, our findings advance our understanding of the neoplastic and immune landscape of the main medulloblastoma subgroups in both humans and GEM models.

Quad division prototype selection-based k-nearest neighbor classifier for click fraud detection from highly skewed user click dataset

Engineering Science and Technology an International Journal ◽

10.1016/j.jestch.2021.05.015 ◽

2021 ◽

Author(s):

Deepti Sisodia ◽

Dilip Singh Sisodia

Keyword(s):

Nearest Neighbor ◽

Fraud Detection ◽

Prototype Selection ◽

K Nearest Neighbor ◽

Click Fraud ◽

Nearest Neighbor Classifier ◽

Neighbor Classifier

Nearest neighbor classifier based on riemannian metric in radar target recognition

IEEE International Radar Conference, 2005. ◽

10.1109/radar.2005.1435946 ◽

2005 ◽

Author(s):

Meng Jicheng ◽

Yang Wanlin

Keyword(s):

Nearest Neighbor ◽

Target Recognition ◽

Riemannian Metric ◽

Radar Target ◽

Nearest Neighbor Classifier ◽

Radar Target Recognition ◽

Neighbor Classifier

Improving the Behavior of the Nearest Neighbor Classifier against Noisy Data with Feature Weighting Schemes

Lecture Notes in Computer Science - Hybrid Artificial Intelligence Systems ◽

10.1007/978-3-319-07617-1_52 ◽

2014 ◽

pp. 597-606 ◽

Cited By ~ 1

Author(s):

José A. Sáez ◽

Joaquín Derrac ◽

Julián Luengo ◽

Francisco Herrera

Keyword(s):

Nearest Neighbor ◽

Noisy Data ◽

Feature Weighting ◽

Weighting Schemes ◽

Nearest Neighbor Classifier ◽

Neighbor Classifier

k-Nearest Neighbor Classifier and Supervised Clustering

Data Mining ◽

10.1201/b15288-7 ◽

2013 ◽

pp. 117-137

Author(s):

Nong Ye

Keyword(s):

Nearest Neighbor ◽

K Nearest Neighbor ◽

Supervised Clustering ◽

Nearest Neighbor Classifier ◽

Neighbor Classifier

Stronger Automation for Flyspeck by Feature Weighting and Strategy Evolution

10.29007/5gzr ◽

2018 ◽

Author(s):

Cezary Kaliszyk ◽

Josef Urban

Keyword(s):

Nearest Neighbor ◽

Feature Weighting ◽

K Nearest Neighbor ◽

Nearest Neighbor Classifier ◽

Hol Light ◽

Distance Weighted ◽

Neighbor Classifier

Two complementary AI methods are used to improve the strength of the AI/ATP service for proving conjectures over the HOL Light and Flyspeck corpora. First, several schemes for frequency-based feature weighting are explored in combination with a distance-weighted k-nearest-neighbor classifier. This results in a 16% improvement (39.0% to 45.5% of Flyspeck problems solved) in the overall strength of the service when using 14 CPUs and 30 seconds. The best premise-selection/ATP combination is improved from 24.2% to 31.4%, i.e. by 30%. A smaller improvement is obtained by evolving targeted E prover strategies on two particular premise selections, using the Blind Strategymaker (BliStr) system. This raises the performance of the best AI/ATP method from 31.4% to 34.9%, i.e. by 11%, and raises the current 14-CPU power of the service to 46.9%.
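As a rough illustration of combining frequency-based feature weighting with a distance-weighted k-nearest-neighbor classifier, the sketch below uses inverse document frequency (IDF) as the weighting scheme and lets each neighbor vote in proportion to its similarity to the query. The abstract does not say which weighting schemes were explored, so IDF, the similarity measure, and all names and toy data here are assumptions rather than the authors' method.

```python
from collections import Counter, defaultdict
from math import log


def idf_weights(examples):
    """Weight each feature by log(N / document frequency): rare features count more."""
    n = len(examples)
    df = Counter(f for feats, _ in examples for f in set(feats))
    return {f: log(n / c) for f, c in df.items()}


def predict(examples, query, k=3):
    """Return the label with the largest similarity-weighted vote among the k nearest examples."""
    weights = idf_weights(examples)
    # Similarity = total IDF weight of the features shared with the query.
    ranked = sorted(
        ((sum(weights[f] for f in set(feats) & set(query)), label) for feats, label in examples),
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    for similarity, label in ranked:
        votes[label] += similarity  # closer neighbors contribute larger votes
    return max(votes, key=votes.get)


# Hypothetical toy examples: feature sets paired with a class label.
examples = [
    ({"real", "add", "common"}, "arith"),
    ({"real", "mul", "common"}, "arith"),
    ({"set", "union", "common"}, "sets"),
    ({"set", "subset", "common"}, "sets"),
]
label = predict(examples, {"common", "set", "union"})
```

The feature "common" appears in every example, so its IDF weight is zero and it has no influence on the ranking; the decision rests on the rarer features "set" and "union".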

Quality assessment of single-cell RNA sequencing data by coverage skewness analysis

10.1101/2019.12.31.890269 ◽

2019 ◽

Author(s):

Imad Abugessaisa ◽

Shuhei Noguchi ◽

Melissa Cardon ◽

Akira Hasegawa ◽

Kazuhide Watanabe ◽

...

Keyword(s):

Quality Assessment ◽

Single Cell ◽

Rna Sequencing ◽

Single Cells ◽

Assessment Method ◽

Poor Quality ◽

Sequencing Data ◽

Single Cell Rna Sequencing ◽

Gene Coverage ◽

The Impact

Abstract Analysis and interpretation of single-cell RNA-sequencing (scRNA-seq) experiments are compromised by the presence of poor quality cells. For meaningful analyses, such poor quality cells should be excluded to avoid biases and large variation. However, no clear guidelines exist. We introduce SkewC, a novel quality-assessment method to identify poor quality single cells in scRNA-seq experiments. The method is based on the assessment of gene coverage for each single cell and its skewness as a quality measure. To validate the method, we investigated the impact of poor quality cells on downstream analyses and compared biological differences between typical and poor quality cells. Moreover, we measured the ratio of intergenic expression, suggesting genomic contamination, and foreign organism contamination of single-cell samples. SkewC was tested on 37,993 single cells generated by 15 scRNA-seq protocols. We envision SkewC as an indispensable QC method to be incorporated into scRNA-seq experiments to preclude the possibility of scRNA-seq data misinterpretation.
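To make the notion of coverage skewness concrete, the sketch below treats a cell's gene-body coverage profile as a distribution over positions (bin 0 at the 5' end, the last bin at the 3' end) and computes its standardized third moment. The abstract does not give SkewC's formula, so the binning, the moment-based definition, and the function name are assumptions for illustration only.

```python
def coverage_skewness(coverage):
    """Moment skewness of bin position, weighted by the coverage in each bin."""
    total = sum(coverage)
    if total == 0:
        raise ValueError("coverage profile is empty")
    probs = [c / total for c in coverage]
    positions = range(len(coverage))
    mean = sum(p * x for p, x in zip(probs, positions))
    variance = sum(p * (x - mean) ** 2 for p, x in zip(probs, positions))
    if variance == 0:
        return 0.0  # all coverage in a single bin: no spread, no skew
    third = sum(p * (x - mean) ** 3 for p, x in zip(probs, positions))
    return third / variance ** 1.5


# Hypothetical profiles over ten gene-body bins.
even_profile = [5] * 10              # uniform coverage along the gene body
biased_profile = list(range(1, 11))  # coverage rising toward the 3' end

even_skew = coverage_skewness(even_profile)
biased_skew = coverage_skewness(biased_profile)
```

Under these assumptions an evenly covered cell has skewness near zero, while coverage piled up at the 3' end gives a negative value; a threshold on the absolute skewness would be one simple way to flag atypical cells.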

A patient distance metric for neurology

10.21203/rs.3.rs-20018/v1 ◽

2020 ◽

Author(s):

Daniel B Hier ◽

Jonathan Kopel ◽

Steven U Brint ◽

Donald C Wunsch II ◽

Gayla R Olbricht ◽

...

Keyword(s):

Nearest Neighbor ◽

Signs And Symptoms ◽

Diagnostic Error ◽

K Nearest Neighbor ◽

Bipartite Matching ◽

Nearest Neighbor Classifier ◽

Neurological Signs ◽

Neurological Patients ◽

Machine Readable ◽

Neighbor Classifier

Abstract Objective: Neurologists lack a metric for measuring the distance between neurological patients. When neurological signs and symptoms are represented as neurological concepts from a hierarchical ontology and neurological patients are represented as sets of concepts, distances between patients can be represented as inter-set distances. Methods: We converted the neurological signs and symptoms from 721 published neurology cases into sets of concepts with corresponding machine-readable codes. We calculated inter-concept distances based on a hierarchical ontology and we calculated inter-patient distances by semantic weighted bipartite matching. We evaluated the accuracy of a k-nearest neighbor classifier to allocate patients into 40 diagnostic classes. Results: Within a given diagnosis, mean patient distance differed by diagnosis, suggesting that across diagnoses there are differences in how similar patients are to other patients with the same diagnosis. The mean distance from one diagnosis to another diagnosis differed by diagnosis, suggesting that diagnoses differ in their proximity to other diagnoses. Utilizing a k-nearest neighbor classifier and inter-patient distances, the risk of misclassification differed by diagnosis. Conclusion: If signs and symptoms are converted to machine-readable codes and patients are represented as sets of these codes, patient distances can be computed as an inter-set distance. These patient distances give insights into how homogeneous patients are within a diagnosis (stereotypy), the distance between different diagnoses (proximity), and the risk of diagnosis misclassification (diagnostic error).
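The inter-set distance described above can be sketched on a toy scale: concepts sit in a small hierarchy, the distance between two concepts is the path length through their nearest common ancestor, and the distance between two patients is the cost of the cheapest one-to-one matching of their concepts. The ontology, the path-length metric, the penalty for unmatched concepts, and the brute-force matching are all hypothetical choices; the abstract does not specify the weighting used in the authors' semantic weighted bipartite matching.

```python
from itertools import permutations

# Hypothetical toy ontology: each concept maps to its parent (None marks the root).
PARENT = {
    "weakness": "motor", "ataxia": "motor", "motor": "sign",
    "numbness": "sensory", "sensory": "sign", "sign": None,
}


def ancestors(concept):
    """Path from a concept up to the root, starting with the concept itself."""
    path = [concept]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def concept_distance(a, b):
    """Number of edges between two concepts via their nearest common ancestor."""
    path_a, path_b = ancestors(a), ancestors(b)
    common = next(c for c in path_a if c in path_b)
    return path_a.index(common) + path_b.index(common)


def patient_distance(p, q, unmatched_cost=4):
    """Cheapest one-to-one matching of concepts, plus an assumed penalty per unmatched concept."""
    small, large = sorted((list(p), list(q)), key=len)
    # Brute force over all assignments; adequate only for small concept sets.
    best = min(
        sum(concept_distance(s, l) for s, l in zip(small, perm))
        for perm in permutations(large, len(small))
    )
    return best + unmatched_cost * (len(large) - len(small))


d = patient_distance({"weakness", "numbness"}, {"ataxia", "numbness"})
```

Here the cheapest matching pairs "weakness" with "ataxia" (siblings under "motor") and "numbness" with itself. For realistic set sizes an assignment solver would replace the brute-force search, and the resulting distances could feed a k-nearest neighbor classifier over diagnoses.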
