cluster purity
GigaScience ◽  
2021 ◽  
Vol 10 (10) ◽  
Author(s):  
Vinay S Swamy ◽  
Temesgen D Fufa ◽  
Robert B Hufnagel ◽  
David M McGaughey

Abstract Background: The development of highly scalable single-cell transcriptome technology has resulted in the creation of thousands of datasets, >30 in the retina alone. Analyzing the transcriptomes between different projects is highly desirable because this would allow for better assessment of which biological effects are consistent across independent studies. However, it is difficult to compare and contrast data across different projects because there are substantial batch effects from computational processing, the single-cell technology utilized, and natural biological variation. While many single-cell transcriptome-specific batch correction methods purport to remove the technical noise, it is difficult to ascertain which method functions best. Results: We developed a lightweight R package (scPOP, single-cell Pick Optimal Parameters) that brings in batch integration methods and uses a simple heuristic to balance batch merging and cell type/cluster purity. We use this package along with a Snakefile-based workflow system to demonstrate how to optimally merge 766,615 cells from 33 retina datasets and 3 species to create a massive ocular single-cell transcriptome meta-atlas. Conclusions: This provides a model for how to efficiently create meta-atlases for tissues and cells of interest.
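The abstract does not spell out the heuristic, so the following is only a minimal sketch of one way a balance between batch merging and cluster purity could be scored. It is not scPOP's implementation (scPOP is an R package), and the function names, the two-score input, and the z-score sum are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list of scores; a constant list maps to all zeros."""
    mu, sigma = mean(values), pstdev(values)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]


def rank_runs(runs):
    """Rank candidate integration runs, best first.

    runs: dict mapping a run name to (batch_mixing, purity), where a higher
    value is better for both scores. Both inputs are hypothetical: which
    metrics feed them is left open here.
    """
    names = list(runs)
    mix_z = zscores([runs[n][0] for n in names])
    pur_z = zscores([runs[n][1] for n in names])
    # Summing the standardized scores rewards runs that do well on both
    # criteria rather than excelling at one and failing the other.
    combined = {n: m + p for n, m, p in zip(names, mix_z, pur_z)}
    return sorted(combined, key=combined.get, reverse=True)
```

With this scoring, a run that mixes batches perfectly but destroys cell type structure ranks below a run that is moderately good at both, which is the kind of trade-off the abstract describes.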


Author(s):  
Zuzana Gajarska ◽  
Lukas Brunnbauer ◽  
Hans Lohninger ◽  
Andreas Limbeck

Abstract: Over the past few years, laser-induced breakdown spectroscopy (LIBS) has earned a lot of attention in the field of online polymer identification. Unlike the well-established near-infrared spectroscopy (NIR), LIBS analysis is not limited by the sample thickness or color and therefore seems to be a promising candidate for this task. Nevertheless, the similar elemental composition of most polymers results in high similarity of their LIBS spectra, which makes their discrimination challenging. To address this problem, we developed a novel chemometric strategy based on a systematic optimization of two factors influencing the discrimination ability: the set of experimental conditions (laser energy, gate delay, and atmosphere) employed for the LIBS analysis and the set of spectral variables used as a basis for the polymer discrimination. In the process, a novel concept of spectral descriptors was used to extract chemically relevant information from the polymer spectra. Cluster purity based on the k-nearest neighbors (k-NN) was established as a suitable tool for monitoring the extent of cluster overlaps, and an in-house designed random forest (RDF) experiment combined with a cluster purity–governed forward selection algorithm was employed to identify spectral variables with the greatest relevance for polymer identification. Using this approach, it was possible to discriminate among 20 virgin polymer types, which is the highest number reported in the literature so far. Additionally, using the optimized experimental conditions and data evaluation, robust discrimination performance could be achieved even with polymer samples containing carbon black or other common additives, which hints at the applicability of the developed approach to real-life samples.
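A k-NN based cluster purity can be sketched as follows. This is an illustrative formulation, assuming purity is the average fraction of each sample's k nearest neighbors (Euclidean distance) that carry the same class label; the authors' exact definition may differ, and the function name `knn_purity` is hypothetical.

```python
import math


def knn_purity(points, labels, k=3):
    """Mean fraction of each point's k nearest neighbors sharing its label.

    points: list of equal-length coordinate tuples (e.g. spectral variables).
    labels: list of class labels (e.g. polymer types), one per point.
    Returns a value in [0, 1]; 1.0 means no neighborhood crosses a class.
    """
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    total = 0.0
    for i, p in enumerate(points):
        # Distances to every other point; the point itself is excluded.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbors = [j for _, j in dists[:k]]
        total += sum(labels[j] == labels[i] for j in neighbors) / k
    return total / n
```

Values near 1 indicate well-separated clusters, while lower values indicate overlapping ones. A forward selection of variables would, under this assumption, keep a candidate variable only if adding it raises the score.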


2021 ◽  
Author(s):  
Vinay S Swamy ◽  
Temesgen D Fufa ◽  
Robert B Hufnagel ◽  
David M McGaughey

The development of highly scalable single-cell transcriptome technology has resulted in the creation of thousands of datasets, over 30 in the retina alone. Analyzing the transcriptomes between different projects is highly desirable as this would allow for better assessment of which biological effects are consistent across independent studies. However, it is difficult to compare and contrast data across different projects as there are substantial batch effects from computational processing, the single-cell technology utilized, and natural biological variation. While many single-cell transcriptome-specific batch correction methods purport to remove the technical noise, it is difficult to ascertain which method works best. We developed a lightweight R package (scPOP) that brings in batch integration methods and uses a simple heuristic to balance batch merging and cell type/cluster purity. We use this package along with a Snakefile-based workflow system to demonstrate how to optimally merge 766,615 cells from 34 retina datasets and three species to create a massive ocular single-cell transcriptome meta-atlas. This provides a model for how to efficiently create meta-atlases for tissues and cells of interest.


Author(s):  
Marleen Balvert ◽  
Xiao Luo ◽  
Ernestina Hauptfeld ◽  
Alexander Schönhuth ◽  
Bas E Dutilh

Abstract Motivation: The microbes that live in an environment can be identified from the combined genomic material, also referred to as the metagenome. Sequencing a metagenome can result in large volumes of sequencing reads. A promising approach to reduce the size of metagenomic datasets is to cluster reads into groups based on their overlaps. Clustering reads is valuable for facilitating downstream analyses, including computationally intensive strain-aware assembly. As current read clustering approaches cannot handle the large datasets arising from high-throughput metagenome sequencing, a novel read clustering approach is needed. In this article, we propose OGRE, an Overlap Graph-based Read clustEring procedure for high-throughput sequencing data, with a focus on shotgun metagenomes. Results: We show that for small datasets OGRE outperforms other read binners in terms of the number of species included in a cluster, also referred to as cluster purity, and the fraction of all reads that is placed in one of the clusters. Furthermore, OGRE is able to process metagenomic datasets that are too large for other read binners into clusters with high cluster purity. Conclusion: OGRE is the only method that can successfully cluster reads in species-specific clusters for large metagenomic datasets without running into computation time or memory issues. Availability and implementation: Code is made available on GitHub (https://github.com/Marleen1/OGRE). Supplementary information: Supplementary data are available at Bioinformatics online.
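The two evaluation quantities named in the Results can be illustrated with a small sketch. It assumes each read has a known true species and a cluster id (or `None` when the read is left unclustered), and it additionally reports the common majority-class purity, which is not necessarily the measure used for OGRE. The function name and the input layout are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_summary(assignments, species):
    """Summarize a read clustering against known species labels.

    assignments: dict read id -> cluster id, or None if unclustered.
    species: dict read id -> true species of that read.
    """
    clusters = defaultdict(list)
    for read, cluster in assignments.items():
        if cluster is not None:
            clusters[cluster].append(species[read])
    clustered = sum(len(members) for members in clusters.values())
    # Reads that belong to the most frequent species of their cluster.
    majority = sum(
        Counter(members).most_common(1)[0][1] for members in clusters.values()
    )
    return {
        # Number of distinct species per cluster; 1 is a pure cluster.
        "species_per_cluster": {c: len(set(m)) for c, m in clusters.items()},
        "majority_purity": majority / clustered if clustered else 0.0,
        "fraction_clustered": clustered / len(assignments) if assignments else 0.0,
    }
```

Reporting both numbers together matters because either one can be inflated at the expense of the other: leaving hard reads unclustered raises purity, while merging everything into one cluster raises the fraction clustered.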


2019 ◽  
Author(s):  
Baolin Liu ◽  
Chenwei Li ◽  
Ziyi Li ◽  
Xianwen Ren ◽  
Zemin Zhang

Abstract: Single-cell RNA sequencing (scRNA-seq) is a versatile tool for discovering and annotating cell types and states, but the determination and annotation of cell subtypes is often subjective and arbitrary. Often, it is not even clear whether a given cluster is uniform. Here we present an entropy-based statistic, ROGUE, to accurately quantify the purity of identified cell clusters. We demonstrated that our ROGUE metric is generalizable across datasets and enables accurate, sensitive and robust assessment of cluster purity on a wide range of simulated and real datasets. Applying this metric to fibroblast and B cell datasets, we identified additional subtypes and demonstrated the application of ROGUE-guided analyses to detect true signals in specific subpopulations. ROGUE can be applied to all tested scRNA-seq datasets, and has important implications for evaluating the quality of putative clusters, discovering pure cell subtypes and constructing comprehensive, detailed and standardized single-cell atlases.


Author(s):  
Hao Xu ◽  
Yueru Chen ◽  
Ruiyuan Lin ◽  
C.-C. Jay Kuo

Trained features of a convolutional neural network (CNN) at different convolution layers are analyzed using two quantitative metrics in this work. We first show mathematically that the Gaussian confusion measure (GCM) can be used to identify the discriminative ability of an individual feature. Next, we generalize this idea, introduce another measure called the cluster purity measure (CPM), and use it to analyze the discriminative ability of multiple features jointly. The discriminative ability of trained CNN features is validated by experimental results. Research on CNNs utilizing the GCM and CPM tools offers important insights into their operational mechanism, including the behavior of trained CNN features and the good detection performance on some object classes that were considered difficult in the past. Finally, the trained feature representation is compared between different CNN structures to explain the superiority of deeper networks.

