Characterizing Human Cell Types and Tissue Origin Using the Benford Law

Sne Morag; Mali Salmon-Divon

doi:10.3390/cells8091004

Cells ◽

10.3390/cells8091004 ◽

2019 ◽

Vol 8 (9) ◽

pp. 1004 ◽

Cited By ~ 1

Author(s):

Sne Morag ◽

Mali Salmon-Divon

Keyword(s):

Feature Selection Method ◽

Cell Types ◽

Machine Learning Algorithms ◽

Unknown Primary ◽

Rna Seq ◽

Cancer Subtypes ◽

Interdisciplinary Approaches ◽

Primary Origin ◽

The Mean ◽

Benford Law

Processing massive transcriptomic datasets in a meaningful manner requires novel, possibly interdisciplinary, approaches. One principle that can address this challenge is the Benford law (BL), which posits that the occurrence probability of a leading digit in a large numerical dataset decreases as its value increases. Here, we analyzed large single-cell and bulk RNA-seq datasets to test whether cell types and tissue origins can be differentiated based on the adherence of specific genes to the BL. Then, we used the Benford adherence scores of these genes as inputs to machine-learning algorithms and tested their separation accuracy. We found that genes selected based on their first-digit distributions can distinguish between cell types and tissue origins. Moreover, despite the simplicity of this novel feature-selection method, its separation accuracy is higher than that of the mean-expression level approach and is similar to that of the differential expression approach. Thus, the BL can be used to obtain biological insights from massive amounts of numerical genomics data—a capability that could be utilized in various biomedical applications, e.g., to resolve samples of unknown primary origin, identify possible sample contaminations, and provide insights into the molecular basis of cancer subtypes.
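
The first-digit idea in the abstract above can be illustrated with a short sketch. The Benford law gives the expected frequency of leading digit d as log10(1 + 1/d). The abstract does not say how the "Benford adherence score" is computed, so the mean absolute deviation used below is an assumption chosen for illustration, and all function names are hypothetical.

```python
import math
from collections import Counter


def benford_expected():
    """Expected leading-digit frequencies under the Benford law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """Return the first significant digit of a non-zero number."""
    # Scientific notation puts the first significant digit at position 0.
    return int(f"{abs(x):.15e}"[0])


def first_digit_distribution(values):
    """Observed leading-digit frequencies, ignoring zeros."""
    digits = [leading_digit(v) for v in values if v != 0]
    if not digits:
        raise ValueError("need at least one non-zero value")
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


def benford_mad(values):
    """Mean absolute deviation from Benford (assumed score; lower = closer)."""
    observed = first_digit_distribution(values)
    expected = benford_expected()
    return sum(abs(observed[d] - expected[d]) for d in range(1, 10)) / 9


if __name__ == "__main__":
    # Hypothetical expression values for one gene across samples.
    expression = [0.0, 12.4, 1.7, 130.2, 1.1, 2.9, 18.0, 0.34, 1.05, 7.6]
    print(round(benford_mad(expression), 4))
```

In a setting like the one described, such a score would be computed per gene and the resulting values passed to a classifier as features; that step is not sketched here.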

SPINAL TUMORS AND ITS TREATMENT PERSPECTIVES IN OUR DAYS

Traumatology and Orthopedics of Russia ◽

10.21823/2311-2905-2010-0-2-126-128 ◽

2010 ◽

Vol 16 (2) ◽

pp. 126-128

Author(s):

A. K. Valiev ◽

E. R. Musaev ◽

E. A. Sushentsov ◽

K. A. Borzov ◽

M. D. Aliev

Keyword(s):

Lung Cancer ◽

Survival Rate ◽

Clinical Analysis ◽

Spinal Tumors ◽

Unknown Primary ◽

Spinal Lesions ◽

Cancer Metastases ◽

Primary Origin ◽

The Mean ◽

Median Survival Rate

The treatment of patients with metastatic spinal lesions is one of the most difficult problems in modern vertebrology and oncology. The experience of the Russian Cancer Research Center, based on a clinical analysis of 214 patients with metastatic spine disease, showed an overall median survival of 6 months. Mean survival was 11.2 months for renal cancer metastases, 16.2 months for breast cancer, 2.1 months for lung cancer metastases, 17.7 months for prostate cancer, and 7.8 months for unknown primary origin. We compared our results with the expected survival scored on the Tokuhashi scale. Actual and expected survival matched only in patients with renal and lung cancer metastases.

MarcoPolo: a clustering-free approach to the exploration of differentially expressed genes along with group information in single-cell RNA-seq data

10.1101/2020.11.23.393900 ◽

2020 ◽

Author(s):

Chanwoo Kim ◽

Hanbin Lee ◽

Juhee Jeong ◽

Keehoon Jung ◽

Buhm Han

Keyword(s):

Single Cell ◽

Differentially Expressed Genes ◽

Differential Expression Analysis ◽

Feature Selection Method ◽

Real Data ◽

Cell Types ◽

Differentially Expressed ◽

Rna Seq ◽

Sequencing Data ◽

Group Information

A common approach to analyzing single-cell RNA-sequencing data is to cluster cells first and then identify differentially expressed genes based on the clustering result. However, clustering has an innate uncertainty and can be imperfect, undermining the reliability of differential expression analysis results. To overcome this challenge, we present MarcoPolo, a clustering-free approach to exploring differentially expressed genes. To find informative genes without clustering, MarcoPolo exploits the bimodality of gene expression to learn the group information of the cells with respect to the expression level directly from given data. Using simulations and real data analyses, we showed that our method puts biologically informative genes at higher ranks more accurately and robustly than other existing methods. As our method provides information on how cells can be grouped for each gene, it can help identify cell types that are not separated well in the standard clustering process. Our method can also be used as a feature selection method to improve the robustness against changes in the number of genes used in clustering.

Current concepts in the diagnosis and management of neuroendocrine neoplasms of unknown primary origin

Minerva Endocrinologica ◽

10.23736/s0391-1977.19.03012-8 ◽

2020 ◽

Vol 44 (4) ◽

Cited By ~ 1

Author(s):

Krystallenia I. Alexandraki ◽

Marina Tsoli ◽

Georgios Kyriakopoulos ◽

Anna Angelousi ◽

Georgios Nikolopoulos ◽

...

Keyword(s):

Unknown Primary ◽

Neuroendocrine Neoplasms ◽

Diagnosis And Management ◽

Primary Origin ◽

Unknown Primary Origin

The Effectiveness of the Fused Weighted Filter Feature Selection Method to Improve Software Fault Prediction

Journal of Communications Technology Electronics and Computer Science ◽

10.22385/jctecs.v8i0.96 ◽

2016 ◽

Vol 8 ◽

pp. 5 ◽

Cited By ~ 1

Author(s):

Fatemeh Alighardashi ◽

Mohammad Ali Zare Chahooki

Keyword(s):

Feature Selection ◽

Feature Selection Method ◽

Selection Method ◽

Machine Learning Algorithms ◽

Fault Prediction ◽

Filter Method ◽

Selection Methods ◽

Software Projects ◽

Software Fault Prediction ◽

Software Fault

Improving software product quality before release through periodic testing is one of the most expensive activities in software projects. Because the resources available for testing modules are limited, it is important to identify fault-prone modules and to focus the testing resources on fault prediction in these modules. Software fault predictors based on machine learning algorithms are effective tools for identifying fault-prone modules. Extensive studies are being done in this field to find the connection between the features of software modules and their fault-proneness. Some of the features used in predictive algorithms are ineffective and reduce the accuracy of the prediction process, so feature selection methods are widely used to increase the performance of prediction models for fault-prone modules. In this study, we propose a feature selection method that selects features effectively by combining several filter feature selection methods; the combination is presented as the fused weighted filter method. The proposed method improved both the convergence rate of feature selection and the prediction accuracy. The results obtained on ten datasets from NASA and PROMISE indicate the effectiveness of the proposed method in improving the accuracy and convergence of software fault prediction.

Application of Three Deep Machine-Learning Algorithms in a Construction Assessment Model of Farmland Quality at the County Scale: Case Study of Xiangzhou, Hubei Province, China

Agriculture ◽

10.3390/agriculture11010072 ◽

2021 ◽

Vol 11 (1) ◽

pp. 72

Author(s):

Li Wang ◽

Yong Zhou ◽

Qing Li ◽

Tao Xu ◽

Zhengxiang Wu ◽

...

Keyword(s):

Quality Assessment ◽

Index System ◽

Quality Index ◽

Hubei Province ◽

Machine Learning Algorithms ◽

Assessment Model ◽

Soil Conditions ◽

The Mean ◽

County Scale ◽

Assessment Results

Constructing a scientific and quantitative quality-assessment model for farmland is important for understanding farmland quality, and can provide a theoretical basis and technical support for formulating rational and effective management policies and realizing the sustainable use of farmland resources. To more accurately reflect the systematic, complex, and differential characteristics of farmland quality, this study aimed to explore an intelligent farmland quality-assessment method that avoids the subjectivity of determining indicator weights while improving assessment accuracy. Taking Xiangzhou in Hubei Province, China, as the study area, 14 indicators were selected from four dimensions (terrain, soil conditions, socioeconomics, and ecological environment) to build a comprehensive assessment index system for farmland quality applicable to the region. A total of 1590 representative samples in Xiangzhou were selected, of which 1110 were used as training samples, 320 as test samples, and 160 as validation samples. Three models of entropy weight (EW), backpropagation neural network (BPNN), and random forest (RF) were selected for training, and the assessment results of farmland quality were output through simulations to compare their assessment accuracy and analyze the distribution pattern of farmland quality grades in Xiangzhou in 2018. The results showed the following: (1) The RF model for farmland quality assessment required fewer parameters, and could simulate the complex relationships between indicators more accurately and analyze each indicator’s contribution to farmland quality scientifically. (2) In terms of the average quality index of farmland, RF > BPNN > EW. The spatial patterns of the quality index from RF and BPNN were similar, and both were significantly different from EW. (3) In terms of the assessment results and precision characterization indicators, the assessment results of RF were more in line with realities of natural and socioeconomic development, with higher applicability and reliability. (4) Compared to BPNN and EW, RF had a higher data mining ability and training accuracy, and its assessment result was the best. The coefficient of determination (R²) was 0.8145, the mean absolute error (MAE) was 0.009, and the mean squared error (MSE) was 0.012. (5) The overall quality of farmland in Xiangzhou was higher, with a larger area of second- and third-grade farmland, accounting for 54.63%, and the grade basically conformed to the trend of positive distribution, showing an obvious pattern of geographical distribution, with overall high performance in the north-central part and low in the south. The distribution of farmland quality grades also varied widely among regions. This showed that RF was more suitable for the quality assessment of farmland with complex nonlinear characteristics. This study enriches and improves the index system and methodological research of farmland quality assessment at the county scale, and provides a basis for achieving a threefold production pattern of farmland quantity, quality, and ecology in Xiangzhou, while also serving as a reference for similar regions and countries.

Transcriptional and morphological profiling of parvalbumin interneuron subpopulations in the mouse hippocampus

Nature Communications ◽

10.1038/s41467-020-20328-4 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Lin Que ◽

David Lukacsovich ◽

Wenshu Luo ◽

Csaba Földy

Keyword(s):

Large Scale ◽

Cell Types ◽

Rna Seq ◽

Neuronal Identity ◽

Parvalbumin Interneurons ◽

Different Types ◽

Parvalbumin Interneuron ◽

Cam Profile ◽

Developmental Domains

The diversity reflected by >100 different neural cell types fundamentally contributes to brain function and a central idea is that neuronal identity can be inferred from genetic information. Recent large-scale transcriptomic assays seem to confirm this hypothesis, but a lack of morphological information has limited the identification of several known cell types. In this study, we used single-cell RNA-seq in morphologically identified parvalbumin interneurons (PV-INs), and studied their transcriptomic states in the morphological, physiological, and developmental domains. Overall, we find high transcriptomic similarity among PV-INs, with few genes showing divergent expression between morphologically different types. Furthermore, PV-INs show a uniform synaptic cell adhesion molecule (CAM) profile, suggesting that CAM expression in mature PV cells does not reflect wiring specificity after development. Together, our results suggest that while PV-INs differ in anatomy and in vivo activity, their continuous transcriptomic and homogenous biophysical landscapes are not predictive of these distinct identities.

Potential of spectroscopic analyses for non-destructive estimation of tea quality-related metabolites in fresh new leaves

Scientific Reports ◽

10.1038/s41598-021-83847-0 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Hiroto Yamashita ◽

Rei Sonobe ◽

Yuhei Hirono ◽

Akio Morita ◽

Takashi Ikka

Keyword(s):

Short Wave ◽

Machine Learning Algorithms ◽

Estimation Methods ◽

Chemical Information ◽

Tea Quality ◽

The Mean ◽

Processing Techniques ◽

Spectral Patterns ◽

Physical And Chemical ◽

Non Destructive

Spectroscopic sensing provides physical and chemical information in a non-destructive and rapid manner. To develop non-destructive estimation methods for tea quality-related metabolites in fresh leaves, we estimated the contents of free amino acids, catechins, and caffeine in fresh tea leaves using visible to short-wave infrared hyperspectral reflectance data and machine learning algorithms. We acquired these data from approximately 200 new leaves of varying status and then constructed regression models from combinations of six pre-processed spectral patterns and five algorithms. In most phenotypes, the combination of de-trending pre-processing and the Cubist algorithm was robustly selected as the best combination in each round over 100 repetitions, which were evaluated based on the ratio of performance to deviation (RPD). The mean RPD values ranged from 1.1 to 2.7, and most of them were above the acceptable or accurate threshold (RPD = 1.4 or 2.0, respectively). Data-based sensitivity analysis identified the important hyperspectral regions around 1500 and 2000 nm. The present spectroscopic approaches indicate that most tea quality-related metabolites can be estimated non-destructively, and that pre-processing techniques help to improve estimation accuracy.
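
The RPD criterion mentioned in the abstract above can be sketched as follows. RPD is commonly defined as the standard deviation of the reference values divided by the root mean squared error of prediction. The abstract does not state which variant the authors used, so the sample standard deviation and the inclusive threshold comparisons below are assumptions, and the function names are hypothetical.

```python
import math
import statistics


def rpd(observed, predicted):
    """Ratio of performance to deviation: SD(observed) / RMSE."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(squared_errors) / len(squared_errors))
    if rmse == 0:
        return math.inf  # perfect prediction
    # Sample standard deviation (n - 1 denominator) is an assumption here.
    return statistics.stdev(observed) / rmse


def rpd_label(value, acceptable=1.4, accurate=2.0):
    """Label an RPD value using the two thresholds quoted in the abstract."""
    if value >= accurate:
        return "accurate"
    if value >= acceptable:
        return "acceptable"
    return "below threshold"


if __name__ == "__main__":
    # Hypothetical measured and predicted metabolite contents.
    measured = [1.0, 2.0, 3.0, 4.0, 5.0]
    estimated = [1.5, 2.5, 3.5, 4.5, 5.5]
    score = rpd(measured, estimated)
    print(round(score, 3), rpd_label(score))
```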

treeclimbR pinpoints the data-dependent resolution of hierarchical hypotheses

Genome Biology ◽

10.1186/s13059-021-02368-1 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Ruizhu Huang ◽

Charlotte Soneson ◽

Pierre-Luc Germain ◽

Thomas S.B. Schmidt ◽

Christian Von Mering ◽

...

Keyword(s):

Single Cell ◽

Synthetic Data ◽

Cell Types ◽

Data Driven ◽

Rna Seq ◽

Hierarchical Trees

treeclimbR is a method for analyzing hierarchical trees of entities, such as phylogenies or cell types, at different resolutions. It proposes multiple candidates that capture the latent signal and pinpoints branches or leaves that contain features of interest, in a data-driven way. It outperforms currently available methods on synthetic data, and we highlight the approach on various applications, including microbiome and microRNA surveys as well as single-cell cytometry and RNA-seq datasets. With the emergence of various multi-resolution genomic datasets, treeclimbR provides a thorough inspection of entities across resolutions and gives additional flexibility to uncover biological associations.

Transcriptomic Changes of Murine Visceral Fat Exposed to Intermittent Hypoxia at Single Cell Resolution

International Journal of Molecular Sciences ◽

10.3390/ijms22010261 ◽

2020 ◽

Vol 22 (1) ◽

pp. 261

Author(s):

Abdelnaby Khalyfa ◽

Wesley Warren ◽

Jorge Andrade ◽

Christopher A. Bottoms ◽

Edward S. Rice ◽

...

Keyword(s):

Intermittent Hypoxia ◽

Cell Types ◽

Cellular Level ◽

Cellular Heterogeneity ◽

Transcriptional Networks ◽

Metabolic Dysfunction ◽

Rna Seq ◽

Obstructive Sleep ◽

Differential Expressed Gene ◽

Key Aspects

Intermittent hypoxia (IH) is a hallmark of obstructive sleep apnea (OSA) and induces metabolic dysfunction manifesting as inflammation, increased lipolysis and insulin resistance in visceral white adipose tissues (vWAT). However, the cell types and their corresponding transcriptional pathways underlying these functional perturbations are unknown. Here, we applied single nucleus RNA sequencing (snRNA-seq) coupled with aggregate RNA-seq methods to evaluate the cellular heterogeneity in vWAT following IH exposures mimicking OSA. C57BL/6 male mice were exposed to IH and room air (RA) for 6 weeks, and nuclei from vWAT were isolated and processed for snRNA-seq followed by differentially expressed gene (DEG) analyses by cell type, along with gene ontology and canonical pathway enrichment tests of significance. IH induced significant transcriptional changes compared to RA across 14 different cell types identified in vWAT. We identified cell-specific signature markers, transcriptional networks, metabolic signaling pathways, and cellular subpopulation enrichment in vWAT. Globally, we also identified 298 commonly regulated genes across multiple cell types that are associated with metabolic pathways. Deconvolution of cell types in vWAT using global RNA-seq revealed that distinct adipocytes appear to be differentially implicated in key aspects of metabolic dysfunction. Thus, the heterogeneity of vWAT and its response to IH at the cellular level provide important insights into the metabolic morbidity of OSA and may translate into therapeutic targets.

Adenocarcinoma of Unknown Primary Origin

Southern Medical Journal ◽

10.1097/00007611-198112000-00003 ◽

1981 ◽

Vol 71 (12) ◽

pp. 1431-1434 ◽

Cited By ~ 6

Author(s):

S. RAJU INDUPALLI ◽

AGOP Y. BEDIKIAN ◽

GERALD P. BODEY

Keyword(s):

Unknown Primary ◽

Primary Origin ◽

Unknown Primary Origin
