A new local covariance matrix estimation for the classification of gene expression profiles in RNA-Seq data

bioRxiv ◽

10.1101/766402 ◽

2019 ◽

Author(s):

Necla Koçhan ◽

Gözde Yazgı Tütüncü ◽

Göknur Giner

Keyword(s):

Gene Expression ◽

Covariance Matrix ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Classification Model ◽

Data Sets ◽

Local Dependence ◽

RNA-Seq ◽

Dependence Function ◽

Local Covariance

Abstract Background and Objective: Recent developments in next-generation sequencing (NGS) based on RNA sequencing (RNA-Seq) allow researchers to measure the expression levels of thousands of genes in multiple samples simultaneously. Many classification models have been proposed in the literature to analyze these kinds of data sets. Most existing classifiers assume that genes are independent; however, this assumption is not realistic for real RNA-Seq classification problems. For this reason, other classification methods that incorporate the dependence structure between genes into the model have been proposed. qtQDA, proposed by Koçhan et al. [1], is one of those classifiers; it estimates the covariance matrix by maximum likelihood. Methods: In this study, we use another approach, based on the local dependence function, to estimate the covariance matrix used in the qtQDA classification model. We investigate the impact of different covariance estimates on RNA-Seq data classification. Results: The performance of the qtQDA classifier based on two different covariance matrix estimates is compared on two real RNA-Seq data sets in terms of classification error rates. The results show that the local dependence function approach yields a better estimate of the covariance matrix and improves the performance of the qtQDA classifier. Conclusion: Incorporating an accurate covariance matrix into the classification model is a crucial step, particularly for cancer prediction. The local covariance matrix estimate allows researchers to classify cancer patients more accurately based on gene expression profiles. R code for the local dependence function is available at https://github.com/Necla/LocalDependence.
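
To illustrate where a covariance estimate enters a quadratic discriminant rule of the kind qtQDA builds on, the following Python sketch plugs a per-class maximum likelihood covariance estimate into the Gaussian discriminant score. It is a minimal sketch under stated assumptions: the input values are taken to be already transformed to approximate normality, all function names and the toy data are hypothetical, and neither the authors' quantile transformation nor their local dependence estimator is reproduced. Only the standard library is used so that the example runs without extra packages.

```python
import math
import random


def mean_vector(X):
    """Column means of a list of equal-length rows."""
    n = len(X)
    return [sum(col) / n for col in zip(*X)]


def mle_covariance(X):
    """Maximum likelihood covariance estimate (divides by n, not n - 1)."""
    n, mu = len(X), mean_vector(X)
    p = len(mu)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in X) / n
             for j in range(p)] for i in range(p)]


def cholesky(A):
    """Lower-triangular L with L L^T = A, for a positive definite A."""
    p = len(A)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def qda_score(x, mean, cov, prior):
    """Gaussian quadratic discriminant score of sample x for one class."""
    L = cholesky(cov)
    diff = [a - b for a, b in zip(x, mean)]
    z = []  # forward substitution: solve L z = diff
    for i in range(len(diff)):
        z.append((diff[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    mahalanobis = sum(v * v for v in z)
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(len(L)))
    return -0.5 * logdet - 0.5 * mahalanobis + math.log(prior)


def qda_predict(x, class_params):
    """Assign x to the class with the largest discriminant score."""
    scores = {c: qda_score(x, *p) for c, p in class_params.items()}
    return max(scores, key=scores.get)


# Toy data (assumption): two classes, three "genes", normal-like values.
rng = random.Random(0)
healthy = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(40)]
tumour = [[rng.gauss(4.0, 1.0) for _ in range(3)] for _ in range(40)]
params = {
    "healthy": (mean_vector(healthy), mle_covariance(healthy), 0.5),
    "tumour": (mean_vector(tumour), mle_covariance(tumour), 0.5),
}
print(qda_predict([0.1, -0.2, 0.3], params))
print(qda_predict([4.2, 3.9, 4.1], params))
```

Swapping `mle_covariance` for a different estimator changes only the covariance passed to `qda_score`, which is the comparison the abstract describes.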

A new local covariance matrix estimation for the classification of gene expression profiles in high dimensional RNA-Seq data

Expert Systems with Applications ◽

10.1016/j.eswa.2020.114200 ◽

2020 ◽

pp. 114200

Author(s):

Necla Kochan ◽

G. Yazgı Tütüncü ◽

Göknur Giner

Keyword(s):

Gene Expression ◽

Covariance Matrix ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Covariance Matrix Estimation ◽

High Dimensional ◽

RNA-Seq ◽

Matrix Estimation ◽

Local Covariance

Bulk and single-cell RNA-seq reveal dmrtb1 gene expression profiles during sex change in zig-zag eel (Mastacembelus armatus)

Aquaculture ◽

10.1016/j.aquaculture.2021.737194 ◽

2021 ◽

pp. 737194

Author(s):

Lingzhan Xue ◽

Dan Jia ◽

Luohao Xu ◽

Zhen Huang ◽

Haiping Fan ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Sex Change ◽

RNA-Seq ◽

Mastacembelus Armatus

A tensor decomposition-based integrated analysis applicable to multiple gene expression profiles without sample matching

10.21203/rs.3.rs-766884/v2 ◽

2021 ◽

Author(s):

Taguchi Y-h. ◽

Turki Turki

Keyword(s):

Gene Expression ◽

Learning Strategies ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Tensor Decomposition ◽

Integrated Analysis ◽

RNA-Seq ◽

Prior Learning ◽

Multiple Gene ◽

Machine Learning Applications

Abstract The integrated analysis of multiple gene expression profiles measured in distinct studies is always problematic. In particular, the lack of sample matching and of common labeling between distinct studies prevents the integration of multiple studies in a fully data-driven and unsupervised manner. In this study, we propose a strategy enabling the integration of multiple gene expression profiles among multiple independent studies without either labeling or sample matching, using tensor decomposition-based unsupervised feature extraction. As an example, we applied this strategy to Alzheimer’s disease (AD)-related gene expression profiles that lack exact correspondence among samples, as well as to AD single-cell RNA-seq (scRNA-seq) data. We found that we could select biologically reasonable genes with integrated analysis. Overall, integrated gene expression profiles can function analogously to prior learning and/or transfer learning strategies in other machine learning applications. For scRNA-seq, the proposed approach was able to drastically reduce the required computational memory.

The characteristics of mesenteric adipose tissue attached to different intestinal segments and their roles in immune regulation

AJP Gastrointestinal and Liver Physiology ◽

10.1152/ajpgi.00256.2021 ◽

2022 ◽

Author(s):

Haowei Zhang ◽

Yujin Ding ◽

Qin Zeng ◽

Dandan Wang ◽

Ganglei Liu ◽

...

Keyword(s):

Gene Expression ◽

Adipose Tissue ◽

Immune Regulation ◽

Expression Profiles ◽

Critical Role ◽

Gene Expression Profiles ◽

RNA-Seq ◽

Mesenteric Adipose Tissue ◽

Cell Components ◽

Subsequent Effect

Background: Mesenteric adipose tissue (MAT) plays a critical role in the intestinal physiological ecosystem. The small and large intestines evidently have intrinsic and distinct characteristics. However, whether the mesenteries adjacent to the small and large intestines (SMAT and LMAT) differ has not been properly characterized. We studied important facets of these differences, such as morphology, gene expression, cell components and immune regulation of MATs, to characterize the mesenteric differences. Methods: The SMAT and LMAT of mice were used to compare tissue morphology. Paired mesenteric samples were analyzed by RNA-seq to clarify gene expression profiles. MAT partial excision models were constructed to illustrate the immune regulation roles of MATs, and 16S-seq was applied to detect the subsequent effect on microbiota. Results: Our data show that different segments of mesenteries have different morphological structures. SMAT not only has smaller adipocytes but also contains more fat-associated lymphoid clusters than LMAT. The gene expression profile also differs between these two MATs in mice. B-cell markers were abundantly expressed in SMAT, while development-related genes were highly expressed in LMAT. Adipose-derived stem cells of LMAT exhibited higher adipogenic potential and lower proliferation rates than those of SMAT. In addition, SMAT and LMAT play different roles in immune regulation and subsequently affect microbiota components. Finally, our data clarified the described differences between SMAT and LMAT in humans. Conclusions: There were significant differences in cell morphology, gene expression profiles, cell components, biological characteristics, and immune and microbiota regulation roles between regional MATs.

Combining Nearest Neighbor Classifiers Versus Cross-Validation Selection

Statistical Applications in Genetics and Molecular Biology ◽

10.2202/1544-6115.1054 ◽

2004 ◽

Vol 3 (1) ◽

pp. 1-19 ◽

Cited By ~ 8

Author(s):

Minhui Paik ◽

Yuhong Yang

Keyword(s):

Gene Expression ◽

Cross Validation ◽

Nearest Neighbor ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Data Sets ◽

Weighting Method ◽

Considerable Uncertainty ◽

Combined Classifier ◽

Nearest Neighbor Classifiers

Various discriminant methods have been applied for classification of tumors based on gene expression profiles, among which the nearest neighbor (NN) method has been reported to perform relatively well. Usually cross-validation (CV) is used to select the neighbor size as well as the number of variables for the NN method. However, CV can perform poorly when there is considerable uncertainty in choosing the best candidate classifier. As an alternative to selecting a single “winner,” we propose a weighting method to combine the multiple NN rules. Four gene expression data sets are used to compare its performance with CV methods. The results show that when the CV selection is unstable, the combined classifier performs much better.
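
The contrast between picking one neighbor size and combining several NN rules can be sketched in Python as follows. This is a minimal sketch, not the weighting method proposed in the paper: the weights here are simply proportional to each rule's accuracy on a held-out set, which is an illustrative assumption, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, labels, query, k):
    """Majority vote among the k training points nearest to query."""
    order = sorted(range(len(train)), key=lambda i: math.dist(train[i], query))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def accuracy_weights(train, labels, val, val_labels, ks):
    """One weight per neighbor size k, proportional to held-out accuracy."""
    accs = []
    for k in ks:
        hits = sum(knn_predict(train, labels, x, k) == y
                   for x, y in zip(val, val_labels))
        accs.append(hits / len(val))
    total = sum(accs)
    if total == 0:  # every rule failed: fall back to equal weights
        return [1.0 / len(ks)] * len(ks)
    return [a / total for a in accs]


def combined_predict(train, labels, query, ks, weights):
    """Weighted vote across the k-NN rules instead of picking one winner."""
    tally = Counter()
    for k, w in zip(ks, weights):
        tally[knn_predict(train, labels, query, k)] += w
    return tally.most_common(1)[0][0]


# Toy data (assumption): two well-separated groups in two dimensions.
train = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
labels = ["normal"] * 4 + ["tumor"] * 4
val, val_labels = [(0.5, 0.5), (5.5, 5.5)], ["normal", "tumor"]
ks = [1, 3, 5]

weights = accuracy_weights(train, labels, val, val_labels, ks)
print(combined_predict(train, labels, (0.2, 0.8), ks, weights))
print(combined_predict(train, labels, (5.8, 5.1), ks, weights))
```

When the candidate rules have similar held-out accuracy, the weights are close to uniform and no single neighbor size dominates the vote, which is the situation where selecting one winner is least stable.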

The effects of a globin blocker on the resolution of 3’mRNA sequencing data in porcine blood

BMC Genomics ◽

10.1186/s12864-019-6122-2 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 1

Author(s):

Kyu-Sang Lim ◽

Qian Dong ◽

Pamela Moll ◽

Jana Vitkovska ◽

Gregor Wiktorin ◽

...

Keyword(s):

Gene Expression ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Data Sets ◽

Globin Genes ◽

Sequencing Data ◽

Globin mRNA ◽

Data Set ◽

mRNA Sequencing ◽

Porcine Blood

Abstract Background: Gene expression profiling in blood is a potential source of biomarkers to evaluate or predict phenotypic differences between pigs but is expensive and inefficient because of the high abundance of globin mRNA in porcine blood. These limitations can be overcome by the use of QuantSeq 3’mRNA sequencing (QuantSeq) combined with a method to deplete or block the processing of globin mRNA prior to or during library construction. Here, we validated the effectiveness of QuantSeq using a novel specific globin blocker (GB) that is included in the library preparation step of QuantSeq. Results: In data set 1, four concentrations of the GB were applied to RNA samples from two pigs. The GB significantly reduced the proportion of globin reads compared to non-GB (NGB) samples (P = 0.005) and increased the number of detectable non-globin genes. The highest evaluated concentration (C1) of the GB resulted in the largest reduction of globin reads compared to the NGB (from 56.4 to 10.1%). The second highest concentration C2, which showed very similar globin depletion rates (12%) as C1 but a better correlation of the expression of non-globin genes between NGB and GB (r = 0.98), allowed the expression of an additional 1295 non-globin genes to be detected, although 40 genes that were detected in the NGB sample (at a low level) were not present in the GB library. Concentration C2 was applied in the rest of the study. In data set 2, the distribution of the percentage of globin reads for NGB (n = 184) and GB (n = 189) samples clearly showed the effects of the GB on reducing globin reads, in particular for HBB, similar to results from data set 1. Data set 3 (n = 84) revealed that the proportion of globin reads that remained in GB samples was significantly and positively correlated with the reticulocyte count in the original blood sample (P < 0.001). Conclusions: The effect of the GB on reducing the proportion of globin reads in porcine blood QuantSeq was demonstrated in three data sets. In addition to increasing the efficiency of sequencing non-globin mRNA, the GB for QuantSeq has the advantage that it does not require an additional step prior to or during library creation. Therefore, the GB is a useful tool in the quantification of whole gene expression profiles in porcine blood.

Comparative Transcriptome Analysis of Different Dendrobium Species Reveals Active Ingredients-Related Genes and Pathways

International Journal of Molecular Sciences ◽

10.3390/ijms21030861 ◽

2020 ◽

Vol 21 (3) ◽

pp. 861 ◽

Cited By ~ 3

Author(s):

Yingdan Yuan ◽

Bo Zhang ◽

Xinggang Tang ◽

Jinchi Zhang ◽

Jie Lin

Keyword(s):

Gene Expression ◽

Expression Profiles ◽

Gene Expression Profiles ◽

RNA-Seq ◽

Active Ingredients ◽

Hub Genes ◽

Transcriptome Database ◽

Gene Modules ◽

Functional Investigation ◽

First Time

Dendrobium, which contains many kinds of active ingredients, is widely used in traditional Chinese medicine. In recent years, many Dendrobium transcriptomes have been sequenced. Hence, weighted gene co-expression network analysis (WGCNA) was used with the gene expression profiles of active ingredients to identify the modules and genes that may be associated with particular species and tissues. Three Dendrobium species and three tissues were sampled for RNA-seq to generate a high-quality, full-length transcriptome database. Based on significant changes in gene expression, we constructed co-expression networks and revealed 19 gene modules. Among them, four modules with properties correlating to the regulation and biosynthesis of active ingredients, and several hub genes, were selected for further functional investigation. This is the first time the WGCNA method has been used to analyze Dendrobium transcriptome data. Further mining of the gene module information will help us to study the role and significance of key genes, key signaling pathways, and regulatory mechanisms between genes in the occurrence and development of the medicinal components of Dendrobium.
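
The first step of a weighted co-expression analysis, turning pairwise correlations into a soft-thresholded adjacency and ranking genes by connectivity, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: it uses the unsigned adjacency |cor|^beta with beta = 6 chosen for illustration (the abstract does not state the power used), the gene names and expression values are made up, and module detection itself (topological overlap, hierarchical clustering) is omitted.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(profiles, beta=6):
    """Unsigned adjacency a_ij = |cor(i, j)| ** beta, zero on the diagonal."""
    genes = list(profiles)
    return {
        g: {h: 0.0 if g == h
            else abs(pearson(profiles[g], profiles[h])) ** beta
            for h in genes}
        for g in genes
    }


def connectivity(adjacency):
    """Sum of each gene's adjacencies; high values suggest hub candidates."""
    return {g: sum(row.values()) for g, row in adjacency.items()}


# Hypothetical expression profiles across six samples.
profiles = {
    "geneA": [1, 2, 3, 4, 5, 6],
    "geneB": [3, 5, 7, 9, 11, 13],   # perfectly correlated with geneA
    "geneC": [6, 5, 4, 3, 2, 1],     # perfectly anti-correlated with geneA
    "geneD": [1, -1, 1, -1, 1, -1],  # weakly related to the others
}

adjacency = soft_threshold_adjacency(profiles)
conn = connectivity(adjacency)
for gene in sorted(conn, key=conn.get, reverse=True):
    print(gene, round(conn[gene], 4))
```

Raising the absolute correlation to a power pushes weak correlations toward zero while leaving strong ones nearly intact, so the weakly related gene ends up with very low connectivity.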

Interferon-regulated genetic programs and JAK/STAT pathway activate the intronic promoter of the short ACE2 isoform in renal proximal tubules

10.1101/2021.01.15.426908 ◽

2021 ◽

Author(s):

Jakub Jankowski ◽

Hye Kyung Lee ◽

Julia Wilflingseder ◽

Lothar Hennighausen

Keyword(s):

Gene Expression ◽

Proximal Tubule ◽

Expression Profiles ◽

Expression Patterns ◽

Gene Expression Profiles ◽

Regulatory Elements ◽

RNA-Seq ◽

Angiotensin Converting Enzyme 2 ◽

Genome Wide ◽

Cytokine Stimulation

Summary: Recently, a short, interferon-inducible isoform of Angiotensin-Converting Enzyme 2 (ACE2), dACE2, was identified. ACE2 is a SARS-CoV-2 receptor, and changes in its renal expression have been linked to several human nephropathies. These changes were never analyzed in the context of dACE2, as its expression was not investigated in the kidney. We used Human Primary Proximal Tubule (HPPT) cells to show genome-wide gene expression patterns after cytokine stimulation, with emphasis on the ACE2/dACE2 locus. Putative regulatory elements controlling dACE2 expression were identified using ChIP-seq and RNA-seq. qRT-PCR differentiating between ACE2 and dACE2 revealed 300- and 600-fold upregulation of dACE2 by IFNα and IFNβ, respectively, while full-length ACE2 expression was almost unchanged. The JAK inhibitor ruxolitinib ablated STAT1 and dACE2 expression after interferon treatment. Finally, with RNA-seq, we identified a set of genes, largely immune-related, induced by cytokine treatment. These gene expression profiles provide new insights into the cytokine response of proximal tubule cells.

Comparison of Gene Expression Profiles in Nonmodel Eukaryotic Organisms with RNA-Seq

Methods in Molecular Biology - Transcriptome Data Analysis ◽

10.1007/978-1-4939-7710-9_1 ◽

2018 ◽

pp. 3-16 ◽

Cited By ~ 1

Author(s):

Han Cheng ◽

Yejun Wang ◽

Ming-an Sun

Keyword(s):

Gene Expression ◽

Expression Profiles ◽

Gene Expression Profiles ◽

RNA-Seq ◽

Eukaryotic Organisms

Transcriptome Analysis of the Gene Expression Profiles Associated with Fungal Keratitis in Mice Based on RNA-Seq

Investigative Ophthalmology & Visual Science ◽

10.1167/iovs.61.6.32 ◽

2020 ◽

Vol 61 (6) ◽

pp. 32

Author(s):

Qing Zhang ◽

Jian Zhang ◽

Mengting Gong ◽

Ruolan Pan ◽

Yanchang Liu ◽

...

Keyword(s):

Gene Expression ◽

Transcriptome Analysis ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Fungal Keratitis ◽

RNA-Seq
