A new local covariance matrix estimation for the classification of gene expression profiles in RNA-Seq data

2019 ◽  
Author(s):  
Necla Koçhan ◽  
Gözde Yazgı Tütüncü ◽  
Göknur Giner

Abstract. Background and Objective: Recent developments in next-generation sequencing (NGS) based on RNA-sequencing (RNA-Seq) allow researchers to measure the expression levels of thousands of genes for multiple samples simultaneously. To analyze these kinds of data sets, many classification models have been proposed in the literature. Most existing classifiers assume that genes are independent; however, this assumption is not realistic for real RNA-Seq classification problems. For this reason, other classification methods, which incorporate the dependence structure between genes into the model, have been proposed. qtQDA, proposed by Koçhan et al. [1], is one of those classifiers; it estimates the covariance matrix by maximum likelihood. Methods: In this study, we use another approach, based on the local dependence function, to estimate the covariance matrix used in the qtQDA classification model. We investigate the impact of different covariance estimates on RNA-Seq data classification. Results: The performance of the qtQDA classifier based on two different covariance matrix estimates is compared over two real RNA-Seq data sets in terms of classification error rates. The results show that the local dependence function approach yields a better estimate of the covariance matrix and improves the performance of the qtQDA classifier. Conclusion: Incorporating an accurate covariance matrix into the classification model is a crucial step, particularly for cancer prediction. The local covariance matrix estimate allows researchers to classify cancer patients more accurately based on gene expression profiles. R code for the local dependence function is available at https://github.com/Necla/LocalDependence.
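To make concrete how a covariance estimate enters a quadratic discriminant rule, the following is a minimal Python sketch, not the authors' qtQDA implementation. It assumes two features per sample and already-transformed, approximately Gaussian values; the function names, the toy data, and the pluggable `cov_estimator` argument are all hypothetical. Only the maximum likelihood estimate is shown; the local dependence estimate is not reproduced here and would be supplied in place of `mle_covariance`.

```python
import math

def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def mle_covariance(rows):
    """Maximum likelihood covariance estimate (divides by n, not n - 1)."""
    n = len(rows)
    mu = mean_vector(rows)
    p = len(mu)
    cov = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [v - m for v, m in zip(r, mu)]
        for i in range(p):
            for j in range(p):
                cov[i][j] += d[i] * d[j] / n
    return cov

def quadratic_score(x, mu, cov, prior):
    """Quadratic discriminant score for a 2-feature Gaussian class model."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]  # explicit 2x2 inverse
    diff = [x[0] - mu[0], x[1] - mu[1]]
    maha = sum(diff[i] * inv[i][j] * diff[j] for i in range(2) for j in range(2))
    return -0.5 * math.log(det) - 0.5 * maha + math.log(prior)

def fit(classes, cov_estimator=mle_covariance):
    """classes maps label -> list of rows; returns (mean, covariance, prior) per label."""
    total = sum(len(rows) for rows in classes.values())
    return {
        label: (mean_vector(rows), cov_estimator(rows), len(rows) / total)
        for label, rows in classes.items()
    }

def predict(model, x):
    """Return the label with the highest quadratic discriminant score."""
    return max(model, key=lambda label: quadratic_score(x, *model[label]))

# Made-up toy data for illustration only (not RNA-Seq measurements).
train = {
    "tumor": [[5.0, 1.0], [6.0, 2.0], [7.0, 1.5], [6.5, 2.5]],
    "normal": [[1.0, 4.0], [2.0, 5.0], [1.5, 6.0], [2.5, 5.5]],
}
model = fit(train)
print(predict(model, [6.0, 1.8]))
```

Because the estimator is passed in as a function, two covariance estimates can be compared by fitting the same rule twice and comparing error rates on held-out samples.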

2021 ◽  
Author(s):  
Taguchi Y-h. ◽  
Turki Turki

Abstract. The integrated analysis of multiple gene expression profiles measured in distinct studies is always problematic. In particular, the lack of sample matching and of common labeling between distinct studies prevents the integration of multiple studies in a fully data-driven and unsupervised manner. In this study, we propose a strategy enabling the integration of multiple gene expression profiles among multiple independent studies without either labeling or sample matching, using tensor decomposition-based unsupervised feature extraction. As an example, we applied this strategy to Alzheimer’s disease (AD)-related gene expression profiles that lack exact correspondence among samples as well as AD single-cell RNA-seq (scRNA-seq) data. We found that we could select biologically reasonable genes with integrated analysis. Overall, integrated gene expression profiles can function analogously to prior learning and/or transfer learning strategies in other machine learning applications. For scRNA-seq, the proposed approach was able to drastically reduce the required computational memory.


Author(s):  
Haowei Zhang ◽  
Yujin Ding ◽  
Qin Zeng ◽  
Dandan Wang ◽  
Ganglei Liu ◽  
...  

Background: Mesenteric adipose tissue (MAT) plays a critical role in the intestinal physiological ecosystem. The small and large intestines evidently have intrinsic and distinct characteristics. However, whether the mesenteries adjacent to the small and large intestines (SMAT and LMAT) differ has not been properly characterized. We studied the important facets of these differences, such as morphology, gene expression, cell components and immune regulation of MATs, to characterize the mesenteric differences. Methods: The SMAT and LMAT of mice were utilized for comparison of tissue morphology. Paired mesenteric samples were analyzed by RNA-seq to clarify gene expression profiles. MAT partial excision models were constructed to illustrate the immune regulation roles of MATs, and 16S-seq was applied to detect the subsequent effect on microbiota. Results: Our data show that different segments of mesenteries have different morphological structures. SMAT not only has smaller adipocytes but also contains more fat-associated lymphoid clusters than LMAT. The gene expression profile also differs between these two MATs in mice. B-cell markers were abundantly expressed in SMAT, while development-related genes were highly expressed in LMAT. Adipose-derived stem cells of LMAT exhibited higher adipogenic potential and lower proliferation rates than those of SMAT. In addition, SMAT and LMAT play different roles in immune regulation and subsequently affect microbiota components. Finally, our data clarified the described differences between SMAT and LMAT in humans. Conclusions: There were significant differences in cell morphology, gene expression profiles, cell components, biological characteristics, and immune and microbiota regulation roles between regional MATs.


2004 ◽  
Vol 3 (1) ◽  
pp. 1-19 ◽  
Author(s):  
Minhui Paik ◽  
Yuhong Yang

Various discriminant methods have been applied for classification of tumors based on gene expression profiles, among which the nearest neighbor (NN) method has been reported to perform relatively well. Usually, cross-validation (CV) is used to select the neighbor size as well as the number of variables for the NN method. However, CV can perform poorly when there is considerable uncertainty in choosing the best candidate classifier. As an alternative to selecting a single “winner,” we propose a weighting method to combine the multiple NN rules. Four gene expression data sets are used to compare its performance with CV methods. The results show that when the CV selection is unstable, the combined classifier performs much better.
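The idea of combining several NN rules rather than selecting one can be sketched as follows. The abstract does not specify the weighting scheme, so this Python sketch makes an assumption: it weights each k-NN rule by its leave-one-out accuracy, which is an illustrative stand-in and not the authors' method. All function names and the toy data are hypothetical.

```python
from collections import Counter

def knn_votes(train, x, k):
    """Class vote fractions among the k training points nearest to x."""
    nearest = sorted(
        train, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x))
    )[:k]
    counts = Counter(label for _, label in nearest)
    return {label: c / k for label, c in counts.items()}

def loo_accuracy(train, k):
    """Leave-one-out accuracy of the k-NN rule on the training set."""
    hits = 0
    for i, (x, y) in enumerate(train):
        rest = train[:i] + train[i + 1:]
        votes = knn_votes(rest, x, k)
        hits += max(votes, key=votes.get) == y
    return hits / len(train)

def combined_predict(train, x, ks):
    """Combine k-NN rules, weighting each by leave-one-out accuracy (assumed scheme)."""
    weights = {k: loo_accuracy(train, k) for k in ks}
    total = sum(weights.values())
    if total == 0:  # fall back to equal weights if every rule fails
        weights = {k: 1.0 for k in ks}
        total = float(len(ks))
    scores = Counter()
    for k in ks:
        for label, frac in knn_votes(train, x, k).items():
            scores[label] += weights[k] / total * frac
    return max(scores, key=scores.get)

# Made-up toy data: two well-separated groups of 2-feature samples.
train = [
    ([0.0, 0.0], "A"), ([0.1, 0.2], "A"), ([0.2, 0.1], "A"),
    ([1.0, 1.0], "B"), ([0.9, 1.1], "B"), ([1.1, 0.9], "B"),
]
print(combined_predict(train, [0.15, 0.1], ks=[1, 3]))
```

With a single candidate in `ks`, the combined rule reduces to an ordinary k-NN classifier, so the difference from CV selection only shows up when several candidates have similar estimated accuracy.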


BMC Genomics ◽  
2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Kyu-Sang Lim ◽  
Qian Dong ◽  
Pamela Moll ◽  
Jana Vitkovska ◽  
Gregor Wiktorin ◽  
...  

Abstract. Background: Gene expression profiling in blood is a potential source of biomarkers to evaluate or predict phenotypic differences between pigs but is expensive and inefficient because of the high abundance of globin mRNA in porcine blood. These limitations can be overcome by the use of QuantSeq 3’ mRNA sequencing (QuantSeq) combined with a method to deplete or block the processing of globin mRNA prior to or during library construction. Here, we validated the effectiveness of QuantSeq using a novel specific globin blocker (GB) that is included in the library preparation step of QuantSeq. Results: In data set 1, four concentrations of the GB were applied to RNA samples from two pigs. The GB significantly reduced the proportion of globin reads compared to non-GB (NGB) samples (P = 0.005) and increased the number of detectable non-globin genes. The highest evaluated concentration (C1) of the GB resulted in the largest reduction of globin reads compared to the NGB (from 56.4 to 10.1%). The second highest concentration (C2), which showed a globin depletion rate (12%) very similar to that of C1 but a better correlation of the expression of non-globin genes between NGB and GB (r = 0.98), allowed the expression of an additional 1295 non-globin genes to be detected, although 40 genes that were detected in the NGB sample (at a low level) were not present in the GB library. Concentration C2 was applied in the rest of the study. In data set 2, the distribution of the percentage of globin reads for NGB (n = 184) and GB (n = 189) samples clearly showed the effects of the GB on reducing globin reads, in particular for HBB, similar to results from data set 1. Data set 3 (n = 84) revealed that the proportion of globin reads that remained in GB samples was significantly and positively correlated with the reticulocyte count in the original blood sample (P < 0.001). Conclusions: The effect of the GB on reducing the proportion of globin reads in porcine blood QuantSeq was demonstrated in three data sets. In addition to increasing the efficiency of sequencing non-globin mRNA, the GB for QuantSeq has the advantage that it does not require an additional step prior to or during library creation. Therefore, the GB is a useful tool in the quantification of whole gene expression profiles in porcine blood.


2020 ◽  
Vol 21 (3) ◽  
pp. 861 ◽  
Author(s):  
Yingdan Yuan ◽  
Bo Zhang ◽  
Xinggang Tang ◽  
Jinchi Zhang ◽  
Jie Lin

Dendrobium, which contains many kinds of active ingredients, is widely used in traditional Chinese medicine. In recent years, many Dendrobium transcriptomes have been sequenced. Here, weighted gene co-expression network analysis (WGCNA) was applied to gene expression profiles related to active ingredients to identify the modules and genes that may be associated with particular species and tissues. Three Dendrobium species and three tissues were sampled for RNA-seq to generate a high-quality, full-length transcriptome database. Based on significant changes in gene expression, we constructed co-expression networks and identified 19 gene modules. Among them, four modules correlated with the regulation and biosynthesis of active ingredients, together with several hub genes, were selected for further functional investigation. This is the first time the WGCNA method has been used to analyze Dendrobium transcriptome data. Further mining of the gene module information will help clarify the role and significance of key genes, key signaling pathways, and regulatory mechanisms between genes in the occurrence and development of the medicinal components of Dendrobium.
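As a rough illustration of the first step of a weighted co-expression analysis, the Python sketch below builds an unsigned soft-threshold adjacency, a_ij = |cor(x_i, x_j)|^beta, and sums it into a per-gene connectivity. It is a simplified sketch, not the WGCNA package or the authors' pipeline: module detection is omitted, the gene names and expression values are made up, and `beta = 6` is an illustrative choice rather than a value reported in the study.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def soft_threshold_adjacency(expr, beta=6):
    """Unsigned weighted adjacency: |correlation| raised to the power beta."""
    genes = list(expr)
    return {
        g: {
            h: 1.0 if g == h else abs(pearson(expr[g], expr[h])) ** beta
            for h in genes
        }
        for g in genes
    }

def connectivity(adj):
    """Sum of each gene's adjacencies to all other genes."""
    return {g: sum(w for h, w in row.items() if h != g) for g, row in adj.items()}

# Made-up expression values across five samples (hypothetical gene names).
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],
    "geneC": [5.0, 3.0, 4.0, 1.0, 2.0],
    "geneD": [1.0, 1.5, 0.5, 1.2, 0.9],
}
adj = soft_threshold_adjacency(expr)
conn = connectivity(adj)
print(sorted(conn, key=conn.get, reverse=True))
```

Raising correlations to a power suppresses weak correlations while keeping strong ones close to 1, so genes with high connectivity are natural candidates when looking for hub genes.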


2021 ◽  
Author(s):  
Jakub Jankowski ◽  
Hye Kyung Lee ◽  
Julia Wilflingseder ◽  
Lothar Hennighausen

Summary. Recently, a short, interferon-inducible isoform of Angiotensin-Converting Enzyme 2 (ACE2), dACE2, was identified. ACE2 is a SARS-CoV-2 receptor, and changes in its renal expression have been linked to several human nephropathies. These changes have never been analyzed in the context of dACE2, as its expression has not been investigated in the kidney. We used Human Primary Proximal Tubule (HPPT) cells to show genome-wide gene expression patterns after cytokine stimulation, with emphasis on the ACE2/dACE2 locus. Putative regulatory elements controlling dACE2 expression were identified using ChIP-seq and RNA-seq. qRT-PCR differentiating between ACE2 and dACE2 revealed 300- and 600-fold upregulation of dACE2 by IFNα and IFNβ, respectively, while full-length ACE2 expression was almost unchanged. The JAK inhibitor ruxolitinib ablated STAT1 and dACE2 expression after interferon treatment. Finally, with RNA-seq, we identified a set of genes, largely immune-related, induced by cytokine treatment. These gene expression profiles provide new insights into the cytokine response of proximal tubule cells.


2020 ◽  
Vol 61 (6) ◽  
pp. 32
Author(s):  
Qing Zhang ◽  
Jian Zhang ◽  
Mengting Gong ◽  
Ruolan Pan ◽  
Yanchang Liu ◽  
...  
