lncDIFF: a novel distribution-free method for differential expression analysis of long non-coding RNA

Mapping Intimacies ◽

10.1101/420562 ◽

2018 ◽

Author(s):

Qian Li ◽

Xiaoqing Yu ◽

Ritu Chaudhary ◽

Robbert JC Slebos ◽

Christine H. Chung ◽

...

Keyword(s):

Differential Expression Analysis ◽

R Package ◽

Supplementary Information ◽

Ratio Test ◽

Differential Analysis ◽

Non Coding Rna ◽

Robust Statistical Method ◽

Lower False Discovery Rate ◽

Differential Expressed Genes ◽

Long Non Coding Rna

Abstract Motivation Long non-coding RNA (lncRNA) expression data have been increasingly used to find diagnostic and prognostic biomarkers in cancer studies. Existing differential analysis tools for RNA sequencing do not effectively accommodate low-abundance genes, which are common among lncRNAs. We propose a novel and robust statistical method, lncDIFF, to detect differentially expressed (DE) genes without assuming the true density of normalized counts. Results lncDIFF adopts a generalized linear model with zero-inflated exponential quasi-likelihood to estimate the group effect on normalized counts, and employs the likelihood ratio test to detect differentially expressed genes. The proposed method and tool are suitable for data processed with standard RNA-Seq preprocessing and normalization pipelines. Simulation results illustrate that lncDIFF detects DE genes with more power and a lower false discovery rate regardless of the data pattern. The analysis of a head and neck squamous cell carcinoma study also confirms that lncDIFF has better sensitivity in identifying novel lncRNA genes with relatively large fold change and prognostic value. Availability and Implementation lncDIFF is an R package available at https://github.com/qianli10000/lncDIFF. Supplementary Information Supplementary Data are available at Bioinformatics online.
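To make the idea of a likelihood ratio test under a zero-inflated exponential model concrete, the following is a minimal sketch for a single gene and two groups. It is not the lncDIFF implementation: the function names are hypothetical, it uses a plain zero-inflated exponential likelihood rather than the package's quasi-likelihood GLM, it lets both the zero proportion and the mean of the positive values differ between groups, and it relies on the usual asymptotic chi-square approximation with 2 degrees of freedom.

```python
import math


def _xlogy(x, y):
    """Return x * log(y), using the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def zie_loglik(values):
    """Maximized log-likelihood of a zero-inflated exponential model.

    Assumed model: a value is 0 with probability pi, otherwise it is drawn
    from an exponential distribution with mean mu. `values` must be non-empty.
    """
    n = len(values)
    positives = [v for v in values if v > 0]
    m = len(positives)
    n0 = n - m
    pi_hat = n0 / n  # MLE of the zero proportion
    ll = _xlogy(n0, pi_hat) + _xlogy(m, 1.0 - pi_hat)
    if m > 0:
        mu_hat = sum(positives) / m  # MLE of the exponential mean
        # Sum of (-log(mu) - y / mu) over positives, evaluated at mu_hat.
        ll += -m * math.log(mu_hat) - m
    return ll


def lrt_two_groups(group_a, group_b):
    """Likelihood ratio test: shared (pi, mu) versus group-specific (pi, mu)."""
    ll_null = zie_loglik(list(group_a) + list(group_b))
    ll_alt = zie_loglik(list(group_a)) + zie_loglik(list(group_b))
    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    # Chi-square survival function with 2 degrees of freedom is exp(-x / 2).
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical normalized counts for one low-abundance gene.
    control = [0.0, 0.0, 0.0, 0.5, 1.0]
    tumor = [5.0, 8.0, 0.0, 12.0, 9.0]
    print(lrt_two_groups(control, tumor))
```

In practice the test would be run once per gene and the resulting p-values adjusted for multiple testing; that step is omitted here.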


Identification of Long Non-coding RNA Isolated From Naturally Infected Macrophages and Associated With Bovine Johne's Disease in Canadian Holstein Using a Combination of Neural Networks and Logistic Regression

Frontiers in Veterinary Science ◽

10.3389/fvets.2021.639053 ◽

2021 ◽

Vol 8 ◽

Author(s):

Andrew Marete ◽

Olivier Ariel ◽

Eveline Ibeagha-Awemu ◽

Nathalie Bissonnette

Keyword(s):

Neural Networks ◽

Logistic Regression ◽

Differential Expression Analysis ◽

Johne's Disease ◽

Classification Systems ◽


Potential Candidate ◽

Rna Seq ◽

Non Coding Rna ◽

Long Non Coding Rna

Mycobacterium avium ssp. paratuberculosis (MAP) causes chronic enteritis in most ruminants. The pathogen MAP causes Johne's disease (JD), a chronic, incurable, wasting disease. Weight loss, diarrhea, and a gradual drop in milk production characterize the disease's clinical phase, culminating in death. Several studies have characterized long non-coding RNA (lncRNA) in bovine tissues, and a previous study characterized lncRNA in macrophages infected with MAP in vitro. In this study, we aim to characterize the lncRNA in macrophages from cows naturally infected with MAP. From 15 herds, feces and blood samples were collected for each cow older than 24 months, twice yearly over 3–5 years. Paired samples were analyzed by fecal PCR and blood ELISA. We used RNA-seq data to study lncRNA in macrophages from 33 JD(+) and 33 JD(–) dairy cows. We performed RNA-seq analysis using the “new Tuxedo” suite. We characterized lncRNA using logistic regression and multilayered neural networks and used DESeq2 for differential expression analysis and Panther and Reactome classification systems for gene ontology (GO) analysis. The study identified 13,301 lncRNA, 605 of which were novel lncRNA. We found seven genes close to differentially expressed lncRNA, including CCDC174, ERI1, FZD1, TWSG1, ZBTB38, ZNF814, and ZSCAN4. None of the genes associated with susceptibility to JD have been cited in the literature. LncRNA target genes were significantly enriched for biological process GO terms involved in immunity and nucleic acid regulation. These include the MyD88 pathway (TLR5), GO:0043312 (neutrophil degranulation), GO:0002446 (neutrophil-mediated immunity), and GO:0042119 (neutrophil activation). These results identified lncRNA with potential roles in host immunity and potential candidate genes and pathways through which lncRNA might function in response to MAP infection.


scBatch: batch-effect correction of RNA-seq data through sample distance matrix adjustment

Bioinformatics ◽

10.1093/bioinformatics/btaa097 ◽

2020 ◽

Vol 36 (10) ◽

pp. 3115-3123 ◽

Cited By ~ 3

Author(s):

Teng Fei ◽

Tianwei Yu

Keyword(s):

Single Cell ◽

Differential Expression Analysis ◽

Distance Matrix ◽

Real Data ◽

R Package ◽

Batch Effect ◽

Supplementary Information ◽

Rna Seq ◽

Sequencing Data ◽

Gene Differential Expression

Abstract Motivation Batch effect is a frequent challenge in deep sequencing data analysis that can lead to misleading conclusions. Existing methods do not correct batch effects satisfactorily, especially with single-cell RNA sequencing (RNA-seq) data. Results We present scBatch, a numerical algorithm for batch-effect correction on bulk and single-cell RNA-seq data with emphasis on improving both clustering and gene differential expression analysis. scBatch is not restricted by assumptions on the mechanism of batch-effect generation. As shown in simulations and real data analyses, scBatch outperforms benchmark batch-effect correction methods. Availability and implementation The R package is available at github.com/tengfei-emory/scBatch. The code to generate results and figures in this article is available at github.com/tengfei-emory/scBatch-paper-scripts. Supplementary information Supplementary data are available at Bioinformatics online.


DEUS: an R package for accurate small RNA profiling based on differential expression of unique sequences

Bioinformatics ◽

10.1093/bioinformatics/btz495 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4834-4836

Author(s):

Tim Jeske ◽

Peter Huypens ◽

Laura Stirm ◽

Selina Höckele ◽

Christine M Wurmser ◽

...

Keyword(s):

Differential Expression ◽

Small Rna ◽

Sequence Similarity ◽

Differential Expression Analysis ◽

R Package ◽

Supplementary Information ◽

Small Rna Sequencing ◽

Sequencing Data ◽

Rna Sequences ◽

Rna Profiling

Abstract Summary Despite the fundamental role of small RNAs in various biological processes, the analysis of small RNA sequencing data remains a challenging task. Major obstacles arise when short RNA sequences map to multiple locations in the genome, align to regions that are not annotated, or have undergone post-transcriptional changes that hamper accurate mapping. To tackle these issues, we present a novel profiling strategy that circumvents the need for read mapping to a reference genome by utilizing the actual read sequences to determine expression intensities. After differential expression analysis of individual sequence counts, significant sequences are annotated against user-defined feature databases and clustered by sequence similarity. This strategy enables a more comprehensive and concise representation of small RNA populations without any data loss or data distortion. Availability and implementation Code and documentation of our R package are available at http://ibis.helmholtz-muenchen.de/deus/. Supplementary information Supplementary data are available at Bioinformatics online.
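The mapping-free idea described in this abstract, using the read sequences themselves as features, can be illustrated with a small sketch. This is not code from the DEUS package (which is written in R); the function name and the sample names are hypothetical, and the sketch only covers the counting step that would precede differential expression analysis.

```python
from collections import Counter


def count_unique_sequences(reads_by_sample):
    """Build a sequence-by-sample count table without any read mapping.

    `reads_by_sample` maps a sample name to a list of read sequences.
    Returns the sorted sample names and a dict mapping each distinct
    sequence to its counts, ordered like the sample names.
    """
    samples = sorted(reads_by_sample)
    per_sample = {s: Counter(reads_by_sample[s]) for s in samples}
    # Union of all distinct sequences observed in any sample.
    sequences = sorted(set().union(*(c.keys() for c in per_sample.values())))
    # Counter returns 0 for sequences that are absent from a sample.
    matrix = {seq: [per_sample[s][seq] for s in samples] for seq in sequences}
    return samples, matrix


if __name__ == "__main__":
    # Hypothetical toy reads for two samples.
    reads = {
        "ctrl": ["ACGT", "ACGT", "TTGA"],
        "case": ["ACGT", "GGCA"],
    }
    samples, matrix = count_unique_sequences(reads)
    print(samples)
    for seq, counts in matrix.items():
        print(seq, counts)
```

Each row of the resulting table is one unique sequence, so no read is discarded for mapping to multiple loci or to an unannotated region.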


CROSSalive: a web server for predicting the in vivo structure of RNA molecules

Bioinformatics ◽

10.1093/bioinformatics/btz666 ◽

2019 ◽

Author(s):

Riccardo Delli Ponti ◽

Alexandros Armaos ◽

Andrea Vandelli ◽

Gian Gaetano Tartaglia

Keyword(s):

Protein Interactions ◽

Rna Structure ◽

Cross Validation ◽

Supplementary Information ◽

Supplementary Data ◽

High Confidence ◽

Rna Molecules ◽

Non Coding Rna ◽

Long Non Coding Rna

Abstract Motivation RNA structure is difficult to predict in vivo due to interactions with enzymes and other molecules. Here we introduce CROSSalive, an algorithm to predict the single- and double-stranded regions of RNAs in vivo using predictions of protein interactions. Results Trained on icSHAPE data in presence (m6a+) and absence of N6 methyladenosine modification (m6a-), CROSSalive achieves cross-validation accuracies between 0.70 and 0.88 in identifying high-confidence single- and double-stranded regions. The algorithm was applied to the long non-coding RNA Xist (17 900 nt, not present in the training) and shows an Area under the ROC curve of 0.83 in predicting structured regions. Availability and implementation CROSSalive webserver is freely accessible at http://service.tartaglialab.com/new_submission/crossalive. Supplementary information Supplementary data are available at Bioinformatics online.
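For readers unfamiliar with the metric quoted above, the area under the ROC curve equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition. The function name, scores and labels are hypothetical and are not taken from CROSSalive.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    `labels` holds 1 for positives (e.g. structured positions) and 0 for
    negatives. Both classes must be present.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical per-nucleotide scores and structure labels.
    print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

The pairwise loop is quadratic in the number of examples; it is meant to show the definition rather than to scale to long transcripts.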


Genome-wide detection and sequence conservation analysis of long non-coding RNA during hair follicle cycle of yak

10.21203/rs.3.rs-17011/v2 ◽

2020 ◽

Author(s):

Xiaolan Zhang ◽

Qi Bao ◽

Congjun Jia ◽

Chen Li ◽

Yongfang Chang ◽

...

Keyword(s):

Hair Follicle ◽

Signaling Pathway ◽

Expression Profile ◽

Differential Expression Analysis ◽

Hair Follicles ◽

Sequence Conservation ◽

Cashmere Goat ◽

Non Coding Rna ◽

Long Non Coding Rna ◽

Ncbi Blast

Abstract Background: Long non-coding RNA (lncRNA) is an important regulator that has been shown to play an indispensable role in the growth of hair follicles (HFs). However, the function and expression profile of lncRNAs in the HFs cycle of yak are still unknown. Only a few functional lncRNAs have been identified, partly due to the low sequence conservation and lack of identified conserved properties in lncRNAs. Here, lncRNA-seq was employed to detect the expression profile of lncRNAs during the HFs cycle of yak, and the sequence conservation of two datasets between yak and cashmere goat during the HFs cycle was analyzed. Results: A total of 2884 lncRNAs were identified in 5 phases (Jan., Mar., Jun., Aug., and Oct.) during the HFs cycle of yak. Then, differential expression analysis between 3 phases (Jan., Mar., and Oct.) was performed, revealing 198 differentially expressed lncRNAs (DELs) in the Oct.-vs-Jan. group, 280 DELs in the Jan.-vs-Mar. group, and 340 DELs in the Mar.-vs-Oct. group. Subsequently, the nearest genes of lncRNAs were searched as the potential target genes and used to explore the function of DELs by GO and KEGG enrichment analysis. Several critical pathways involved in HFs development, such as the Wnt signaling pathway, the VEGF signaling pathway, and signaling pathways regulating pluripotency of stem cells, were enriched. To further screen key lncRNAs influencing the HFs cycle, 24 DELs with different degrees of sequence conservation were obtained via a comparative analysis of partial DELs with previously published lncRNA-seq data of cashmere goat in the HFs cycle using NCBI BLAST-2.9.0+, and 3 of these DELs were randomly selected for further detailed analysis of the sequence conservation properties. Conclusions: This study revealed the expression pattern and potential function of lncRNAs during the HFs cycle of yak, which would expand the knowledge about the role of lncRNAs in the HFs cycle. The findings related to sequence conservation properties of lncRNAs in the HFs cycle between the two species may provide valuable insights into the study of lncRNA functionality and mechanism. Keywords: Hair follicle cycling, lncRNA, NCBI blast-2.9.0+, Yak


scRMD: imputation for single cell RNA-seq data via robust matrix decomposition

Bioinformatics ◽

10.1093/bioinformatics/btaa139 ◽

2020 ◽

Vol 36 (10) ◽

pp. 3156-3161 ◽

Cited By ~ 9

Author(s):

Chong Chen ◽

Changjing Wu ◽

Linjie Wu ◽

Xiaochen Wang ◽

Minghua Deng ◽

...

Keyword(s):

Data Analysis ◽

Single Cell ◽

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Matrix Decomposition ◽

Transcriptome Profiling ◽

R Package ◽

Supplementary Information ◽

Downstream Analysis

Abstract Motivation Single cell RNA-sequencing (scRNA-seq) technology enables whole transcriptome profiling at single cell resolution and holds great promise in many biological and medical applications. Nevertheless, scRNA-seq often fails to capture expressed genes, leading to the prominent dropout problem. These dropouts cause many problems in downstream analysis, such as a significant increase in noise, power loss in differential expression analysis and obscuring of gene-to-gene or cell-to-cell relationships. Imputation of these dropout values can be beneficial in scRNA-seq data analysis. Results In this article, we model the dropout imputation problem as robust matrix decomposition. This model has minimal assumptions and allows us to develop a computationally efficient imputation method called scRMD. Extensive data analysis shows that scRMD can accurately recover the dropout values and help to improve downstream analysis such as differential expression analysis and clustering analysis. Availability and implementation The R package scRMD is available at https://github.com/XiDsLab/scRMD. Supplementary information Supplementary data are available at Bioinformatics online.


Genome-wide detection and sequence conservation analysis of long non-coding RNA during hair follicle cycle of yak

BMC Genomics ◽

10.1186/s12864-020-07082-z ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Xiaolan Zhang ◽

Qi Bao ◽

Congjun Jia ◽

Chen Li ◽

Yongfang Chang ◽

...

Keyword(s):

Signaling Pathway ◽

Expression Profile ◽

Target Genes ◽

Differential Expression Analysis ◽

Wnt Signaling Pathway ◽

Hair Follicles ◽

Sequence Conservation ◽

Cashmere Goat ◽

Non Coding Rna ◽

Long Non Coding Rna

Abstract Background Long non-coding RNA (lncRNA) is an important regulator that has been shown to play an indispensable role in the growth of hair follicles (HFs). However, the function and expression profile of lncRNAs in the HFs cycle of yak are still unknown. Only a few functional lncRNAs have been identified, partly due to the low sequence conservation and lack of identified conserved properties in lncRNAs. Here, lncRNA-seq was employed to detect the expression profile of lncRNAs during the HFs cycle of yak, and the sequence conservation of two datasets between yak and cashmere goat during the HFs cycle was analyzed. Results A total of 2884 lncRNAs were identified in 5 phases (Jan., Mar., Jun., Aug., and Oct.) during the HFs cycle of yak. Then, differential expression analysis between 3 phases (Jan., Mar., and Oct.) was performed, revealing 198 differentially expressed lncRNAs (DELs) in the Oct.-vs-Jan. group, 280 DELs in the Jan.-vs-Mar. group, and 340 DELs in the Mar.-vs-Oct. group. Subsequently, the nearest genes of lncRNAs were searched as the potential target genes and used to explore the function of DELs by GO and KEGG enrichment analysis. Several critical pathways involved in HFs development, such as the Wnt signaling pathway, the VEGF signaling pathway, and signaling pathways regulating pluripotency of stem cells, were enriched. To further screen key lncRNAs influencing the HFs cycle, 24 DELs with different degrees of sequence conservation were obtained via a comparative analysis of partial DELs with previously published lncRNA-seq data of cashmere goat in the HFs cycle using NCBI BLAST-2.9.0+, and 3 of these DELs were randomly selected for further detailed analysis of the sequence conservation properties. Conclusions This study revealed the expression pattern and potential function of lncRNAs during the HFs cycle of yak, which would expand the knowledge about the role of lncRNAs in the HFs cycle. The findings related to sequence conservation properties of lncRNAs in the HFs cycle between the two species may provide valuable insights into the study of lncRNA functionality and mechanism.


A long non-coding RNA signature predicts survival for glioblastoma as prognostic biomarkers

10.21203/rs.3.rs-21930/v1 ◽

2020 ◽

Author(s):

Zhenzhe Li ◽

Zhonghua Lv ◽

Lei Yu ◽

Sibin Zhang ◽

Yingjie Wang ◽

...

Keyword(s):

Noncoding Rna ◽

Cox Regression ◽

The Cancer Genome Atlas ◽

Differential Analysis ◽

Cox Regression Analysis ◽

Non Coding Rna ◽

Cancer Genome Atlas ◽

The Central Nervous System ◽

Long Non Coding Rna

Abstract Background: Glioblastoma (GBM) is one of the most fatal tumors of the central nervous system, and its prognosis is very poor. There is increasing evidence that long noncoding RNA (lncRNA) participates in the biological processes of glioblastoma. Nevertheless, the role of lncRNA in predicting the prognosis of GBM is still uncertain. Methods: In this study, using RNA-Seq and clinical follow-up data of GBM patients from The Cancer Genome Atlas (TCGA), we performed differential analysis of lncRNA, univariable and multivariable Cox regression analysis, Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis, and Gene Ontology (GO) analysis. Results: We identified four lncRNAs closely associated with survival and prognosis of GBM patients. This lncRNA signature was effective in both the training set and the testing set, and it was independent of clinical factors. Conclusions: Our data suggest that the four lncRNAs could be used as promising biomarkers for predicting prognosis in GBM patients.


DECO: decompose heterogeneous population cohorts for patient stratification and discovery of sample biomarkers using omic data profiling

Bioinformatics ◽

10.1093/bioinformatics/btz148 ◽

2019 ◽

Vol 35 (19) ◽

pp. 3651-3662 ◽

Cited By ~ 1

Author(s):

F J Campos-Laborie ◽

A Risueño ◽

M Ortiz-Estévez ◽

B Rosón-Burgo ◽

C Droste ◽

...

Keyword(s):

Correspondence Analysis ◽

Large Scale ◽

Simulated Data ◽

R Package ◽

Heterogeneous Data ◽

Supplementary Information ◽

Patient Stratification ◽

Differential Analysis ◽

Data Profiling ◽

Omic Data

Abstract Motivation Patient and sample diversity is one of the main challenges when dealing with clinical cohorts in biomedical genomics studies. During the last decade, several methods have been developed to identify biomarkers assigned to specific individuals or subtypes of samples. However, current methods still fail to discover markers in complex scenarios where heterogeneity or hidden phenotypical factors are present. Here, we propose a method to analyze and understand heterogeneous data while avoiding classical normalization approaches that reduce or remove variation. Results DEcomposing heterogeneous Cohorts using Omic data profiling (DECO) is a method to find significant associations between biological features (biomarkers) and samples (individuals) by analyzing large-scale omic data. The method identifies and categorizes biomarkers of specific phenotypic conditions based on a recurrent differential analysis integrated with a non-symmetrical correspondence analysis. DECO integrates both omic data dispersion and predictor–response relationship from non-symmetrical correspondence analysis in a unique statistic (called h-statistic), allowing the identification of closely related sample categories within complex cohorts. The performance is demonstrated using simulated data and five experimental transcriptomic datasets, and compared with seven other methods. We show DECO greatly enhances the discovery and subtle identification of biomarkers, making it especially suited for deep and accurate patient stratification. Availability and implementation DECO is freely available as an R package (including a practical vignette) at Bioconductor repository (http://bioconductor.org/packages/deco/). Supplementary information Supplementary data are available at Bioinformatics online.


Testing clonal relatedness of two tumors from the same patient based on their mutational profiles: update of the Clonality R package

Bioinformatics ◽

10.1093/bioinformatics/btz486 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4776-4778

Author(s):

Audrey Mauguen ◽

Venkatraman E Seshan ◽

Colin B Begg ◽

Irina Ostrovnaya

Keyword(s):

Frequency Estimation ◽

R Package ◽

Supplementary Information ◽

Ratio Test ◽

New Methods ◽

External Data ◽

Mutational Frequency ◽

Clonal Relatedness ◽

Practical Tool ◽

Generation Sequencing

Abstract Summary The Clonality R package is a practical tool to assess the clonal relatedness of two tumors from the same patient. We have previously presented its functionality for testing tumors using loss of heterozygosity data or copy number arrays. Since then somatic mutation data have been more widely available through next generation sequencing and we have developed new methodology for comparing the tumors’ mutational profiles. We thus extended the package to include these two new methods for comparing tumors as well as the mutational frequency estimation from external data required for their implementation. The first method is a likelihood ratio test that is readily available on a patient by patient basis. The second method employs a random-effects model to estimate both the population and individual probabilities of clonal relatedness from a group of patients with pairs of tumors. The package is available on Bioconductor. Availability and implementation Bioconductor (http://bioconductor.org/packages/release/bioc/html/Clonality.html). Supplementary information Supplementary data are available at Bioinformatics online.
