The fractured landscape of RNA-seq alignment: The default in our STARs

Mapping Intimacies ◽

10.1101/220681 ◽

2017 ◽

Cited By ~ 1

Author(s):

Sara Ballouz ◽

Alexander Dobin ◽

Thomas Gingeras ◽

Jesse Gillis

Keyword(s):

Expression Profile ◽

Rna Seq ◽

Model Data ◽

Mhc Genes ◽

Wide Range ◽

Biological Discovery ◽

Biological Performance ◽

Expression Quantification

Abstract: Many tools are available for RNA-seq alignment and expression quantification, but their comparative value is hard to establish. Benchmarking assessments often highlight methods’ good performance, but either focus on model data or fail to explain variation in performance. This leaves us to ask, what is the most meaningful way to assess different alignment choices? And importantly, where is there room for progress? In this work, we explore the answers to these two questions by performing an exhaustive assessment of the STAR aligner. We assess STAR’s performance across a range of alignment parameters using common metrics, and then on biologically focused tasks. We find technical metrics such as fraction mapping or expression profile correlation to be uninformative, capturing properties unlikely to have any role in biological discovery. Surprisingly, we find that changes in alignment parameters within a wide range have little impact on either technical or biological performance. Yet, when performance finally does break, it happens in difficult regions, such as X-Y paralogs and MHC genes. We believe improved reporting by developers will help establish where results are likely to be robust or fragile, providing a better baseline to establish where methodological progress can still occur.


Clustering Single-Cell RNA-Seq Data with Regularized Gaussian Graphical Model

Genes ◽

10.3390/genes12020311 ◽

2021 ◽

Vol 12 (2) ◽

pp. 311

Author(s):

Zhenqiu Liu

Keyword(s):

Single Cell ◽

Free Parameter ◽

Graphical Model ◽

Expression Patterns ◽

Information Criterion ◽

Log P ◽

Rna Seq ◽

Clustering Methods ◽

Wide Range ◽

Free Parameters

Single-cell RNA-seq (scRNA-seq) is a powerful tool to measure the expression patterns of individual cells and discover heterogeneity and functional diversity among cell populations. Due to variability, it is challenging to analyze such data efficiently. Many clustering methods have been developed using at least one free parameter. Different choices for free parameters may lead to substantially different visualizations and clusters. Tuning free parameters is also time consuming. Thus, there is a need for a simple, robust, and efficient clustering method. In this paper, we propose a new regularized Gaussian graphical clustering (RGGC) method for scRNA-seq data. RGGC is based on high-order (partial) correlations and subspace learning, and is robust over a wide range of the regularization parameter λ. Therefore, we can simply set λ=2 or λ=log(p) for AIC (Akaike information criterion) or BIC (Bayesian information criterion) without cross-validation. Cell subpopulations are discovered by the Louvain community detection algorithm, which determines the number of clusters automatically. There is no free parameter to be tuned with RGGC. When evaluated with simulated and benchmark scRNA-seq data sets against widely used methods, RGGC is computationally efficient and one of the top performers. It can detect inter-sample cell heterogeneity when applied to glioblastoma scRNA-seq data.


MUREN: a robust and multi-reference approach of RNA-seq transcript normalization

BMC Bioinformatics ◽

10.1186/s12859-021-04288-0 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Yance Feng ◽

Lei M. Li

Keyword(s):

Biological Significance ◽

Housekeeping Genes ◽

R Package ◽

Data Sets ◽

Statistical Regression ◽

Rna Seq ◽

Least Trimmed Squares ◽

Standard Data ◽

Wide Range ◽

Multiple References

Abstract. Background: Normalization of RNA-seq data aims at identifying biological expression differentiation between samples by removing the effects of unwanted confounding factors. Explicitly or implicitly, the justification of normalization requires a set of housekeeping genes. However, the existence of housekeeping genes common for a very large collection of samples, especially under a wide range of conditions, is questionable. Results: We propose to carry out pairwise normalization with respect to multiple references, selected from representative samples. Then the pairwise intermediates are integrated based on a linear model that adjusts the reference effects. Motivated by the notion of housekeeping genes and their statistical counterparts, we adopt the robust least trimmed squares regression in pairwise normalization. The proposed method (MUREN) is compared with other existing tools on some standard data sets. The goodness of normalization emphasizes preserving possible asymmetric differentiation, whose biological significance is exemplified by a single-cell data set of the cell cycle. MUREN is implemented as an R package. The code under license GPL-3 is available on the GitHub platform: github.com/hippo-yf/MUREN and on the conda platform: anaconda.org/hippo-yf/r-muren. Conclusions: MUREN performs the RNA-seq normalization using a two-step statistical regression induced from a general principle. We propose that the densities of pairwise differentiations be used to evaluate the goodness of normalization. MUREN adjusts the mode of differentiation toward zero while preserving the skewness due to biological asymmetric differentiation. Moreover, by robustly integrating pre-normalized counts with respect to multiple references, MUREN is immune to individual outlier samples.
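
The abstract names least trimmed squares (LTS) regression as the robust fit used in pairwise normalization. As a rough illustration only, and not MUREN's actual implementation, the Python sketch below computes an LTS estimate of a single location parameter for per-gene log-ratios between a sample and a reference: it finds the offset that minimizes the sum of the h smallest squared residuals. The function name `lts_location`, the `coverage` fraction, and the toy log-ratios are all assumptions made for this example.

```python
import math

def lts_location(values, coverage=0.5):
    """Least trimmed squares estimate of a single location parameter.

    Minimizes the sum of the h smallest squared residuals, where
    h = ceil(coverage * n). For one parameter the optimal subset is a
    contiguous window of the sorted values, so scanning windows is exact.
    """
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("values must be non-empty")
    h = max(1, min(n, math.ceil(coverage * n)))

    best_center, best_cost = None, math.inf
    for start in range(n - h + 1):
        window = xs[start:start + h]
        center = sum(window) / h
        cost = sum((x - center) ** 2 for x in window)
        if cost < best_cost:
            best_center, best_cost = center, cost
    return best_center

# Hypothetical log2 ratios: most genes shifted by about 1.0, three strongly up.
log_ratios = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05, 4.0, 5.0, 6.0]
offset = lts_location(log_ratios, coverage=0.6)

# Subtracting the robust offset centers the bulk of genes near zero
# while leaving the asymmetric, strongly shifted genes visible.
normalized = [r - offset for r in log_ratios]
```

In this toy case the three strongly shifted genes fall outside the best window, so the offset tracks the bulk of the genes rather than the plain mean, which is the kind of behavior the abstract describes as preserving asymmetric differentiation.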


A Transcriptomic Approach to the Metabolism of Tetrapyrrolic Photosensitizers in a Marine Annelid

Molecules ◽

10.3390/molecules26133924 ◽

2021 ◽

Vol 26 (13) ◽

pp. 3924

Author(s):

Maria Leonor Santos ◽

Mariaelena D’Ambrosio ◽

Ana P. Rodrigo ◽

A. Jorge Parola ◽

Pedro M. Costa

Keyword(s):

Metabolic Pathways ◽

Molecular Networks ◽

Bile Pigments ◽

Rna Seq ◽

The Past ◽

Wide Range ◽

Genes Encoding ◽

Organ Specific ◽

Bright Green ◽

Green Coloration

The past decade has seen growing interest in marine natural pigments for biotechnological applications. One of the most abundant classes of biological pigments is the tetrapyrroles, which are prized targets due to their photodynamic properties; porphyrins are the best known examples of this group. Many animal porphyrinoids and other tetrapyrroles are produced through heme metabolic pathways, the best known of which are the bile pigments biliverdin and bilirubin. Eulalia is a marine Polychaeta characterized by its bright green coloration resulting from a remarkably wide range of greenish and yellowish tetrapyrroles, some of which have promising photodynamic properties. The present study combined metabolomics based on HPLC-DAD with RNA-seq transcriptomics to investigate the molecular pathways of porphyrinoid metabolism by comparing the worm’s proboscis and epidermis, which display distinct pigmentation patterns. The results showed that pigments are endogenous and seemingly heme-derived. The worm possesses homologs in both organs for genes encoding enzymes involved in heme metabolism such as ALAD, FECH, UROS, and PPOX. However, the findings also indicate that variants of the canonical enzymes of the heme biosynthesis pathway can be species- and organ-specific. These differences between molecular networks help explain not only the differential pigmentation patterns between organs, but also the worm’s variety of novel endogenous tetrapyrrolic compounds.


RNA-Seq Data-Mining Allows the Discovery of Two Long Non-Coding RNA Biomarkers of Viral Infection in Humans

International Journal of Molecular Sciences ◽

10.3390/ijms21082748 ◽

2020 ◽

Vol 21 (8) ◽

pp. 2748 ◽

Cited By ~ 1

Author(s):

Ruth Barral-Arca ◽

Alberto Gómez-Carballa ◽

Miriam Cebey-López ◽

María José Currás-Tuala ◽

Sara Pischedda ◽

...

Keyword(s):

Gene Expression ◽

Viral Infections ◽

Umbilical Vein ◽

Cell Types ◽

Dermal Fibroblasts ◽

Learning Approaches ◽

Rna Seq ◽

Wide Range ◽

Healthy Control ◽

Umbilical Vein Endothelial Cells

There is a growing interest in unraveling gene expression mechanisms leading to viral host invasion and infection progression. Current findings reveal that long non-coding RNAs (lncRNAs) are implicated in the regulation of the immune system by influencing gene expression through a wide range of mechanisms. By mining whole-transcriptome shotgun sequencing (RNA-seq) data using machine learning approaches, we detected two lncRNAs (ENSG00000254680 and ENSG00000273149) that are downregulated in a wide range of viral infections and different cell types, including blood mononuclear cells, umbilical vein endothelial cells, and dermal fibroblasts. The efficiency of these two lncRNAs was positively validated in different viral phenotypic scenarios. These two lncRNAs showed a strong downregulation in virus-infected patients when compared to healthy control transcriptomes, indicating that these biomarkers are promising targets for infection diagnosis. To the best of our knowledge, this is the very first study using host lncRNA biomarkers for the diagnosis of human viral infections.


Gene Expression Imputation with Generative Adversarial Imputation Nets

10.1101/2020.06.09.141689 ◽

2020 ◽

Author(s):

Ramon Viñas ◽

Tiago Azevedo ◽

Eric R. Gamazon ◽

Pietro Liò

Keyword(s):

Gene Expression ◽

Large Scale ◽

Biological Significance ◽

Predictive Performance ◽

Cost Effective ◽

Rna Seq ◽

Comprehensive Collection ◽

Genomic Studies ◽

Biological Discovery ◽

Cancer Types

Abstract: A question of fundamental biological significance is to what extent the expression of a subset of genes can be used to recover the full transcriptome, with important implications for biological discovery and clinical application. To address this challenge, we present GAIN-GTEx, a method for gene expression imputation based on Generative Adversarial Imputation Networks. In order to increase the applicability of our approach, we leverage data from GTEx v8, a reference resource that has generated a comprehensive collection of transcriptomes from a diverse set of human tissues. We compare our model to several standard and state-of-the-art imputation methods and show that GAIN-GTEx is significantly superior in terms of predictive performance and runtime. Furthermore, our results indicate strong generalisation on RNA-Seq data from 3 cancer types across varying levels of missingness. Our work can facilitate a cost-effective integration of large-scale RNA biorepositories into genomic studies of disease, with high applicability across diverse tissue types.


Tissue-specific cis-regulatory divergence implicates a fatty acid elongase necessary for inhibiting interspecies mating in Drosophila

10.1101/344754 ◽

2018 ◽

Author(s):

Peter A. Combs ◽

Joshua J. Krupp ◽

Neil M. Khosla ◽

Dennis Bua ◽

Dmitri A. Petrov ◽

...

Keyword(s):

Fatty Acid ◽

Candidate Genes ◽

Molecular Mechanisms ◽

Sister Species ◽

F1 Hybrids ◽

Rna Seq ◽

Fatty Acid Elongase ◽

Tissue Specific ◽

Regulatory Changes ◽

Wide Range

Abstract: Pheromones known as cuticular hydrocarbons are a major component of reproductive isolation in Drosophila. Individuals from morphologically similar sister species produce different sets of hydrocarbons that allow potential mates to identify them as a suitable partner. In order to explore the molecular mechanisms underlying speciation, we performed RNA-seq in F1 hybrids to measure tissue-specific cis-regulatory divergence between the sister species D. simulans and D. sechellia. By focusing on cis-regulatory changes specific to female oenocytes, we rapidly identified a small number of candidate genes. We found that one of these, the fatty acid elongase eloF, broadly affects both the complement of hydrocarbons present on D. sechellia females and the propensity of D. simulans males to mate with those females. In addition, knockdown of eloF in the more distantly related D. melanogaster led to a similar shift in hydrocarbons as well as lower interspecific mate discrimination by D. simulans males. Thus, cis-regulatory changes in eloF appear to be a major driver in the sexual isolation of D. simulans from multiple other species. More generally, our RNA-seq approach proved to be far more efficient than QTL mapping in identifying candidate genes; the same framework can be used to pinpoint cis-regulatory drivers of divergence in a wide range of traits differing between any interfertile species.


Comparison and evaluation of statistical error models for scRNA-seq

10.1101/2021.07.07.451498 ◽

2021 ◽

Author(s):

Saket Choudhary ◽

Rahul Satija

Keyword(s):

Linear Models ◽

Negative Binomial ◽

Statistical Error ◽

Rna Seq ◽

Multiple Sources ◽

Error Models ◽

Wide Range ◽

Data Driven Approach ◽

Downstream Analysis ◽

Experimental Processing

Heterogeneity in single-cell RNA-seq (scRNA-seq) data is driven by multiple sources, including biological variation in cellular state as well as technical variation introduced during experimental processing. Deconvolving these effects is a key challenge for preprocessing workflows. Recent work has demonstrated the importance and utility of count models for scRNA-seq analysis, but there is a lack of consensus on which statistical distributions and parameter settings are appropriate. Here, we analyze 58 scRNA-seq datasets that span a wide range of technologies, systems, and sequencing depths in order to evaluate the performance of different error models. We find that while a Poisson error model appears appropriate for sparse datasets, we observe clear evidence of overdispersion for genes with sufficient sequencing depth in all biological systems, necessitating the use of a negative binomial model. Moreover, we find that the degree of overdispersion varies widely across datasets, systems, and gene abundances, which argues for a data-driven approach for parameter estimation. Based on these analyses, we provide a set of recommendations for modeling variation in scRNA-seq data, particularly when using generalized linear models or likelihood-based approaches for preprocessing and downstream analysis.
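
The contrast between the two error models can be made concrete: under a Poisson model the variance equals the mean μ, while under a negative binomial model the variance is μ + μ²/θ, with smaller θ meaning stronger overdispersion. The Python sketch below is a simple method-of-moments check for a single gene. It is an illustration of the idea rather than the authors' estimation procedure, and the function name and toy counts are hypothetical.

```python
import math
from statistics import mean, variance

def nb_theta_moments(counts):
    """Method-of-moments estimate of the negative binomial inverse
    dispersion theta, using Var = mu + mu**2 / theta.

    Returns math.inf when the sample variance does not exceed the mean,
    i.e. when the counts show no overdispersion relative to Poisson.
    """
    mu = mean(counts)
    var = variance(counts)  # unbiased sample variance
    if var <= mu:
        return math.inf
    return mu * mu / (var - mu)

# Hypothetical UMI counts for two genes across eight cells.
poisson_like = [2, 3, 1, 2, 3, 2, 1, 2]
overdispersed = [0, 0, 1, 0, 12, 0, 9, 2]

for name, counts in [("poisson_like", poisson_like),
                     ("overdispersed", overdispersed)]:
    print(name, nb_theta_moments(counts))
```

A finite, small θ flags a gene whose variability a Poisson model would understate. Estimating θ per dataset rather than fixing it in advance is in the spirit of the data-driven approach the abstract recommends.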


Laser Additive Manufacturing of Titanium-Based Implants

Advances in Civil and Industrial Engineering - Advanced Manufacturing Techniques Using Laser Material Processing ◽

10.4018/978-1-5225-0329-3.ch009 ◽

2016 ◽

pp. 236-247

Author(s):

Martin Ruthandi Maina

Keyword(s):

Physical And Mechanical Properties ◽

Additive Manufacture ◽

Medical Implants ◽

Unique Combination ◽

Manufacture Process ◽

Wide Range ◽

Biomedical Industry ◽

Manufacturing Techniques ◽

Biological Performance ◽

Selective Addition

Titanium and its alloys exhibit a unique combination of mechanical and physical properties and corrosion resistance that makes them desirable for the aerospace, industrial, chemical, medical and energy industries. The selective addition of alloying elements to titanium enables a wide range of physical and mechanical properties to be obtained. Ti-based alloys are finding ever-increasing applications in biomaterials due to their excellent mechanical, physical and biological performance. Intensive research is being pursued in the development of new Ti-based alloys with bio-functionalization closer to human bone, owing to their excellent mechanical strength and resilience when compared to alternative biomaterials, such as polymers and ceramics. Several manufacturing techniques are capable of producing porous materials. There is a need to control pore size, shape, orientation and distribution. This work reviews the application of Ti-based alloys in the biomedical industry and also proposes a laser additive manufacturing process for the manufacture of medical implants.


A New Assessment of Offshore Wind Profile Relationships

ASME 2018 1st International Offshore Wind Technical Conference ◽

10.1115/iowtc2018-1052 ◽

2018 ◽

Author(s):

Gus Jeans ◽

Dave Quantrell ◽

Andrew Watson ◽

Laure Grignon ◽

Gil Lizcano

Keyword(s):

Wind Energy ◽

Engineering Design ◽

Wind Profile ◽

Numerical Models ◽

Data Sources ◽

Offshore Wind ◽

Wind Gust ◽

Model Data ◽

Near Surface ◽

Wide Range

Engineering design codes specify a variety of different relationships to quantify vertical variations in wind speed, gust factor and turbulence intensity. These are required to support applications including assessment of wind resource, operability and engineering design. Differences between the available relationships lead to undesirable uncertainty in all stages of an offshore wind project. Reducing these uncertainties will become increasingly important as wind energy is harnessed in deeper waters and at lower costs. Installation of a traditional met mast is not an option in deep water. Reliable measurement of the local wind, gust and turbulence profiles from floating LiDAR can be challenging. Fortunately, alternative data sources can provide improved characterisation of winds at offshore locations. Numerical modelling of wind in the lower few hundred metres of the atmosphere is generally much simpler at remote deepwater locations than over complex onshore terrain. The sophistication, resolution and reliability of such models are advancing rapidly. Mesoscale models can now allow nesting of large-scale conditions to horizontal scales less than one kilometre. Models can also provide many decades of wind data, a major advantage over the site-specific measurements gathered to support a wind energy development. Model data are also immediately available at the start of a project at relatively low cost. At offshore locations these models can be validated and calibrated, just above the sea surface, using well-established satellite wind products. Reliable long-term statistics of near-surface wind can be used to quantify winds at the higher elevations applicable to wind turbines using the wide range of existing standard profile relationships. Reduced uncertainty in these profile relationships will be of considerable benefit to the wider use of satellite and model data sources in the wind energy industry.
This paper describes a new assessment of various industry standard wind profile relationships, using a range of available met mast datasets and numerical models.
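
The abstract does not say which profile relationships are assessed. Two widely used textbook forms are the power law and the neutral logarithmic law, and the Python sketch below shows how each extrapolates a near-surface wind speed to turbine heights. The function names, the shear exponent `alpha = 0.11`, and the roughness length `z0 = 0.0002` m are illustrative assumptions, not values taken from the paper.

```python
import math

def power_law_speed(u_ref, z_ref, z, alpha):
    """Power-law profile: u(z) = u_ref * (z / z_ref) ** alpha."""
    return u_ref * (z / z_ref) ** alpha

def log_law_speed(u_ref, z_ref, z, z0):
    """Neutral logarithmic profile:
    u(z) = u_ref * ln(z / z0) / ln(z_ref / z0)."""
    return u_ref * math.log(z / z0) / math.log(z_ref / z0)

# Hypothetical 10 m wind speed, e.g. from a near-surface wind product.
u10, z_ref = 8.0, 10.0
alpha = 0.11   # assumed shear exponent (illustrative)
z0 = 0.0002    # assumed sea-surface roughness length in metres (illustrative)

for z in (10.0, 50.0, 100.0, 150.0):
    u_pow = power_law_speed(u10, z_ref, z, alpha)
    u_log = log_law_speed(u10, z_ref, z, z0)
    print(f"z = {z:5.0f} m  power law = {u_pow:.2f} m/s  log law = {u_log:.2f} m/s")
```

Both forms return the reference speed at the reference height and diverge with elevation. That gap between relationships is the kind of uncertainty the abstract says propagates through an offshore wind project.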


Gene Expression Profile in Similar Tissues Using Transcriptome Sequencing Data of Whole-Body Horse Skeletal Muscle

Genes ◽

10.3390/genes11111359 ◽

2020 ◽

Vol 11 (11) ◽

pp. 1359

Author(s):

Ho-Yeon Lee ◽

Jae-Yoon Kim ◽

Kyoung Hyoun Kim ◽

Seongmun Jeong ◽

Youngbum Cho ◽

...

Keyword(s):

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Whole Body ◽

Rna Seq ◽

Sequencing Data ◽

Exercise Adaptation ◽

Wide Range ◽

Metabolic Properties ◽

Transcriptome Expression ◽

Functional Pathway

Horses have been studied for exercise function rather than food production, unlike most livestock. Therefore, the role and characteristics of tissue landscapes are critically understudied, except for certain muscles used in exercise-related studies. In the present study, we compared RNA-Seq data from 18 Jeju horse skeletal muscles to identify differentially expressed genes (DEGs) between tissues that have similar functions and to characterize these differences. We identified DEGs between different muscles using pairwise differential expression (DE) analyses of tissue transcriptome expression data and classified the samples using the expression values of those genes. The tissues were broadly classified into two groups, each with subgroups, by k-means clustering, and the DEGs identified between groups were analyzed at the functional/pathway level using gene set enrichment analysis and at the gene level by confirming the expression of significant genes. The analysis detected differences between the groups in metabolic properties such as glycolysis, oxidative phosphorylation, and exercise adaptation. The results demonstrated that the biochemical and anatomical features of a wide range of muscle tissues in horses could be determined through transcriptome expression analysis, and provided proof-of-concept data demonstrating that RNA-Seq analysis can be used to classify and study in-depth differences between tissues with similar properties.
