Evolutionary characteristics of intergenic transcribed regions indicate widespread noisy transcription in the Poaceae

10.1101/440933 ◽

2018 ◽

Author(s):

John P. Lloyd ◽

Megan J. Bowman ◽

Christina B. Azodi ◽

Rosalie P. Sowers ◽

Gaurav D. Moghe ◽

...

Keyword(s):

Transcriptional Activity ◽

Prediction Models ◽

High Accuracy ◽

Model Systems ◽

Novel Genes ◽

Intergenic Regions ◽

Species Specific ◽

Biochemical Features ◽

Computational Predictions ◽

Functional Phenotype

Abstract: Extensive transcriptional activity in unannotated, intergenic regions of genomes has raised the question of whether intergenic transcription represents the activity of novel genes or noisy expression. To address this, we evaluated cross-species and post-duplication sequence and expression conservation of intergenic transcribed regions (ITRs) in four Poaceae species. Most ITR sequences are species-specific. Those found across species tend to be more divergent in expression and have more recent duplicates compared to annotated genes. To assess whether ITRs are functional (under selection), machine learning models were established in Oryza sativa (rice) that could distinguish between benchmark functional (phenotype genes) and nonfunctional (pseudogenes) sequences with high accuracy based on 44 evolutionary and biochemical features. Based on the prediction models, 584 rice ITRs (8%) are classified as likely functional; these tend to have conserved expression and ancient retained duplicates. However, most ITRs do not exhibit sequence or expression conservation across species or following duplication, consistent with computational predictions that suggest 61% of ITRs are not under selection. We outline key evolutionary characteristics that are tightly associated with likely-functional ITRs and provide a framework to identify novel genes to improve genome annotation and move toward connecting genotype to phenotype in crop and model systems.

Financial Information Asymmetry: Using Deep Learning Algorithms to Predict Financial Distress

Symmetry ◽

10.3390/sym13030443 ◽

2021 ◽

Vol 13 (3) ◽

pp. 443

Author(s):

Chyan-long Jan

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Information Asymmetry ◽

Error Rate ◽

Financial Distress ◽

Prediction Models ◽

High Accuracy ◽

Financial Information ◽

Financial Distress Prediction ◽

Distress Prediction

Because of financial information asymmetry, stakeholders usually do not know a company’s real financial condition until financial distress occurs. Financial distress not only influences a company’s operational sustainability and damages the rights and interests of its stakeholders; it may also harm the national economy and society. Hence, it is very important to build high-accuracy financial distress prediction models. The purpose of this study is to build high-accuracy and effective financial distress prediction models using two representative deep learning algorithms: deep neural networks (DNN) and convolutional neural networks (CNN). In addition, important variables are selected by the chi-squared automatic interaction detector (CHAID). The data on Taiwan’s listed and OTC sample companies are taken from the Taiwan Economic Journal (TEJ) database for the period from 2000 to 2019 and include 86 companies in financial distress and 258 not in financial distress, for a total of 344 companies. According to the empirical results, with the important variables selected by CHAID and modeling with CNN, the CHAID-CNN model has the highest financial distress prediction accuracy rate, 94.23%, and the lowest type I and type II error rates, 0.96% and 4.81%, respectively.
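The three rates reported for the CHAID-CNN model sum to 100% (94.23 + 0.96 + 4.81), which suggests that each is expressed as a share of all evaluated cases rather than of a single class. The Python sketch below illustrates that bookkeeping. The function name and the counts are hypothetical: the counts were chosen only because they reproduce the reported percentages, and the abstract states neither the test-set size nor which misclassification direction it labels type I.

```python
def error_breakdown(correct, type_i, type_ii):
    """Return accuracy and both error rates as percentages of all cases."""
    total = correct + type_i + type_ii
    if total == 0:
        raise ValueError("no cases to evaluate")

    def to_pct(count):
        # Share of all evaluated cases, rounded to two decimals.
        return round(100.0 * count / total, 2)

    return {
        "accuracy": to_pct(correct),
        "type_i": to_pct(type_i),
        "type_ii": to_pct(type_ii),
    }


# Hypothetical counts (assumed for illustration, not taken from the paper).
rates = error_breakdown(correct=98, type_i=1, type_ii=5)
print(rates)
```

Under this convention, accuracy and the two error rates always add up to 100%, apart from rounding.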

SCRUM-Japan genesis virtual sequencing (VSQ) project: A novel algorithm combining deep learning (DL) with pathological diagnostics to enable the prediction of BRAF mutations and microsatellite instability (MSI) in advanced colorectal cancer (CRC).

Journal of Clinical Oncology ◽

10.1200/jco.2021.39.3_suppl.112 ◽

2021 ◽

Vol 39 (3_suppl) ◽

pp. 112-112

Author(s):

Satoshi Fujii ◽

Daisuke Kotani ◽

Masahiro Hattori ◽

Nishihara Masato ◽

Toshihide Shikanai ◽

...

Keyword(s):

Prediction Models ◽

High Accuracy ◽

Cancer Genome ◽

Morphological Features ◽

Braf V600e ◽

Next Generation ◽

Braf Mutations ◽

Genetic Abnormalities ◽

Image Prediction ◽

Novel Algorithm

Background: Numerous genetic and epigenetic abnormalities may lead to various morphologies of cancer. However, exactly which gene abnormality causes which morphology is unknown. The VSQ Project aims to investigate a novel algorithm that combines DL technology and pathological diagnostics to predict cancer genome abnormalities. This was achieved by elucidating the association between morphological findings and genetic abnormalities, including BRAF V600E mutations and MSI status, which are directly linked to therapeutic strategies for advanced CRC patients (pts). Methods: A clinicopathological-genomic integrated database derived from SCRUM-Japan GI-SCREEN, a nationwide cancer genome screening project that includes CRC, was used. A total of 1,657 images of thin sections (one representative image per pt) cut from formalin-fixed, paraffin-embedded (FFPE) tissue specimens from primary or metastatic tumors with genetic abnormalities confirmed by next-generation sequencing (NGS) were investigated; 1,234 and 423 images (one per pt) were used for the training and validation cohorts, respectively. First, we developed image-prediction models based on morphological features precisely annotated by a single central pathologist, and then constructed DL algorithms (gene-prediction models) that predict gene abnormalities from images filtered by the image-prediction models. Results: We achieved a high accuracy of AUC > 0.90 for 12 of the 33 morphological features analyzed. Next, we created several DL algorithms that enabled the prediction of BRAF mutations and MSI. The prediction level reached AUC = 0.955 for BRAF mutations and AUC = 0.857 for MSI in the training cohort. In the validation cohort, we obtained AUC = 0.831 and 0.883 for BRAF mutations and MSI, respectively. Conclusions: Our findings suggest that VSQ can appropriately predict BRAF mutation and MSI status in advanced CRC, potentially without performing NGS tests. VSQ may also enable prompt initiation of systemic treatments in CRC patients and help establish an unprecedented next-generation pathology in the near future.
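The AUC values quoted for VSQ can be read as the probability that a randomly chosen positive case (for example, a BRAF-mutated tumor) receives a higher model score than a randomly chosen negative case. The Python sketch below computes AUC directly from that pairwise definition. The function name, scores, and labels are invented for illustration and are not the project's evaluation code or data.

```python
def pairwise_auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up model scores and ground-truth labels (1 = mutation present).
example_scores = [0.9, 0.8, 0.3, 0.1]
example_labels = [1, 0, 1, 0]
example_auc = pairwise_auc(example_scores, example_labels)
print(example_auc)
```

The double loop is quadratic in the number of cases, which is fine for a sketch; a rank-based formulation is the usual choice for large cohorts.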

Pathogenesis of Gram-Negative Bacteremia

Clinical Microbiology Reviews ◽

10.1128/cmr.00234-20 ◽

2021 ◽

Vol 34 (2) ◽

Author(s):

Caitlyn L. Holmes ◽

Mark T. Anderson ◽

Harry L. T. Mobley ◽

Michael A. Bachman

Keyword(s):

Immune Responses ◽

Global Economy ◽

Human Infection ◽

Model Systems ◽

Small Subset ◽

Gram Negative ◽

Content Type ◽

Capsule Production ◽

Gram Negative Bacteremia ◽

Species Specific

Summary: Gram-negative bacteremia is a devastating public health threat, with high mortality in vulnerable populations and significant costs to the global economy. Concerningly, rates of both Gram-negative bacteremia and antimicrobial resistance in the causative species are increasing. Gram-negative bacteremia develops in three phases. First, bacteria invade or colonize initial sites of infection. Second, bacteria overcome host barriers, such as immune responses, and disseminate from initial body sites to the bloodstream. Third, bacteria adapt to survive in the blood and blood-filtering organs. To develop new therapies, it is critical to define species-specific and multispecies fitness factors required for bacteremia in model systems that are relevant to human infection. A small subset of species is responsible for the majority of Gram-negative bacteremia cases, including Escherichia coli, Klebsiella pneumoniae, Pseudomonas aeruginosa, and Acinetobacter baumannii. The few bacteremia fitness factors identified in these prominent Gram-negative species demonstrate shared and unique pathogenic mechanisms at each phase of bacteremia progression. Capsule production, adhesins, and metabolic flexibility are common mediators, whereas only some species utilize toxins. This review provides an overview of Gram-negative bacteremia, compares animal models for bacteremia, and discusses prevalent Gram-negative bacteremia species.

Learners Demographics Classification on MOOCs During the COVID-19: Author Profiling via Deep Learning Based on Semantic and Syntactic Representations

Frontiers in Research Metrics and Analytics ◽

10.3389/frma.2021.673928 ◽

2021 ◽

Vol 6 ◽

Author(s):

Tahani Aljohani ◽

Alexandra I. Cristea

Keyword(s):

Neural Network ◽

Neural Networks ◽

Deep Learning ◽

Prediction Models ◽

Short Term Memory ◽

Methodological Approach ◽

High Accuracy ◽

Directional Model ◽

Textual Representations ◽

The One

Massive Open Online Courses (MOOCs) have become universal learning resources, and the COVID-19 pandemic is rendering these platforms even more necessary. In this paper, we seek to improve Learner Profiling (LP), i.e., estimating the demographic characteristics of learners on MOOC platforms. We focus on deep learning models based on effective textual representations, which show promise elsewhere but have not previously been examined in the LP area. As a first LP characteristic, we predict the employment status of learners. We compare sequential and parallel ensemble deep learning architectures based on Convolutional Neural Networks and Recurrent Neural Networks, obtaining an average accuracy of 96.3% for our best method. Next, we predict the gender of learners based on syntactic knowledge from the text. We compare different tree-structured Long Short-Term Memory models (as state-of-the-art candidates) and provide our novel version of a bi-directional composition function for existing architectures. In addition, we evaluate 18 different combinations of word-level and sentence-level encoding functions. Based on these results, we show that our bi-directional model outperforms all other models, and that the highest accuracy among our models is achieved by the combination of a FeedForward Neural Network and the Stack-augmented Parser-Interpreter Neural Network (82.60% prediction accuracy). We argue that the prediction models we recommend for both demographic characteristics examined in this study can achieve high accuracy. This is also the first time a sound methodological approach toward improving accuracy for learner demographics classification on MOOCs has been proposed.

NetGenes: A database of essential genes predicted using features from interaction networks

10.1101/2020.12.17.423287 ◽

2020 ◽

Author(s):

Vimaladhasan Senthamizhan ◽

Balaraman Ravindran ◽

Karthik Raman

Keyword(s):

Prediction Models ◽

Essential Gene ◽

Gene Prediction ◽

High Accuracy ◽

Essential Genes ◽

Interaction Networks ◽

Functional Association ◽

String Database ◽

Feature Vectors ◽

Link Type

Abstract: Essential gene prediction models built so far are heavily reliant on sequence-based features, and the scope of network-based features has been narrow. Previous work from our group demonstrated the importance of using network-based features for predicting essential genes with high accuracy. Here, we applied our approach for the prediction of essential genes to organisms from the STRING database and hosted the results on a standalone website. Our database, NetGenes, contains essential gene predictions for 2700+ bacteria, made using features derived from STRING protein-protein functional association networks. Housing a total of 3.5M+ genes, NetGenes offers features such as essentiality scores, annotations, and feature vectors for each gene. NetGenes is available at https://rbc-dsai.iitm.github.io/NetGenes/

Microbiomes of pathogenic Vibrio species reveal environmental and planktonic associations

10.21203/rs.2.18876/v1 ◽

2019 ◽

Author(s):

Rachel E. Diner ◽

Ariel J. Rabines ◽

Hong Zheng ◽

Joshua A. Steele ◽

John F. Griffith ◽

...

Keyword(s):

Distinct Species ◽

Human Infection ◽

Model Systems ◽

Taxonomic Resolution ◽

Environmental Preferences ◽

Pathogenic Vibrio ◽

Photosynthetic Organism ◽

One Year ◽

Species Specific ◽

Planktonic Community

Background: Many species of coastal Vibrio spp. bacteria can infect humans, representing an emerging health threat linked to increasing seawater temperatures. Vibrio interactions with the planktonic community impact coastal ecology and human infection potential. In particular, interactions with eukaryotic and photosynthetic organisms may provide attachment substrate and critical nutrients (e.g., chitin, phytoplankton exudates) that facilitate the persistence, diversification, and spread of pathogenic Vibrio spp. Vibrio interactions with these organisms in an environmental context are, however, poorly understood. Results: We quantified pathogenic Vibrio species, including V. cholerae, V. parahaemolyticus, and V. vulnificus, and two virulence-associated genes for one year at five coastal sites in Southern California and used metabarcoding to profile associated prokaryotic and eukaryotic communities, including Vibrio-specific communities. These Vibrio spp. reached high abundances, particularly during summer months, and inhabited distinct species-specific environmental niches driven by temperature and salinity. Associated bacterial and eukaryotic taxa identified at fine-scale taxonomic resolution revealed genus- and species-level relationships. For example, common diatoms of the genus Thalassiosira capable of exuding chitin were positively associated with V. cholerae and V. vulnificus in a species-specific manner, while the most abundant eukaryotic genus, the diatom Chaetoceros, was positively associated with V. parahaemolyticus. Associations were often linked to shared environmental preferences, and several copepod genera were linked to low-salinity environmental conditions and abundant V. cholerae and V. vulnificus. Conclusions: This study clarifies ecological relationships between pathogenic Vibrio spp. and the planktonic community, elucidating new functionally relevant associations, establishing a workflow for examining environmental pathogen microbiomes, and highlighting prospective model systems for future mechanistic studies.

Methods of MicroRNA Promoter Prediction and Transcription Factor Mediated Regulatory Network

BioMed Research International ◽

10.1155/2017/7049406 ◽

2017 ◽

Vol 2017 ◽

pp. 1-8 ◽

Cited By ~ 19

Author(s):

Yuming Zhao ◽

Fang Wang ◽

Su Chen ◽

Jun Wan ◽

Guohua Wang

Keyword(s):

Transcription Factor ◽

Regulatory Network ◽

Regulatory Networks ◽

High Throughput Sequencing ◽

Prediction Models ◽

Promoter Prediction ◽

Sequencing Data ◽

Protein Coding ◽

Mirna Genes ◽

Intergenic Regions

MicroRNAs (miRNAs) are short (~22 nucleotides) noncoding RNAs distributed throughout the genome, either in intergenic regions or in the intronic sequences of protein-coding genes. MiRNAs have been shown to play important roles in regulating gene expression. Hence, understanding the transcriptional mechanism of miRNA genes is a critical step toward uncovering the whole regulatory network. A number of miRNA promoter prediction models have been proposed in the past decade. This review summarizes several of the most popular miRNA promoter prediction models, which use genome sequence features or other features obtained from high-throughput sequencing data, for example, histone markers, RNA Pol II binding sites, and nucleosome-free regions. Some databases are described as resources for miRNA promoter information. We then discuss in detail the prediction and identification of transcription factor mediated microRNA regulatory networks.

Transcriptome-wide Identification and Expression Analysis of Brachypodium distachyon Transposons in Response to Viral Infection

Turkish Journal of Agriculture - Food Science and Technology ◽

10.24925/turjaf.v5i10.1156-1160.1260 ◽

2017 ◽

Vol 5 (10) ◽

pp. 1156

Author(s):

Tuğba Gürkök

Keyword(s):

Abiotic Stress ◽

Viral Infection ◽

Stress Resistance ◽

Transcriptional Activity ◽

Brachypodium Distachyon ◽

Breeding Strategies ◽

Rna Seq ◽

Transcription Activity ◽

Transcriptomic Data ◽

Intergenic Regions

Transposable elements (TEs) are the most abundant group of genomic elements in plants and can be found in genic or intergenic regions of their host genomes. Several stimuli, such as biotic or abiotic stress, can activate their transcription or transposition. Here, the effect of infection with Panicum mosaic virus (PMV) and its satellite virus (SPMV) on transposon transcription in the model plant Brachypodium distachyon was investigated. To evaluate the transcriptional activity of TEs, transcriptomic data from mock-inoculated and virus-inoculated plants were compared. Our results indicate that retroelements are the major component of TEs in all RNA-seq libraries. The number of transcribed TEs detected in mock-inoculated plants is higher than in virus-inoculated plants. In comparison with mock-inoculated plants, 13% of the TEs showed at least a two-fold change upon PMV infection and 21% upon PMV+SPMV infection. Compared with inoculation with PMV alone, inoculation with PMV+SPMV together also increased the expression of various TE-encoding transcripts. The MuDR-N78C_OS-encoding transcript was strongly up-regulated in response to both PMV and PMV+SPMV infection. The synergism generated by PMV and SPMV together enhanced the expression of TE transcripts more than PMV alone. Viral infection was observed to induce the transcriptional activity of several transposons. The results suggest that increased expression of TEs might have a role in the response to biotic stress in B. distachyon. Identifying TEs that take part in stress responses can provide useful information for functional genomics and for designing novel breeding strategies to develop stress-resistant crops.

Growing Glia: Cultivating Human Stem Cell Models of Gliogenesis in Health and Disease

Frontiers in Cell and Developmental Biology ◽

10.3389/fcell.2021.649538 ◽

2021 ◽

Vol 9 ◽

Author(s):

Samantha N. Lanjewar ◽

Steven A. Sloan

Keyword(s):

Nervous System ◽

Stem Cell ◽

System Development ◽

Stem Cell Differentiation ◽

Nervous System Development ◽

Model Systems ◽

Human Stem Cell ◽

Human Stem Cells ◽

Health And Disease ◽

Species Specific

Glia are present in all organisms with a central nervous system but considerably differ in their diversity, functions, and numbers. Coordinated efforts across many model systems have contributed to our understanding of glial-glial and neuron-glial interactions during nervous system development and disease, but human glia exhibit prominent species-specific attributes. Limited access to primary samples at critical developmental timepoints constrains our ability to assess glial contributions in human tissues. This challenge has been addressed throughout the past decade via advancements in human stem cell differentiation protocols that now offer the ability to model human astrocytes, oligodendrocytes, and microglia. Here, we review the use of novel 2D cell culture protocols, 3D organoid models, and bioengineered systems derived from human stem cells to study human glial development and the role of glia in neurodevelopmental disorders.

Defining the functional significance of intergenic transcribed regions

10.1101/127282 ◽

2017 ◽

Cited By ~ 1

Author(s):

John P. Lloyd ◽

Zing Tsung-Yeh Tsai ◽

Rosalie P. Sowers ◽

Nicholas L. Panchy ◽

Shin-Han Shiu

Keyword(s):

Flowering Plant ◽

Sequence Structure ◽

Transcriptional Noise ◽

Novel Genes ◽

Protein Coding ◽

Genome Wide ◽

Intergenic Regions ◽

Flowering Plant Species ◽

Genomic Regions ◽

Rna Genes

Abstract: With advances in transcript profiling, the presence of transcriptional activities in intergenic regions has been well established. However, whether intergenic expression reflects transcriptional noise or activity of novel genes remains unclear. We identified intergenic transcribed regions (ITRs) in 15 diverse flowering plant species and found that the amount of intergenic expression correlates with genome size, a pattern that could be expected if intergenic expression is largely nonfunctional. To further assess the functionality of ITRs, we first built machine learning classifiers using Arabidopsis thaliana as a model that accurately distinguish functional sequences (phenotype genes) and likely nonfunctional ones (pseudogenes and unexpressed intergenic regions) by integrating 93 biochemical, evolutionary, and sequence-structure features. Next, by applying the models genome-wide, we found that 4,427 ITRs (38%) and 796 annotated ncRNAs (44%) had features significantly similar to benchmark protein-coding or RNA genes and thus were likely parts of functional genes. Approximately 60% of ITRs and ncRNAs were more similar to nonfunctional sequences and were likely transcriptional noise. The predictive framework established here provides not only a comprehensive look at how functional, genic sequences are distinct from likely nonfunctional ones, but also a new way to differentiate novel genes from genomic regions with noisy transcriptional activities.
