The Sheep and the Goats: Distinguishing transcriptional enhancers in a complex chromatin landscape

Mapping Intimacies ◽

10.1101/324582 ◽

2018 ◽

Cited By ~ 1

Author(s):

Anne Sonnenschein ◽

Ian Dworkin ◽

David N. Arnosti

Keyword(s):

Machine Learning ◽

Transcription Factor ◽

Predictive Accuracy ◽

Expression Patterns ◽

Supervised Machine Learning ◽

Regulatory Function ◽

Specific Expression ◽

Enhancer Activity ◽

Genomic Features ◽

Enhancer Identification

Abstract: Predicting regulatory function of non-coding DNA using genomic information remains a major goal in genomics, and an important step in interpreting the cis-regulatory code. Regulatory capacity can be partially inferred from transcription factor occupancy, histone modifications, motif enrichment, and evolutionary conservation. However, combinations of these features in well-studied systems such as Drosophila have limited predictive accuracy. Here we examine the current limits of computational enhancer prediction by applying machine-learning methods to an extensive set of genomic features, validating predictions with the Fly Enhancer Resource, which characterized the transcriptional activity of approximately fifteen percent of the genome. Supervised machine-learning methods trained on a range of genomic features identify active elements with a high degree of accuracy, but are less successful at distinguishing tissue-specific expression patterns. Consistent with previous observations of their widespread genomic interactions, many transcription factors were associated with enhancers not known to be direct functional targets. Interestingly, no single factor was necessary for enhancer identification, although binding by the 'pioneer' transcription factor Zelda was the most predictive feature for enhancer activity. Using an increasing number of predictive features improved classification with diminishing returns. Thus, additional single-timepoint ChIP data may have only marginal utility for discerning true regulatory regions. On the other hand, spatially and temporally differentiated genomic features may provide more power for this type of computational enhancer identification. Inclusion of new types of information distinct from current chromatin-immunoprecipitation data may enable more precise identification of enhancers, and further insight into the features that distinguish their biological functions.

Infinity Flow: High-throughput single-cell quantification of 100s of proteins using conventional flow cytometry and machine learning

10.1101/2020.06.17.152926 ◽

2020 ◽

Author(s):

Etienne Becht ◽

Daniel Tolstrup ◽

Charles-Antoine Dutertre ◽

Florent Ginhoux ◽

Evan W. Newell ◽

...

Keyword(s):

Machine Learning ◽

Flow Cytometry ◽

Single Cell ◽

Low Cost ◽

Expression Patterns ◽

Cell Types ◽

Cellular Heterogeneity ◽

Supervised Machine Learning ◽

Melanoma Metastasis ◽

Immunologic Research

Abstract: Modern immunologic research increasingly requires high-dimensional analyses in order to understand the complex milieu of cell types that comprise the tissue microenvironments of disease. To achieve this, we developed Infinity Flow, which combines hundreds of overlapping flow cytometry panels using machine learning to enable the simultaneous analysis of the co-expression patterns of hundreds of surface-expressed proteins across millions of individual cells. In this study, we demonstrate that this approach allows comprehensive analysis of the cellular constituency of the steady-state murine lung and identifies novel cellular heterogeneity in the lungs of melanoma metastasis-bearing mice. We show that by using supervised machine learning, Infinity Flow enhances the accuracy and depth of clustering or dimensionality reduction algorithms. Infinity Flow is a highly scalable, low-cost and accessible solution to single-cell proteomics in complex tissues.

Structural analysis and tissue-specific expression patterns of a novel salt-inducible NAC transcription factor gene from Nicotiana tabacum cv. Xanthi

The Journal of Horticultural Science and Biotechnology ◽

10.1080/14620316.2014.11513140 ◽

2014 ◽

Vol 89 (6) ◽

pp. 700-706 ◽

Cited By ~ 6

Author(s):

Q. Q. Han ◽

P. Qiao ◽

Y. Z. Song ◽

J. Y. Zhang

Keyword(s):

Transcription Factor ◽

Structural Analysis ◽

Expression Patterns ◽

Nac Transcription Factor ◽

Transcription Factor Gene ◽

Specific Expression ◽

Tissue Specific ◽

Factor Gene ◽

Tissue Specific Expression

Refining interaction search through signed iterative Random Forests

10.1101/467498 ◽

2018 ◽

Cited By ~ 4

Author(s):

Karl Kumbier ◽

Sumanta Basu ◽

James B. Brown ◽

Susan Celniker ◽

Bin Yu

Keyword(s):

Gene Expression ◽

Random Forests ◽

Predictive Accuracy ◽

Expression Patterns ◽

High Order ◽

Gene Expression Patterns ◽

Enhancer Activity ◽

Gap Gene ◽

Strength Of Interaction ◽

Interacting Features

Abstract: Advances in supervised learning have enabled accurate prediction in biological systems governed by complex interactions among biomolecules. However, state-of-the-art predictive algorithms are typically “black boxes,” learning statistical interactions that are difficult to translate into testable hypotheses. The iterative Random Forest (iRF) algorithm took a step towards bridging this gap by providing a computationally tractable procedure to identify the stable, high-order feature interactions that drive the predictive accuracy of Random Forests (RF). Here we refine the interactions identified by iRF to explicitly map responses as a function of interacting features. Our method, signed iRF (s-iRF), describes “subsets” of rules that frequently occur on RF decision paths. We refer to these “rule subsets” as signed interactions. Signed interactions not only share the same set of interacting features but also exhibit similar thresholding behavior, and thus describe a consistent functional relationship between interacting features and responses. We describe stable and predictive importance metrics (SPIMs) to rank signed interactions in terms of their stability, predictive accuracy, and strength of interaction. For each SPIM, we define null importance metrics that characterize its expected behavior under known structure. We evaluate our proposed approach in biologically inspired simulations and two case studies: predicting enhancer activity and spatial gene expression patterns. In the case of enhancer activity, s-iRF recovers one of the few experimentally validated high-order interactions and suggests novel enhancer elements where this interaction may be active. In the case of spatial gene expression patterns, s-iRF recovers all 11 reported links in the gap gene network. By refining the process of interaction recovery, our approach has the potential to guide mechanistic inquiry into systems whose scale and complexity are beyond human comprehension.

Human library of cardiac promoters and enhancers

10.1101/2020.06.14.150904 ◽

2020 ◽

Author(s):

Ruslan M. Deviatiiarov ◽

Anna Gams ◽

Roman Syunyaev ◽

Tatiana V. Tatarinova ◽

Oleg Gusev ◽

...

Keyword(s):

Heart Diseases ◽

Association Studies ◽

Expression Patterns ◽

Critical Role ◽

Regulatory Elements ◽

Specific Cell ◽

Genome Wide Association Studies ◽

Specific Expression ◽

Enhancer Activity ◽

Human Donor

Abstract: Genome regulatory elements play a critical role during cardiac development and maintenance of normal physiological homeostasis, and genome-wide association studies have identified a large number of SNPs associated with cardiovascular diseases localized in intergenic zones. We used cap analysis of gene expression (CAGE) to identify transcription start sites (TSS) with one-nucleotide resolution, which effectively maps genome regulatory elements in a representative collection of human heart tissues. Here we present a comprehensive and fully annotated CAGE atlas of human promoters and enhancers from four chambers of non-diseased human donor hearts, including both atria and ventricles. We have identified 10,528 novel regulatory elements, of which 2,750 are classified as TSS and 4,258 as novel enhancers, which were validated with ChIP-seq libraries and motif enrichment analysis. We found that heart-region-specific expression patterns are primarily based on alternative promoter and specific enhancer activity. Our study significantly increased evidence of the association of variants located in regulatory elements with heart morphology and pathologies. The precise location of cardiac disease-related SNPs within the regulatory regions and their correlation with a specific cell type offers a new understanding of genetic heart diseases.

Transcription factor binding site clusters identify target genes with similar tissue-wide expression and buffer against mutations

F1000Research ◽

10.12688/f1000research.17363.1 ◽

2018 ◽

Vol 7 ◽

pp. 1933 ◽

Cited By ~ 1

Author(s):

Ruipeng Lu ◽

Peter K. Rogan

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Transcription Factor ◽

Binding Site ◽

In Silico ◽

Target Gene ◽

Target Genes ◽

Gene Prediction ◽

Expression Patterns ◽

Tree Classifier

Background: The distribution and composition of cis-regulatory modules composed of transcription factor (TF) binding site (TFBS) clusters in promoters substantially determine gene expression patterns and TF targets. TF knockdown experiments have revealed that TF binding profiles and gene expression levels are correlated. We use TFBS features within accessible promoter intervals to predict genes with similar tissue-wide expression patterns and TF targets.
Methods: Genes with correlated expression patterns across 53 tissues and TF targets were respectively identified from Bray-Curtis Similarity and TF knockdown experiments. Corresponding promoter sequences were reduced to DNase I-accessible intervals; TFBSs were then identified within these intervals using information theory-based position weight matrices for each TF (iPWMs) and clustered. Features from information-dense TFBS clusters predicted these genes with machine learning classifiers, which were evaluated for accuracy, specificity and sensitivity. Mutations in TFBSs were analyzed in silico to examine their impact on cluster densities and the regulatory states of target genes.
Results: We initially chose the glucocorticoid receptor gene (NR3C1), whose regulation has been extensively studied, to test this approach. SLC25A32 and TANK were found to exhibit the most similar expression patterns to NR3C1. A Decision Tree classifier exhibited the largest area under the Receiver Operating Characteristic (ROC) curve in detecting such genes. Target gene prediction was confirmed using siRNA knockdown of TFs, which was found to be more accurate than those predicted after CRISPR/CAS9 inactivation. In silico mutation analyses of TFBSs also revealed that one or more information-dense TFBS clusters in promoters are required for accurate target gene prediction.
Conclusions: Machine learning based on TFBS information density, organization, and chromatin accessibility accurately identifies gene targets with comparable tissue-wide expression patterns. Multiple information-dense TFBS clusters in promoters appear to protect promoters from effects of deleterious binding site mutations in a single TFBS that would otherwise alter regulation of these genes.
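
The Methods above identify co-expressed genes by Bray-Curtis Similarity across tissues. As a rough illustration of that measure only (not the authors' pipeline), the sketch below computes one minus the Bray-Curtis dissimilarity for two non-negative expression vectors. The function name, the handling of all-zero profiles, and the expression values are assumptions made for this example.

```python
def bray_curtis_similarity(u, v):
    """Return 1 - Bray-Curtis dissimilarity for two non-negative vectors."""
    if len(u) != len(v):
        raise ValueError("expression vectors must have the same length")
    total = sum(u) + sum(v)
    if total == 0:
        # Convention chosen for this sketch: two all-zero profiles count as identical.
        return 1.0
    diff = sum(abs(a - b) for a, b in zip(u, v))
    return 1.0 - diff / total


# Hypothetical expression values for two genes across four tissues.
gene_a = [10.0, 0.0, 5.0, 5.0]
gene_b = [8.0, 2.0, 5.0, 5.0]
print(bray_curtis_similarity(gene_a, gene_b))
```

A value of 1 means the two profiles are identical and a value of 0 means they share no expression in any tissue; genes could then be ranked by this score against a reference gene such as NR3C1.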

First report of rapid, non-invasive, and reagent-free detection of malaria through the skin of patients with a beam of infrared light

10.21203/rs.3.rs-1179114/v1 ◽

2022 ◽

Author(s):

Gabriela Garcia ◽

Tharanga Kariyawasam ◽

Anton Lord ◽

Cristiano Costa ◽

Lana Chaves ◽

...

Keyword(s):

Machine Learning ◽

Near Infrared ◽

Predictive Accuracy ◽

Human Subjects ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Infrared Light ◽

Non Invasive ◽

Full Capacity

Abstract: We describe the first application of the near-infrared spectroscopy (NIRS) technique to detect Plasmodium falciparum and P. vivax malaria parasites through the skin of malaria positive and negative human subjects. NIRS is a rapid, non-invasive and reagent-free technique which involves rapid interaction of a beam of light with a biological sample to produce diagnostic signatures in seconds. We used a handheld, miniaturized spectrometer to shine NIRS light on the ear, arm and finger of P. falciparum (n=7) and P. vivax (n=20) positive people and malaria negative individuals (n=33) in a malaria endemic setting in Brazil. Supervised machine learning algorithms for predicting the presence of malaria were applied to predict malaria infection status in independent individuals (n=12). Separate machine learning algorithms for differentiating P. falciparum from P. vivax infected subjects were developed using spectra from the arm and ear of P. falciparum and P. vivax (n=108) and the resultant model predicted infection in spectra of their fingers (n=54). NIRS non-invasively detected malaria positive and negative individuals that were excluded from the model with 100% sensitivity, 83% specificity and 92% accuracy (n=12) with spectra collected from the arm. Moreover, NIRS also correctly differentiated P. vivax from P. falciparum positive individuals with a predictive accuracy of 93% (n=54). These findings are promising but further work on a larger scale is needed to address several gaps in knowledge and establish the full capacity of NIRS as a non-invasive diagnostic tool for malaria. It is recommended that the tool is further evaluated in multiple epidemiological and demographic settings where other factors such as age, mixed infection and skin colour can be incorporated into predictive algorithms to produce more robust models for universal diagnosis of malaria.
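
The sensitivity, specificity and accuracy figures quoted for the 12 held-out individuals follow from a confusion matrix that the abstract does not report. The sketch below shows how the three quantities are computed, using an assumed split of six positive and six negative individuals with a single false positive. That split is a guess chosen because it reproduces the rounded percentages, and the function name is hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of true positives detected
    specificity = tn / (tn + fp)   # fraction of true negatives correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Assumed counts (not stated in the abstract): 6 positives, 6 negatives, 1 false positive.
sens, spec, acc = diagnostic_metrics(tp=6, fn=0, tn=5, fp=1)
print(f"{sens:.0%} {spec:.0%} {acc:.0%}")  # 100% 83% 92%
```

With only 12 test individuals, a single misclassification moves accuracy by about eight percentage points, which is one reason the authors call for evaluation on a larger scale.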

Human atlas of cardiac promoters and enhancers reveals important role of regulatory elements in heritable diseases

10.21203/rs.3.rs-37530/v1 ◽

2020 ◽

Author(s):

Ruslan Deviatiiarov ◽

Anna Gams ◽

Roman Syunyaev ◽

Tatiana Tatarinova ◽

Oleg Gusev ◽

...

Keyword(s):

Heart Diseases ◽

Association Studies ◽

Expression Patterns ◽

Critical Role ◽

Regulatory Elements ◽

Specific Cell ◽

Genome Wide Association Studies ◽

Specific Expression ◽

Enhancer Activity ◽

Human Donor

Abstract: Genome regulatory elements play a critical role during cardiac development and maintenance of normal physiological homeostasis, and genome-wide association studies have identified a large number of SNPs associated with cardiovascular diseases localized in intergenic zones. We used cap analysis of gene expression (CAGE) to identify transcription start sites (TSS) with one-nucleotide resolution, which effectively maps genome regulatory elements in a representative collection of human heart tissues. Here we present a comprehensive and fully annotated CAGE atlas of human promoters and enhancers from four chambers of non-diseased human donor hearts, including both atria and ventricles. We have identified 10,528 novel regulatory elements, of which 2,750 are classified as TSS and 4,258 as novel enhancers, which were validated with ChIP-seq libraries and motif enrichment analysis. We found that heart-region-specific expression patterns are primarily based on alternative promoter and specific enhancer activity. Our study significantly increased evidence of the association of variants located in regulatory elements with heart morphology and pathologies. The precise location of cardiac disease-related SNPs within the regulatory regions and their correlation with a specific cell type offers a new understanding of genetic heart diseases.

Prediction Algorithm for ICU Mortality and Length of Stay Using Machine Learning

10.21203/rs.3.rs-992995/v1 ◽

2021 ◽

Author(s):

Shinya Iwase ◽

Taka-aki Nakada ◽

Tadanaga Shimada ◽

Takehiko Oami ◽

Takashi Shimazui ◽

...

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Length Of Stay ◽

Predictive Value ◽

Predictive Accuracy ◽

Supervised Machine Learning ◽

Icu Patients ◽

Icu Stay ◽

Precise Prediction ◽

Length Of Icu Stay

Abstract: Background: Machine learning can predict outcomes and determine variables contributing to precise prediction, and can thus classify patients with different risk factors of outcomes. This study aimed to investigate the predictive accuracy for mortality and length of stay in intensive care unit (ICU) patients using machine learning, and to identify the variables contributing to the precise prediction or classification of patients. Methods: Patients (n=12,747) admitted to the ICU at Chiba University Hospital were randomly assigned to the training and test cohorts. After learning using the variables on admission in the training cohort, the area under the curve (AUC) was analyzed in the test cohort to evaluate the predictive accuracy of the supervised machine learning classifiers, including random forest (RF), for outcomes (primary outcome, mortality; secondary outcome, length of ICU stay). The rank of the variables that contributed to the machine learning prediction was confirmed, and cluster analysis of the patients with risk factors of mortality was performed to identify the important variables associated with patient outcomes. Results: Machine learning using RF revealed a high predictive value for mortality, with an AUC of 0.945. In addition, RF showed high predictive value for short and long ICU stays, with AUCs of 0.881 and 0.889, respectively. Lactate dehydrogenase (LDH) was identified as a variable contributing to the precise prediction in machine learning for both mortality and length of ICU stay. LDH was also identified as a contributing variable to classify patients into sub-populations based on different risk factors of mortality. Conclusion: The machine learning algorithm could predict mortality and length of stay in ICU patients with high accuracy. LDH was identified as a contributing variable in mortality and length of ICU stay prediction and could be used to classify patients based on mortality risk.
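
The AUC values reported above summarize how well a classifier's scores rank patients who experienced an outcome above those who did not. The sketch below is a minimal illustration of that definition in plain Python. It is not the study's code, and the labels and scores are made-up values. AUC is computed as the fraction of positive/negative pairs in which the positive case receives the higher score, with ties counted as half.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical test-cohort labels (1 = died) and predicted risk scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
print(roc_auc(labels, scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic in cohort size, so for a cohort of thousands of patients a sort-based implementation or a library routine would be the practical choice.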

Genome-Wide Identification and Characterization of Melon bHLH Transcription Factors in Regulation of Fruit Development

Plants ◽

10.3390/plants10122721 ◽

2021 ◽

Vol 10 (12) ◽

pp. 2721

Author(s):

Chao Tan ◽

Huilei Qiao ◽

Ming Ma ◽

Xue Wang ◽

Yunyun Tian ◽

...

Keyword(s):

Transcription Factor ◽

Fruit Ripening ◽

Expression Patterns ◽

Female Flower ◽

Distribution Patterns ◽

Early Developmental Stage ◽

Bhlh Transcription Factor ◽

Transcription Factor Family ◽

Specific Expression ◽

Bhlh Genes

The basic helix-loop-helix (bHLH) transcription factor family is one of the largest transcription factor families in plants and plays crucial roles in plant development. Melon is an important horticultural plant as well as an attractive model plant for studying fruit ripening. However, the bHLH gene family of melon has not yet been identified, and its functions in fruit growth and ripening have seldom been studied. In this study, 118 bHLH genes were identified in the melon genome. These CmbHLH genes were unevenly distributed on chromosomes 1 to 12, and five CmbHLHs were tandem repeats on chromosomes 4 and 8. There were 13 intron distribution patterns among the CmbHLH genes. Phylogenetic analysis illustrated that these CmbHLHs could be classified into 16 subfamilies. Expression patterns of the CmbHLH genes were studied using transcriptome data. Tissue-specific expression of the CmbHLH32 gene was analysed by quantitative RT-PCR. The results showed that the CmbHLH32 gene was highly expressed in female flowers and in fruit at an early developmental stage. Transgenic melon lines overexpressing CmbHLH32 were generated, and overexpression of CmbHLH32 resulted in early fruit ripening compared to wild type. The CmbHLH transcription factor family was identified and analysed for the first time in melon, and overexpression of CmbHLH32 affected the ripening time of melon fruit. These findings laid a foundation for further study on the role of bHLH family members in the growth and development of melon.

Machine learning on drug-specific data to predict small molecule teratogenicity

10.1101/860627 ◽

2019 ◽

Cited By ~ 2

Author(s):

Anup P. Challa ◽

Andrew L. Beam ◽

Min Shen ◽

Tyler Peryea ◽

Robert R. Lavieri ◽

...

Keyword(s):

Machine Learning ◽

Small Molecule ◽

Predictive Accuracy ◽

Adverse Outcomes ◽

Supervised Machine Learning ◽

Protective Effects ◽

Prescribing Behavior ◽

Increase Risk

Abstract: Pregnant women are an especially vulnerable population, given the sensitivity of a developing fetus to chemical exposures. However, prescribing behavior for the gravid patient is guided by limited human data and conflicting cases of adverse outcomes due to the exclusion of pregnant populations from randomized, controlled trials. These factors increase risk for adverse drug outcomes and reduce quality of care for pregnant populations. Herein, we propose the application of artificial intelligence to systematically predict the teratogenicity of a prescriptible small molecule from information inherent to the drug. Using unsupervised and supervised machine learning, our model probes all small molecules with known structure and teratogenicity data published in research-amenable formats to identify patterns among structural, meta-structural, and in vitro bioactivity data for each drug and its teratogenicity score. With this workflow, we discovered three chemical functionalities that predispose a drug towards increased teratogenicity and two moieties with potentially protective effects. Our models predict three clinically relevant classes of teratogenicity with AUC = 0.8 and nearly double the predictive accuracy of a blind control for the same task, suggesting successful modeling. We also present extensive barriers to translational research that restrict data-driven studies in pregnancy and therapeutically “orphan” pregnant populations. Collectively, this work represents a first-in-kind platform for the application of computing to study and predict teratogenicity.
