Nucleotide sequence and DNaseI sensitivity are predictive of 3D chromatin architecture

Mapping Intimacies ◽

10.1101/103614 ◽

2017 ◽

Cited By ~ 23

Author(s):

Jacob Schreiber ◽

Maxwell Libbrecht ◽

Jeffrey Bilmes ◽

William Stafford Noble

Keyword(s):

Nucleotide Sequence ◽

Characteristic Curve ◽

Replication Timing ◽

Cell Types ◽

Computational Method ◽

Contact Maps ◽

Chromatin Architecture ◽

Statistical Confidence ◽

Gm12878 Cell ◽

Time Required

Abstract
Recently, Hi-C has been used to probe the 3D chromatin architecture of multiple organisms and cell types. The resulting collections of pairwise contacts across the genome have connected chromatin architecture to many cellular phenomena, including replication timing and gene regulation. However, high-resolution (10 kb or finer) contact maps remain scarce because of the expense and time required to collect them. A computational method for predicting pairwise contacts without the need to run a Hi-C experiment would be invaluable in understanding the role that 3D chromatin architecture plays in genome biology. We describe Rambutan, a deep convolutional neural network that predicts Hi-C contacts at 1 kb resolution using nucleotide sequence and DNaseI assay signal as inputs. Specifically, Rambutan identifies locus pairs that engage in high-confidence contacts according to Fit-Hi-C, a previously described method for assigning statistical confidence estimates to Hi-C contacts. We first demonstrate Rambutan’s performance across chromosomes at 1 kb resolution in the GM12878 cell line. We then measure Rambutan’s performance across six cell types. In this setting, the model achieves an area under the receiver operating characteristic curve between 0.7662 and 0.8246 and an area under the precision-recall curve between 0.3737 and 0.9008. We further demonstrate that the predicted contacts exhibit expected trends relative to histone modification ChIP-seq data, replication timing measurements, and annotations of functional elements such as promoters and enhancers. Finally, we predict Hi-C contacts for 53 human cell types and show that the predictions cluster by cellular function. [NOTE: After our original submission, we discovered an error in our calling of statistically significant contacts.
Briefly, when calculating the prior probability of a contact, we used the number of contacts at a certain genomic distance in a chromosome but divided by the total number of bins in the full genome. When we corrected this mistake, we found that the Rambutan model, as it currently stands, did not outperform a simpler baseline: using the GM12878 contact map on which Rambutan was trained as the predictor in other cell types. While we investigate these new results, we ask that readers treat this manuscript skeptically.]
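
The note above describes a scope mismatch: a per-chromosome numerator divided by a genome-wide denominator. The following is a minimal sketch of why that matters. It assumes a deliberately simplified prior (contacts at one genomic distance divided by a bin count) and a binomial upper-tail test. This is not Fit-Hi-C's actual model. All bin and read counts are hypothetical, chosen only for illustration.

```python
from math import comb


def contact_prior(contacts_at_distance: int, n_bins: int) -> float:
    """Simplified (hypothetical) prior: contacts seen at one genomic distance per bin."""
    return contacts_at_distance / n_bins


def binom_sf(k: int, n: int, p: float) -> float:
    """Upper-tail probability P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical numbers, for illustration only.
chrom_bins = 50_000        # bins in one chromosome
genome_bins = 3_000_000    # bins in the whole genome
contacts_at_d = 1_000      # contacts at distance d within that chromosome

correct = contact_prior(contacts_at_d, chrom_bins)   # numerator and denominator share a scope
buggy = contact_prior(contacts_at_d, genome_bins)    # mixed scope: shrunk by genome/chrom ratio

# A smaller prior makes the same observed count look more surprising,
# so in this toy model more locus pairs would pass a significance cutoff.
reads, observed = 200, 3
p_correct = binom_sf(observed, reads, correct)
p_buggy = binom_sf(observed, reads, buggy)
print(f"prior ratio: {correct / buggy:.0f}")
print(f"p-value, correct prior: {p_correct:.3g}; buggy prior: {p_buggy:.3g}")
```

In this sketch the distortion factor equals genome_bins / chrom_bins, so it differs from chromosome to chromosome. Shorter chromosomes are affected more.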

EMeth: An EM algorithm for cell type decomposition based on DNA methylation data

Scientific Reports ◽

10.1038/s41598-021-84864-9 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Hanyu Zhang ◽

Ruoyi Cai ◽

James Dai ◽

Wei Sun

Keyword(s):

Dna Methylation ◽

Tumor Cells ◽

T Regulatory Cells ◽

Simulated Data ◽

Cell Types ◽

Computational Method ◽

Methylation Data ◽

Cell Type ◽

A Cell ◽

Type Decomposition

Abstract
We introduce a new computational method named EMeth to estimate cell type proportions using DNA methylation data. EMeth is a reference-based method that requires cell type-specific DNA methylation data from relevant cell types. EMeth improves on existing reference-based methods by detecting the CpGs whose DNA methylation levels are inconsistent with the deconvolution model and reducing their contributions to cell type decomposition. Another novel feature of EMeth is that it can accommodate a cell type whose proportion is known but whose reference methylation profile is unknown, and it estimates the methylation of that cell type. This feature is motivated by studies of methylation in tumor cells. Bulk tumor samples include tumor cells as well as other cell types, such as infiltrating immune cells, and the tumor cell proportion can be estimated from copy number data. Using simulated data and in silico mixtures, we demonstrate that EMeth delivers more accurate estimates of cell type proportions than several other methods. Applications in cancer studies show that the proportions of T regulatory cells estimated from DNA methylation have the expected associations with mutation load and survival time, whereas the estimates from gene expression miss these associations.
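
The abstract describes reference-based deconvolution in which CpGs that fit the mixture model poorly contribute less. The sketch below illustrates that idea with a swapped-in technique, not EMeth's EM algorithm. It uses iteratively reweighted least squares with Huber-style weights, and solves each weighted fit by projected gradient descent onto the probability simplex. It does not model the unknown-reference cell type. Names such as `robust_deconvolve` and the simulated data are hypothetical.

```python
import numpy as np


def project_to_simplex(v: np.ndarray) -> np.ndarray:
    """Euclidean projection of v onto the probability simplex {p : p >= 0, sum(p) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    j = np.arange(1, v.size + 1)
    rho = np.nonzero(u * j > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)


def robust_deconvolve(bulk, ref, outer=20, inner=300, c=2.0):
    """Hypothetical sketch: estimate proportions p (bulk ~ ref @ p), downweighting outlier CpGs.

    bulk: shape (n_cpg,), methylation of the mixed sample.
    ref:  shape (n_cpg, n_types), reference methylation of each cell type.
    Returns (p, w): proportions on the simplex and per-CpG weights in (0, 1].
    """
    n_cpg, k = ref.shape
    w = np.ones(n_cpg)
    p = np.full(k, 1.0 / k)
    for _ in range(outer):
        # Projected gradient descent on sum_i w_i * (bulk_i - ref_i @ p)^2.
        gram = ref.T @ (w[:, None] * ref)
        step = 1.0 / (2.0 * np.linalg.eigvalsh(gram).max())  # 1 / Lipschitz constant
        for _ in range(inner):
            grad = -2.0 * ref.T @ (w * (bulk - ref @ p))
            p = project_to_simplex(p - step * grad)
        # Huber-style reweighting: CpGs with large standardized residuals get weight c / r < 1.
        resid = bulk - ref @ p
        scale = 1.4826 * np.median(np.abs(resid)) + 1e-12  # robust scale via MAD
        r = np.abs(resid) / scale
        w = np.where(r <= c, 1.0, c / np.maximum(r, 1e-12))
    return p, w


# Simulated example: 200 CpGs, 3 cell types, 10 CpGs inconsistent with the model.
rng = np.random.default_rng(0)
ref = rng.uniform(0.0, 1.0, size=(200, 3))
true_p = np.array([0.5, 0.3, 0.2])
bulk = ref @ true_p + rng.normal(0.0, 0.01, size=200)
bulk[:10] += 0.5
p_hat, weights = robust_deconvolve(bulk, ref)
print(np.round(p_hat, 3), weights[:10].mean(), weights[10:].mean())
```

The weights play the role of "reducing the contributions" of inconsistent CpGs. EMeth's own likelihood and update rules may differ substantially.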

Analysis of fragment ends in plasma DNA from patients with cancer

10.1101/2021.04.23.21255935 ◽

2021 ◽

Author(s):

Karan K. Budhraja ◽

Bradon R. McDonald ◽

Michelle D. Stephens ◽

Tania Contente-Cuomo ◽

Havell Markus ◽

...

Keyword(s):

Nucleotide Sequence ◽

Characteristic Curve ◽

Cost Effective ◽

Cancer Diagnostics ◽

Whole Genome Sequencing Data ◽

Sequencing Data ◽

Plasma Dna ◽

Patients With Cancer ◽

Fragmentation Patterns ◽

Tumor Dna

Abstract
Fragmentation patterns observed in plasma DNA reflect chromatin accessibility in the contributing cells. Since DNA shed from cancer cells and from blood cells may differ in fragmentation patterns, we investigated whether analysis of the genomic positioning and nucleotide sequence at fragment ends can reveal the presence of tumor DNA in blood and aid cancer diagnostics. We analyzed whole genome sequencing data from >2700 plasma DNA samples, including healthy individuals and patients with 11 different cancer types. We observed higher fractions of fragments with aberrantly positioned ends in patients with cancer, driven by the contribution of tumor DNA to plasma. Genome-wide analysis of fragment ends using machine learning achieved an overall area under the receiver operating characteristic curve of 0.96 for detection of cancer. Our findings remained robust with as few as 1 million fragments analyzed per sample, suggesting that analysis of fragment ends can become a cost-effective and accessible approach for cancer detection and monitoring.
One-sentence summary: Analyzing the positioning and nucleotide sequence at fragment ends in plasma DNA may enable cancer diagnostics.
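
The abstract summarizes performance as an area under the ROC curve (AUROC). The sketch below computes one per-sample score and its AUROC. It assumes a hypothetical scoring rule: a fragment end counts as "aberrant" if its coordinate is absent from a reference set of end positions. The study itself used genome-wide machine learning, and real ends would be (chromosome, position) pairs. AUROC is computed as the probability that a randomly chosen cancer sample outscores a randomly chosen healthy one, with ties counting one half.

```python
def aberrant_end_fraction(fragment_ends, reference_ends):
    """Hypothetical score: fraction of fragment end coordinates absent from a reference set."""
    if not fragment_ends:
        raise ValueError("no fragments")
    ref = set(reference_ends)
    return sum(1 for e in fragment_ends if e not in ref) / len(fragment_ends)


def auroc(pos_scores, neg_scores):
    """AUROC = P(random positive scores higher than random negative); ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy coordinates, for illustration only.
reference = {100, 250, 400, 550}
healthy = [[100, 250, 400, 550], [100, 250, 401, 550], [100, 250, 400, 999]]
cancer = [[100, 251, 402, 550], [103, 250, 407, 555], [100, 250, 400, 777]]

neg = [aberrant_end_fraction(s, reference) for s in healthy]   # [0.0, 0.25, 0.25]
pos = [aberrant_end_fraction(s, reference) for s in cancer]    # [0.5, 0.75, 0.25]
auc = auroc(pos, neg)
print(f"AUROC = {auc:.3f}")
```

The pairwise loop is O(n·m). For thousands of samples, a rank-based (Mann-Whitney) formulation gives the same value faster.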

A computational method to aid the design and analysis of single cell RNA-seq experiments for cell type identification

10.1101/247114 ◽

2018 ◽

Cited By ~ 1

Author(s):

Douglas Abrams ◽

Parveen Kumar ◽

R. Krishna Murthy Karuturi ◽

Joshy George

Keyword(s):

Experimental Design ◽

Single Cell ◽

Single Cells ◽

Cell Types ◽

Cell Number ◽

Fold Change ◽

Computational Method ◽

Marker Genes ◽

Cell Type ◽

Estimate Sample Size

Abstract
Background: The advent of single cell RNA sequencing (scRNA-seq) has enabled researchers to study transcriptomic activity within individual cells and to identify the cell types inherent in a sample. Although numerous computational tools have been developed to analyze single cell transcriptomes, there are no published studies or analytical packages available to guide experimental design and to devise suitable analysis procedures for cell type identification.
Results: We have developed an empirical methodology that addresses this important gap in single cell experimental design and analysis, packaged as an easy-to-use tool called SCEED (Single Cell Empirical Experimental Design and analysis). With SCEED, users can choose from a variety of combinations of analysis tools. They can also evaluate the performance of analytical procedures and choose the best one. Finally, they can estimate the sample size (number of cells to be profiled) required for a given analytical procedure at varying levels of cell type rarity and other experimental parameters. Using SCEED, we examined 3 single cell algorithms on 48 simulated single cell datasets. The datasets were generated with varying numbers of cell types and their proportions, numbers of genes expressed per cell, numbers of marker genes and their fold changes, and numbers of single cells successfully profiled in the experiment.
Conclusions: We found that when marker genes are expressed at a fold change of 4 or more relative to the rest of the genes, either Seurat or SIMLR can be used to analyze a single cell dataset for any number of isolated single cells (the smallest number tested was 1000). However, when marker genes are expected to be upregulated by a fold change of only up to 2, the choice of single cell algorithm depends on the number of single cells isolated and the proportion of the rare cell type to be identified. In conclusion, our work allows the assessment of various single cell methods and aids in evaluating single cell experimental design.
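
SCEED estimates sample size empirically, by simulation. A simpler back-of-the-envelope question is how many cells must be profiled to capture at least m cells of a rare type with probability at least some target. The sketch below answers that under an assumed binomial sampling model. This is not SCEED's method, and the function names are hypothetical.

```python
from math import comb


def prob_at_least(m: int, n_cells: int, proportion: float) -> float:
    """P(at least m cells of a type with the given proportion among n_cells), binomial model."""
    below = sum(
        comb(n_cells, i) * proportion**i * (1 - proportion) ** (n_cells - i)
        for i in range(m)
    )
    return 1.0 - below


def min_cells(m: int, proportion: float, target: float = 0.95, limit: int = 1_000_000) -> int:
    """Smallest number of profiled cells reaching the target capture probability."""
    # P(X >= m) increases with n_cells, so the first n that reaches the target is the minimum.
    for n in range(max(m, 1), limit + 1):
        if prob_at_least(m, n, proportion) >= target:
            return n
    raise ValueError("target not reached within limit")


# Example: a cell type at 1% abundance, requiring at least 10 cells with 95% probability.
n_needed = min_cells(10, 0.01)
print(n_needed, round(prob_at_least(10, n_needed, 0.01), 4))
```

This only covers sampling. The study's point is that the required number also depends on the analysis algorithm and on marker fold change, which a binomial model ignores.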

STACAS: Sub-Type Anchor Correction for Alignment in Seurat to integrate single-cell RNA-seq data

Bioinformatics ◽

10.1093/bioinformatics/btaa755 ◽

2020 ◽

Cited By ~ 1

Author(s):

Massimo Andreatta ◽

Santiago J Carmona

Keyword(s):

Single Cell ◽

Distance Measure ◽

Source Code ◽

Cell Types ◽

R Package ◽

Computational Method ◽

Biological Variability ◽

Rna Seq ◽

Batch Effects ◽

Guide Trees

Abstract
Summary: STACAS is a computational method for identifying integration anchors in the Seurat environment, optimized for integrating single-cell (sc) RNA-seq datasets that share only a subset of cell types. We demonstrate that STACAS can accurately align scRNA-seq datasets composed of only partially overlapping cell populations. It does so by (i) correcting batch effects while preserving relevant biological variability across datasets, (ii) filtering aberrant integration anchors with a quantitative distance measure, and (iii) constructing optimal guide trees for integration.
Availability and implementation: Source code and R package available at https://github.com/carmonalab/STACAS; Docker image available at https://hub.docker.com/repository/docker/mandrea1/stacas_demo.
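
Step (ii) filters aberrant anchors using a quantitative distance measure. The sketch below shows the general idea in Python, not the STACAS R implementation. Each anchor carries a precomputed distance between its paired cells, and anchors above a chosen distance quantile are discarded. The `Anchor` type, the quantile rule and the 0.8 default are hypothetical choices.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    cell_a: str       # cell from dataset A
    cell_b: str       # paired cell from dataset B
    distance: float   # hypothetical distance between the pair in a shared feature space


def filter_anchors(anchors, quantile=0.8):
    """Keep anchors whose distance is at or below the given quantile of all anchor distances."""
    if not anchors:
        return []
    d = sorted(a.distance for a in anchors)
    idx = min(len(d) - 1, max(math.ceil(quantile * len(d)) - 1, 0))
    threshold = d[idx]
    return [a for a in anchors if a.distance <= threshold]


# Toy anchors with distances 0.1 ... 1.0; the two most distant pairs are dropped.
anchors = [Anchor(f"a{i}", f"b{i}", i / 10) for i in range(1, 11)]
kept = filter_anchors(anchors, quantile=0.8)
print(len(kept), max(a.distance for a in kept))
```

A fixed quantile always discards a share of anchors. An absolute threshold would instead let well-matched datasets keep all of them, which may suit partially overlapping datasets differently.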

TzanckNet: a convolutional neural network to identify cells in the cytology of erosive-vesiculobullous diseases

Scientific Reports ◽

10.1038/s41598-020-75546-z ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Mehmet Alican Noyan ◽

Murat Durdu ◽

Ali Haydar Eskiocak

Keyword(s):

Low Cost ◽

Characteristic Curve ◽

Well Being ◽

Cell Types ◽

Giant Cells ◽

Validation Dataset ◽

Smear Test ◽

Normal Keratinocytes ◽

Granulomatous Diseases ◽

Deep Learning Model

Abstract
The Tzanck smear test is a low-cost, rapid and reliable tool that can be used to diagnose many erosive-vesiculobullous, tumoral and granulomatous diseases. Currently, its use is limited mainly by a lack of experience in interpreting the smears. We developed a deep learning model, TzanckNet, that can identify cells in Tzanck smear test findings. TzanckNet was trained on a retrospective development dataset of 2260 Tzanck smear images collected between December 2006 and December 2019. The finalized model was evaluated on a prospective validation dataset of 359 Tzanck smear images collected from 15 patients during January 2020. It is designed to recognize six cell types (acantholytic cells, eosinophils, hyphae, multinucleated giant cells, normal keratinocytes and tadpole cells). For 359 images and 6 cell types, TzanckNet made 2154 predictions. The accuracy was 94.3% (95% CI 93.4–95.3), the sensitivity was 83.7% (95% CI 80.3–87.0) and the specificity was 97.3% (95% CI 96.5–98.1). The area under the receiver operating characteristic curve was 0.974. Our results show that TzanckNet has the potential to lower the experience barrier to using this test, broadening its user base and thereby improving patient well-being.
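
The count of 2154 predictions is exactly 359 images × 6 cell types, i.e. one present/absent call per image and cell type. The sketch below shows how accuracy, sensitivity and specificity with 95% confidence intervals follow from pooled confusion counts. The counts are hypothetical, not the paper's. The normal-approximation (Wald) interval is an assumption, since the abstract does not say which interval was used.

```python
from math import sqrt


def proportion_ci(successes: int, total: int, z: float = 1.96):
    """Point estimate with a normal-approximation (Wald) 95% CI, clipped to [0, 1]."""
    p = successes / total
    half = z * sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half), min(1.0, p + half)


def binary_metrics(tp: int, fp: int, tn: int, fn: int):
    """Pooled one-vs-rest metrics from a 2x2 confusion table."""
    return {
        "accuracy": proportion_ci(tp + tn, tp + fp + tn + fn),
        "sensitivity": proportion_ci(tp, tp + fn),
        "specificity": proportion_ci(tn, tn + fp),
    }


images, cell_types = 359, 6
n_predictions = images * cell_types  # one present/absent call per image and cell type

# Hypothetical confusion counts that sum to n_predictions (not the study's data).
m = binary_metrics(tp=400, fp=154, tn=1500, fn=100)
for name, (p, lo, hi) in m.items():
    print(f"{name}: {p:.1%} (95% CI {lo:.1%}-{hi:.1%})")
```

Sensitivity and specificity here are pooled across all six cell types. Per-class values can differ considerably when some classes are rare.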

Efficient, quick and easy-to-use DNA replication timing analysis with START-R suite

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqaa045 ◽

2020 ◽

Vol 2 (2) ◽

Cited By ~ 1

Author(s):

Djihad Hadjadj ◽

Thomas Denecker ◽

Eva Guérin ◽

Su-Jung Kim ◽

Fabien Fauchereau ◽

...

Keyword(s):

Dna Replication ◽

Cell Fate ◽

Web Application ◽

Timing Analysis ◽

Mutant Cell ◽

Replication Timing ◽

Experimental Conditions ◽

Replication Process ◽

Time Required ◽

Dna Replication Timing

Abstract
DNA replication must be faithful and follow a well-defined spatiotemporal program closely linked to transcriptional activity, epigenomic marks, intranuclear structures, mutation rate and cell fate determination. Among the readouts of this spatiotemporal program, replication timing analyses require not only complex and time-consuming experimental procedures but also bioinformatics skills. We developed a dedicated interactive Shiny web application, the START-R (Simple Tool for the Analysis of the Replication Timing based on R) suite, which analyzes DNA replication timing in a given organism from high-throughput data. It reduces the time required to generate and analyze data from several samples simultaneously. It automatically detects different types of timing regions and identifies significant differences between two experimental conditions in ∼15 min. In conclusion, the START-R suite allows quick, efficient and easier analyses of DNA replication timing for all organisms. This novel approach can be used by any biologist. It is now simpler to use this method to understand, for example, whether ‘a favorite gene or protein’ has an impact on the replication process or, indirectly, on genomic organization (as probed by Hi-C experiments). This is done by comparing the replication timing profiles of wild-type and mutant cell lines.
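
Replication timing is commonly summarized per genomic bin as log2(early/late) signal, where positive values indicate earlier replication. The sketch below assumes that representation and then flags runs of consecutive bins whose timing differs between two conditions by more than a fixed threshold. This is a toy rule, not START-R's normalization, segmentation or statistical testing. The counts, threshold and `min_bins` are hypothetical.

```python
from math import log2


def timing_profile(early, late, pseudocount=1.0):
    """Per-bin log2((early + pc) / (late + pc)); positive values mean earlier replication."""
    if len(early) != len(late):
        raise ValueError("early and late must cover the same bins")
    return [log2((e + pseudocount) / (lt + pseudocount)) for e, lt in zip(early, late)]


def changed_regions(profile_a, profile_b, threshold=1.0, min_bins=2):
    """Half-open (start, end) runs of >= min_bins consecutive bins where |a - b| > threshold."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same bins")
    regions, start = [], None
    for i, (a, b) in enumerate(zip(profile_a, profile_b)):
        if abs(a - b) > threshold:
            if start is None:
                start = i          # open a new candidate region
        elif start is not None:
            if i - start >= min_bins:
                regions.append((start, i))
            start = None           # close the run
    if start is not None and len(profile_a) - start >= min_bins:
        regions.append((start, len(profile_a)))
    return regions


# Hypothetical read counts in six bins for wild-type and mutant cells.
wt = timing_profile([90, 80, 70, 20, 10, 15], [10, 20, 30, 80, 90, 85])
mut = timing_profile([90, 80, 10, 10, 10, 15], [10, 20, 90, 90, 90, 85])
print(changed_regions(wt, mut))
```

Requiring several consecutive bins (`min_bins`) is a crude guard against single-bin noise. A real tool would use replicates and a statistical test instead.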

Acidic precursor revealed in human eosinophil granule major basic protein cDNA.

Journal of Experimental Medicine ◽

10.1084/jem.168.4.1493 ◽

1988 ◽

Vol 168 (4) ◽

pp. 1493-1498 ◽

Cited By ~ 90

Author(s):

R L Barker ◽

G J Gleich ◽

L R Pease

Keyword(s):

Endoplasmic Reticulum ◽

Nucleotide Sequence ◽

Cdna Library ◽

Basic Protein ◽

Cell Types ◽

Major Basic Protein ◽

Human Eosinophil ◽

Damage Cell ◽

Eosinophil Granule ◽

Single Polypeptide

Eosinophil granule major basic protein (MBP), a potent toxin for helminths and various cell types, is a 13.8-kD single polypeptide rich in arginine, with a calculated isoelectric point (pI) of 10.9. A cDNA for human MBP was isolated from a λgt10 HL-60 cDNA library. The nucleotide sequence of the MBP cDNA indicates that MBP is translated as a 25.2-kD preproprotein. The 9.9-kD pro-portion of proMBP is rich in glutamic and aspartic acids and has a calculated pI of 3.9, while proMBP itself has a calculated pI of 6.2. We suggest that MBP is translated as a nontoxic precursor that protects the eosinophil from damage while the protein is processed through the endoplasmic reticulum to its sequestered site in the granule core as toxic MBP. We also present results from the literature suggesting that other cationic toxins that damage cell membranes may likewise be processed from nontoxic precursors containing distinct anionic and cationic regions.
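
The abstract relies on calculated isoelectric points, and its masses imply about 25.2 − 13.8 − 9.9 = 1.5 kD for the pre-segment. The sketch below shows how a pI is typically calculated. Each ionizable group contributes a Henderson-Hasselbalch charge, and bisection finds the pH at which the net charge is zero. The pKa table is one approximate set chosen for illustration. Published calculators use different tables and give somewhat different pI values. The peptides are hypothetical, not MBP sequences.

```python
# Approximate pKa values (an assumption; tables vary between tools).
PKA = {
    "N_term": 8.6, "C_term": 3.6,
    "K": 10.8, "R": 12.5, "H": 6.5,            # positive when protonated
    "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1,   # negative when deprotonated
}
POSITIVE = ("K", "R", "H")
NEGATIVE = ("D", "E", "C", "Y")


def net_charge(seq: str, ph: float) -> float:
    """Net charge of a peptide at a given pH using Henderson-Hasselbalch fractions."""
    pos = 1 / (1 + 10 ** (ph - PKA["N_term"]))
    neg = 1 / (1 + 10 ** (PKA["C_term"] - ph))
    for aa in seq:
        if aa in POSITIVE:
            pos += 1 / (1 + 10 ** (ph - PKA[aa]))
        elif aa in NEGATIVE:
            neg += 1 / (1 + 10 ** (PKA[aa] - ph))
    return pos - neg


def isoelectric_point(seq: str, lo: float = 0.0, hi: float = 14.0, tol: float = 1e-4) -> float:
    """Bisection for the pH of zero net charge (net charge falls monotonically with pH)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical peptides: arginine-rich (basic) versus Asp/Glu-rich (acidic).
for peptide in ("RRRRRR", "DDDDEEEE"):
    print(peptide, round(isoelectric_point(peptide), 2))
```

The qualitative pattern matches the abstract's logic: an arginine-rich segment has a high pI, while an acidic pro-segment pulls the pI of the combined precursor down.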

Regulation of the PU.1 Gene by Sense and Functional Antisense RNAs Generated through the Same Chromatin Architecture.

Blood ◽

10.1182/blood.v108.11.781.781 ◽

2006 ◽

Vol 108 (11) ◽

pp. 781-781

Author(s):

Alex Ebralidze ◽

Pu Zhang ◽

Frank Rosenbauer ◽

Gang Huang ◽

Ulrich Steidl ◽

...

Keyword(s):

Regulatory Element ◽

Cell Types ◽

Antisense Transcription ◽

Fine Tuning ◽

Nuclear Fraction ◽

Antisense Transcripts ◽

Chromosome Conformation ◽

Antisense Rnas ◽

Chromatin Architecture

Abstract
The transcription factor PU.1 is an important regulator of hematopoiesis, and correct expression levels in specific lineages are critical for normal hematopoietic development. Specifically, PU.1 is maintained or upregulated in specific lineages, and failure to downregulate PU.1 in other lineages can lead to a block in the development of that lineage and/or leukemia. In vivo expression of PU.1 depends on an upstream regulatory element called the URE. Disruption of the URE leads to downregulation of PU.1 and development of leukemia and lymphoma, but the other distal elements regulating PU.1 have not been defined. Here we show that other phylogenetically conserved elements participate in the initiation of antisense transcription, and that these antisense RNAs function as important modulators of proper PU.1 dosage. Specifically, antisense transcripts originate from specific conserved sites in introns 1 and 3, and the intron 3 site contains binding sites for transcription factors such as AML1 and Ets factors. The conserved intron 3 element also possesses antisense promoter activity. These antisense transcripts are present at about 15% of the level of PU.1 sense transcripts in PU.1-expressing cells. They negatively regulate PU.1 sense RNA: introduction of siRNA molecules that specifically target antisense transcripts leads to 3- to 5-fold increases in PU.1 sense RNA and protein. Both sense and antisense PU.1 RNAs depend on the URE and are transcribed from the same chromatin architecture. In this architecture, the conserved elements, including the URE and the sense and antisense promoters, are located in the same nuclear fraction and can be shown by chromosome conformation capture (3C) to exist in the same nucleoprotein complex. We are currently testing the mechanisms involved in forming such complexes, and specifically whether they are mediated by binding of AML1 to the URE.
Since we do not observe significant differences in antisense transcript levels between PU.1 high-expressing and PU.1 low-expressing cells, we hypothesize that these antisense transcripts modulate rather than absolutely control PU.1 levels. In PU.1 high-expressing cells, such as myeloid cells, the antisense transcripts trim PU.1 levels to prevent overexpression. In cell types in which PU.1 is not expressed, such as T cells, the antisense transcripts prevent any expression of PU.1. We propose that such a mechanism is likely to be important in fine-tuning the regulation of many genes and may explain the large number of overlapping complementary transcripts whose function is so far unknown.

Comparison of actin and cell surface dynamics in motile fibroblasts.

The Journal of Cell Biology ◽

10.1083/jcb.119.2.367 ◽

1992 ◽

Vol 119 (2) ◽

pp. 367-377 ◽

Cited By ~ 98

Author(s):

J A Theriot ◽

T J Mitchison

Keyword(s):

Actin Cytoskeleton ◽

Dynamic Behavior ◽

Actin Filaments ◽

Half Life ◽

Dorsal Surface ◽

Cell Types ◽

3T3 Cells ◽

3T3 Fibroblasts ◽

Continuous Filament ◽

Time Required

We have investigated the dynamic behavior of actin in fibroblast lamellipodia using photoactivation of fluorescence. Activated regions of caged resorufin (CR)-labeled actin in lamellipodia of IMR 90 and MC7 3T3 fibroblasts were observed to move centripetally over time. Thus, in these cells, actin filaments move centripetally relative to the substrate. Rates were characteristic of each cell type: 0.66 ± 0.27 µm/min in IMR 90 and 0.36 ± 0.16 µm/min in MC7 3T3 cells. In neither case was there any correlation between the rate of actin movement and the rate of lamellipodial protrusion. The half-life of the activated CR-actin filaments was approximately 1 min in IMR 90 lamellipodia and approximately 3 min in MC7 3T3 lamellipodia. Thus, continuous filament turnover accompanies centripetal movement. In both cell types, the time required for a section of the actin meshwork to traverse the lamellipodium was several times longer than the filament half-life. The dynamic behavior of the dorsal cell surface was also observed by tracking lectin-coated beads on the surface and phase-dense features within lamellipodia of MC7 3T3 cells. These dorsal features moved approximately three times faster than the underlying bulk actin cytoskeleton, even when measured in the same individual cells. Thus, the transport of these dorsal features must occur by some mechanism other than simple attachment to the moving bulk actin cytoskeleton.
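
The statement that traversal takes several half-lives can be made concrete with a short worked calculation. It assumes first-order (exponential) filament turnover and a hypothetical lamellipodium width of 3 µm, since the abstract does not give widths. The rates and half-lives are the ones reported above.

```python
def traverse_time_min(width_um: float, rate_um_per_min: float) -> float:
    """Time for a marked section of meshwork to cross the lamellipodium."""
    return width_um / rate_um_per_min


def surviving_fraction(elapsed_min: float, half_life_min: float) -> float:
    """Fraction of filaments not yet turned over, assuming exponential (first-order) decay."""
    return 0.5 ** (elapsed_min / half_life_min)


# Reported centripetal rates (um/min) and filament half-lives (min).
cells = {"IMR 90": (0.66, 1.0), "MC7 3T3": (0.36, 3.0)}
width = 3.0  # hypothetical lamellipodium width in um (not given in the abstract)

for name, (rate, t_half) in cells.items():
    t = traverse_time_min(width, rate)
    print(
        f"{name}: {t:.1f} min to traverse = {t / t_half:.1f} half-lives; "
        f"{surviving_fraction(t, t_half):.1%} of filaments survive"
    )
```

Under these assumptions, most filaments depolymerize before reaching the rear of the lamellipodium. This is consistent with the conclusion that continuous turnover accompanies centripetal movement.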

A low polymorphic mouse H-2 class I gene from the Tla complex is expressed in a broad variety of cell types.

Journal of Experimental Medicine ◽

10.1084/jem.166.2.341 ◽

1987 ◽

Vol 166 (2) ◽

pp. 341-361 ◽

Cited By ~ 45

Author(s):

C Transy ◽

S R Nash ◽

B David-Watine ◽

M Cochet ◽

S W Hunt ◽

...

Keyword(s):

Nucleotide Sequence ◽

Cdna Clone ◽

Cell Types ◽

Class I ◽

Surface Molecule ◽

Coding Region ◽

Genes Encoding ◽

Broad Variety ◽

I Gene

We have previously described the isolation of pH-2d-37, a cDNA clone that encodes a previously unknown, poorly polymorphic class I surface molecule. We report here the isolation of the corresponding gene, its nucleotide sequence, and its localization in the Tla region of the murine MHC. Using an RNase mapping assay, we have confirmed two points. First, the second-domain coding region of the 37 gene displays very limited polymorphism. Second, in contrast to the genes encoding the known Qa and TL antigens, the gene is transcribed in a broad variety of cell types. Possible functions are discussed.