Nucleotide sequence and DNaseI sensitivity are predictive of 3D chromatin architecture

2017 ◽  
Author(s):  
Jacob Schreiber ◽  
Maxwell Libbrecht ◽  
Jeffrey Bilmes ◽  
William Stafford Noble

Abstract Recently, Hi-C has been used to probe the 3D chromatin architecture of multiple organisms and cell types. The resulting collections of pairwise contacts across the genome have connected chromatin architecture to many cellular phenomena, including replication timing and gene regulation. However, high-resolution (10 kb or finer) contact maps remain scarce due to the expense and time required for collection. A computational method for predicting pairwise contacts without the need to run a Hi-C experiment would be invaluable in understanding the role that 3D chromatin architecture plays in genome biology. We describe Rambutan, a deep convolutional neural network that predicts Hi-C contacts at 1 kb resolution using nucleotide sequence and DNaseI assay signal as inputs. Specifically, Rambutan identifies locus pairs that engage in high-confidence contacts according to Fit-Hi-C, a previously described method for assigning statistical confidence estimates to Hi-C contacts. We first demonstrate Rambutan’s performance across chromosomes at 1 kb resolution in the GM12878 cell line. Subsequently, we measure Rambutan’s performance across six cell types. In this setting, the model achieves an area under the receiver operating characteristic curve between 0.7662 and 0.8246 and an area under the precision-recall curve between 0.3737 and 0.9008. We further demonstrate that the predicted contacts exhibit expected trends relative to histone modification ChIP-seq data, replication timing measurements, and annotations of functional elements such as promoters and enhancers. Finally, we predict Hi-C contacts for 53 human cell types and show that the predictions cluster by cellular function. [NOTE: After our original submission we discovered an error in our calling of statistically significant contacts.
Briefly, when calculating the prior probability of a contact, we used the number of contacts at a certain genomic distance in a chromosome but divided by the total number of bins in the full genome. When we corrected this mistake, we noticed that the Rambutan model, as it currently stands, did not outperform simply using the GM12878 contact map that Rambutan was trained on as the predictor in other cell types. While we investigate these new results, we ask that readers treat this manuscript skeptically.]
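
The denominator mismatch described in the note can be illustrated with a small sketch. The abstract does not give the formula the authors used, so everything here is an assumption made for illustration: the helper `contact_prior`, the choice of "possible locus pairs at this separation" as the denominator, and all of the numbers. It only shows that dividing chromosome-level contact counts by a genome-wide bin count shrinks the estimated prior.

```python
def contact_prior(contacts_at_distance: int, n_bins: int, distance: int) -> float:
    """Return observed contacts per possible locus pair at a given bin separation.

    Hypothetical helper: the manuscript's actual prior is not specified here.
    """
    n_pairs = n_bins - distance  # locus pairs exactly `distance` bins apart
    if n_pairs <= 0:
        raise ValueError("distance must be smaller than the number of bins")
    return contacts_at_distance / n_pairs


# Hypothetical numbers, chosen only to show the effect of the denominator.
CHROMOSOME_BINS = 1_000   # bins in one chromosome
GENOME_BINS = 30_000      # bins in the whole genome
CONTACTS = 500            # contacts observed at this distance in the chromosome
DISTANCE = 10             # separation, in bins

# Counts and denominator both refer to the same chromosome.
per_chromosome = contact_prior(CONTACTS, CHROMOSOME_BINS, DISTANCE)
# Chromosome-level counts divided by a genome-wide bin count.
genome_wide = contact_prior(CONTACTS, GENOME_BINS, DISTANCE)

print(f"chromosome denominator: {per_chromosome:.4f}")
print(f"genome denominator:     {genome_wide:.4f}")
```

With these made-up inputs, the genome-wide denominator gives a much smaller prior, so under this toy model the same observed contact would look more surprising than it should.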

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Hanyu Zhang ◽  
Ruoyi Cai ◽  
James Dai ◽  
Wei Sun

Abstract We introduce a new computational method named EMeth to estimate cell type proportions using DNA methylation data. EMeth is a reference-based method that requires cell type-specific DNA methylation data from relevant cell types. EMeth improves on the existing reference-based methods by detecting the CpGs whose DNA methylation is inconsistent with the deconvolution model and reducing their contributions to cell type decomposition. Another novel feature of EMeth is that it allows a cell type with known proportions but unknown reference and estimates its methylation. This is motivated by the study of methylation in tumor cells, where bulk tumor samples include tumor cells as well as other cell types such as infiltrating immune cells, and the tumor cell proportion can be estimated from copy number data. We demonstrate that EMeth delivers more accurate estimates of cell type proportions than several other methods using simulated data and in silico mixtures. Applications in cancer studies show that the proportions of T regulatory cells estimated by DNA methylation have expected associations with mutation load and survival time, while the estimates from gene expression miss such associations.
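
The general idea of reference-based deconvolution can be sketched for the simplest case of two cell types. This is not EMeth: it is a plain least-squares fit with no down-weighting of inconsistent CpGs, and the function name `estimate_proportion` and all methylation values are hypothetical. It assumes the bulk methylation at each CpG is a mixture `p * ref_a + (1 - p) * ref_b` of the two reference profiles.

```python
def estimate_proportion(bulk, ref_a, ref_b):
    """Least-squares estimate of p in bulk ≈ p*ref_a + (1-p)*ref_b, clipped to [0, 1]."""
    diffs = [a - b for a, b in zip(ref_a, ref_b)]
    denom = sum(d * d for d in diffs)
    if denom == 0:
        raise ValueError("reference profiles are identical; p is not identifiable")
    # Minimizing sum((y - b - p*d)^2) over p gives p = sum((y - b)*d) / sum(d^2).
    numer = sum((y - b) * d for y, b, d in zip(bulk, ref_b, diffs))
    return min(1.0, max(0.0, numer / denom))


# Hypothetical reference methylation levels (fraction methylated) at four CpGs.
ref_a = [0.9, 0.1, 0.8, 0.2]
ref_b = [0.2, 0.7, 0.3, 0.6]

# Simulate a noiseless bulk sample containing 30% of cell type A.
true_p = 0.3
bulk = [true_p * a + (1 - true_p) * b for a, b in zip(ref_a, ref_b)]

estimated_p = estimate_proportion(bulk, ref_a, ref_b)
print(f"estimated proportion of cell type A: {estimated_p:.3f}")
```

Real data are noisy and involve more than two cell types, which is where constrained multi-component fits and per-CpG weighting of the kind the abstract describes become relevant.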


2021 ◽  
Author(s):  
Karan K. Budhraja ◽  
Bradon R. McDonald ◽  
Michelle D. Stephens ◽  
Tania Contente-Cuomo ◽  
Havell Markus ◽  
...  

Abstract Fragmentation patterns observed in plasma DNA reflect chromatin accessibility in contributing cells. Since DNA shed from cancer cells and blood cells may differ in fragmentation patterns, we investigated whether analysis of genomic positioning and nucleotide sequence at fragment ends can reveal the presence of tumor DNA in blood and aid cancer diagnostics. We analyzed whole genome sequencing data from >2700 plasma DNA samples including healthy individuals and patients with 11 different cancer types. We observed higher fractions of fragments with aberrantly positioned ends in patients with cancer, driven by contribution of tumor DNA into plasma. Genome-wide analysis of fragment ends using machine learning showed an overall area under the receiver operating characteristic curve of 0.96 for detection of cancer. Our findings remained robust with as few as 1 million fragments analyzed per sample, suggesting that analysis of fragment ends can become a cost-effective and accessible approach for cancer detection and monitoring. One-sentence summary: Analyzing the positioning and nucleotide sequence at fragment ends in plasma DNA may enable cancer diagnostics.


2018 ◽  
Author(s):  
Douglas Abrams ◽  
Parveen Kumar ◽  
R. Krishna Murthy Karuturi ◽  
Joshy George

Abstract Background: The advent of single cell RNA sequencing (scRNA-seq) enabled researchers to study transcriptomic activity within individual cells and identify inherent cell types in the sample. Although numerous computational tools have been developed to analyze single cell transcriptomes, no published studies or analytical packages are available to guide experimental design or to devise a suitable analysis procedure for cell type identification. Results: We have developed an empirical methodology to address this important gap in single cell experimental design and analysis and implemented it in an easy-to-use tool called SCEED (Single Cell Empirical Experimental Design and analysis). With SCEED, a user can choose a variety of combinations of tools for analysis, conduct performance analysis of analytical procedures and choose the best procedure, and estimate the sample size (number of cells to be profiled) required for a given analytical procedure at varying levels of cell type rarity and other experimental parameters. Using SCEED, we examined 3 single cell algorithms using 48 simulated single cell datasets that were generated for varying numbers of cell types and their proportions, number of genes expressed per cell, number of marker genes and their fold change, and number of single cells successfully profiled in the experiment. Conclusions: Based on our study, we found that when marker genes are expressed at a fold change of 4 or more relative to the rest of the genes, either the Seurat or the Simlr algorithm can be used to analyze a single cell dataset for any number of single cells isolated (a minimum of 1000 single cells was tested). However, when marker genes are expected to be upregulated by a fold change of at most 2, the choice of single cell algorithm depends on the number of single cells isolated and the proportion of the rare cell type to be identified. In conclusion, our work allows the assessment of various single cell methods and also aids in examining the single cell experimental design.


Author(s):  
Massimo Andreatta ◽  
Santiago J Carmona

Abstract Summary: STACAS is a computational method for the identification of integration anchors in the Seurat environment, optimized for the integration of single-cell (sc) RNA-seq datasets that share only a subset of cell types. We demonstrate that by (i) correcting batch effects while preserving relevant biological variability across datasets, (ii) filtering aberrant integration anchors with a quantitative distance measure and (iii) constructing optimal guide trees for integration, STACAS can accurately align scRNA-seq datasets composed of only partially overlapping cell populations. Availability and implementation: Source code and R package available at https://github.com/carmonalab/STACAS; Docker image available at https://hub.docker.com/repository/docker/mandrea1/stacas_demo.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Mehmet Alican Noyan ◽  
Murat Durdu ◽  
Ali Haydar Eskiocak

Abstract The Tzanck smear test is a low-cost, rapid and reliable tool which can be used for the diagnosis of many erosive-vesiculobullous, tumoral and granulomatous diseases. Currently its use is limited mainly due to lack of experience in interpretation of the smears. We developed a deep learning model, TzanckNet, that can identify cells in Tzanck smear test findings. TzanckNet was trained on a retrospective development dataset of 2260 Tzanck smear images collected between December 2006 and December 2019. The finalized model was evaluated using a prospective validation dataset of 359 Tzanck smear images collected from 15 patients during January 2020. It is designed to recognize six cell types (acantholytic cells, eosinophils, hypha, multinucleated giant cells, normal keratinocytes and tadpole cells). For 359 images and 6 cell types, TzanckNet made 2154 predictions. The accuracy was 94.3% (95% CI 93.4–95.3), the sensitivity was 83.7% (95% CI 80.3–87.0) and the specificity was 97.3% (95% CI 96.5–98.1). The area under the receiver operating characteristic curve was 0.974. Our results show that TzanckNet has the potential to lower the experience barrier needed to use this test, broadening its user base, and hence improving patient well-being.
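
The arithmetic behind the reported metrics can be sketched as follows. The prediction count (359 images × 6 cell types) comes from the abstract; the confusion-matrix counts below are hypothetical, since the abstract does not report them, and `classification_metrics` is an illustrative helper rather than the authors' code.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,    # correct predictions over all predictions
        "sensitivity": tp / (tp + fn),    # true positive rate
        "specificity": tn / (tn + fp),    # true negative rate
    }


# One present/absent prediction per image and cell type, as stated in the abstract.
n_predictions = 359 * 6
print(n_predictions)

# Hypothetical counts, used only to show how the three metrics are derived.
metrics = classification_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because most image and cell type pairs are negatives in a setup like this, accuracy and specificity can be high while sensitivity is noticeably lower, which matches the pattern of the reported figures.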


2020 ◽  
Vol 2 (2) ◽  
Author(s):  
Djihad Hadjadj ◽  
Thomas Denecker ◽  
Eva Guérin ◽  
Su-Jung Kim ◽  
Fabien Fauchereau ◽  
...  

Abstract DNA replication must be faithful and follow a well-defined spatiotemporal program closely linked to transcriptional activity, epigenomic marks, intranuclear structures, mutation rate and cell fate determination. Among the readouts of the spatiotemporal program of DNA replication, replication timing analyses require not only complex and time-consuming experimental procedures, but also skills in bioinformatics. We developed a dedicated Shiny interactive web application, the START-R (Simple Tool for the Analysis of the Replication Timing based on R) suite, which analyzes DNA replication timing in a given organism with high-throughput data. It reduces the time required for simultaneously generating and analyzing data from several samples. It automatically detects different types of timing regions and identifies significant differences between two experimental conditions in ∼15 min. In conclusion, the START-R suite allows quick, efficient and easier analyses of DNA replication timing for all organisms. This novel approach can be used by every biologist. It is now simpler to use this method in order to understand, for example, whether ‘a favorite gene or protein’ has an impact on the replication process or, indirectly, on genomic organization (as in Hi-C experiments), by comparing the replication timing profiles between wild-type and mutant cell lines.


1988 ◽  
Vol 168 (4) ◽  
pp. 1493-1498 ◽  
Author(s):  
R L Barker ◽  
G J Gleich ◽  
L R Pease

Eosinophil granule major basic protein (MBP), a potent toxin for helminths and various cell types, is a 13.8-kD single polypeptide rich in arginine with a calculated isoelectric point (pI) of 10.9. A cDNA for human MBP was isolated from a gamma GT10 HL-60 cDNA library. The nucleotide sequence of the MBP cDNA indicates that MBP is translated as a 25.2-kD preproprotein. The 9.9-kD pro-portion of proMBP is rich in glutamic and aspartic acids and has a calculated pI of 3.9, while proMBP itself has a calculated pI of 6.2. We suggest that MBP is translated as a nontoxic precursor that protects the eosinophil from damage while the protein is processed through the endoplasmic reticulum to its sequestered site in the granule core as toxic MBP, and we present results from the literature suggesting that other cationic toxins, which damage cell membranes, may also be processed from nontoxic precursors containing distinct anionic and cationic regions.


Blood ◽  
2006 ◽  
Vol 108 (11) ◽  
pp. 781-781
Author(s):  
Alex Ebralidze ◽  
Pu Zhang ◽  
Frank Rosenbauer ◽  
Gang Huang ◽  
Ulrich Steidl ◽  
...  

Abstract The transcription factor PU.1 is an important regulator of hematopoiesis and correct expression levels in specific lineages are critical for normal hematopoietic development. Specifically, PU.1 is maintained or upregulated in specific lineages, and failure to downregulate PU.1 in other lineages can lead to a block in development of that lineage and/or leukemia. In vivo expression of PU.1 is dependent on an upstream regulatory element called the URE. Disruption of the URE leads to downregulation of PU.1 and development of leukemia and lymphoma, but the other distal elements regulating PU.1 have not been defined. Here we show that other phylogenetically conserved elements participate in the initiation of antisense transcription, and that these antisense RNAs function as important modulators of proper dosages of PU.1. Specifically, antisense transcripts originate from specific conserved sites in introns 1 and 3, and the intron 3 site contains binding sites for transcription factors such as AML1 and Ets factors. The conserved intron 3 element also possesses antisense promoter activity. These antisense transcripts are present at about 15% of the level of PU.1 sense transcripts in PU.1 expressing cells. They negatively regulate PU.1 sense RNA, as introduction of siRNA molecules which specifically target antisense transcripts leads to 3–5 fold increases in PU.1 sense RNA and protein. Both sense and antisense PU.1 gene RNAs are dependent on the URE and are transcribed from the same chromatin architecture, in which the conserved elements, including URE and sense and antisense promoters, are located in the same nuclear fraction and can be shown to exist in the same nucleoprotein complex by chromosome conformation capture (3C). We are currently testing the mechanisms involved in the formation of such complexes, and specifically whether the complexes are mediated by binding of AML1 to the URE.
Since we do not observe significant differences in antisense transcript levels between PU.1 high-expressing and PU.1 low-expressing cells, we hypothesize that the function of these antisense transcripts is to modulate rather than absolutely control PU.1 levels: in PU.1 high-expressing cells, such as myeloid cells, the antisense transcripts trim PU.1 levels to prevent overexpression, while in cell types in which PU.1 is not expressed, such as T cells, the antisense transcripts prevent any expression of PU.1. We propose that such a mechanism will likely be important in fine-tuning the regulation of many genes and may be the reason for the large number of overlapping complementary transcripts with so far unknown function.


1992 ◽  
Vol 119 (2) ◽  
pp. 367-377 ◽  
Author(s):  
J A Theriot ◽  
T J Mitchison

We have investigated the dynamic behavior of actin in fibroblast lamellipodia using photoactivation of fluorescence. Activated regions of caged resorufin (CR)-labeled actin in lamellipodia of IMR 90 and MC7 3T3 fibroblasts were observed to move centripetally over time. Thus in these cells, actin filaments move centripetally relative to the substrate. Rates were characteristic for each cell type: 0.66 ± 0.27 μm/min in IMR 90 and 0.36 ± 0.16 μm/min in MC7 3T3 cells. In neither case was there any correlation between the rate of actin movement and the rate of lamellipodial protrusion. The half-life of the activated CR-actin filaments was approximately 1 min in IMR 90 lamellipodia, and approximately 3 min in MC7 3T3 lamellipodia. Thus continuous filament turnover accompanies centripetal movement. In both cell types, the length of time required for a section of the actin meshwork to traverse the lamellipodium was several times longer than the filament half-life. The dynamic behavior of the dorsal surface of the cell was also observed by tracking lectin-coated beads on the surface and phase-dense features within lamellipodia of MC7 3T3 cells. The movement of these dorsal features occurred at rates approximately three times faster than the rate of movement of the underlying bulk actin cytoskeleton, even when measured in the same individual cells. Thus the transport of these dorsal features must occur by some mechanism other than simple attachment to the moving bulk actin cytoskeleton.


1987 ◽  
Vol 166 (2) ◽  
pp. 341-361 ◽  
Author(s):  
C Transy ◽  
S R Nash ◽  
B David-Watine ◽  
M Cochet ◽  
S W Hunt ◽  
...  

We have previously described the isolation of pH-2d-37, a cDNA clone that encodes a so far unknown, poorly polymorphic, class I surface molecule. We report here the isolation of the corresponding gene, its nucleotide sequence, and its localization in the Tla region of the murine MHC. Using an RNase mapping assay, we have confirmed that the second domain coding region of the 37 gene displays very limited polymorphism, and that the gene is transcribed in a broad variety of cell types, in contrast to the genes encoding the known Qa and TL antigens. Possible functions are discussed.

