Evaluation and comparison of methods for recapitulation of 3D spatial chromatin structures

Briefings in Bioinformatics ◽ 10.1093/bib/bbx134 ◽ 2017 ◽ Vol 20 (4) ◽ pp. 1205-1214

Author(s): Jincheol Park ◽ Shili Lin

Keyword(s): 3D Structure ◽ Three Dimensional ◽ Simulated Data ◽ Real Data ◽ Data Sets ◽ Contact Map ◽ Chromosome Conformation ◽ Contact Maps ◽ Data Resolution ◽ Genomic Scale

How chromosomes fold and how distal genomic elements interact with one another at a genomic scale are questions that have been actively pursued in the past decade, following the seminal work describing the Chromosome Conformation Capture (3C) assay. Essentially, 3C-based technologies produce two-dimensional (2D) contact maps that capture interactions between genomic fragments. Accordingly, a plethora of analytical methods have been proposed that take a 2D contact map as input to recapitulate the underlying whole-genome three-dimensional (3D) structure of the chromatin. However, their performance with respect to several factors, including data resolution and the ability to handle contact map features, has not been sufficiently evaluated. This task is taken up in this article, in which we consider several recent and/or well-regarded methods, both optimization-based and model-based, for their aptness in producing 3D structures from contact maps generated from a population of cells. These methods are evaluated and compared using both simulated and real data, according to several criteria. For simulated data sets, the focus is on accurate recapitulation of the entire structure, given the existence of the gold standard. For real data sets, we examine agreement with distances measured by fluorescence in situ hybridization (FISH) and consistency with several genomic features of known biological function.

GenomeDISCO: A concordance score for chromosome conformation capture experiments using random walks on contact map graphs

10.1101/181842 ◽ 2017 ◽ Cited By ~ 4

Author(s): Oana Ursu ◽ Nathan Boley ◽ Maryna Taranova ◽ Y.X. Rachel Wang ◽ Galip Gurkan Yardimci ◽ ...

Keyword(s): Random Walks ◽ Critical Role ◽ Three Dimensional ◽ Cell Types ◽ Chromosome Conformation Capture ◽ Supplementary Information ◽ Contact Map ◽ Chromosome Conformation ◽ Contact Maps ◽ Map Graphs

Motivation: The three-dimensional organization of chromatin plays a critical role in gene regulation and disease. High-throughput chromosome conformation capture experiments such as Hi-C are used to obtain genome-wide maps of 3D chromatin contacts. However, robust estimation of data quality and systematic comparison of these contact maps is challenging due to the multi-scale, hierarchical structure of chromatin contacts and the resulting properties of experimental noise in the data. Measuring concordance of contact maps is important for assessing reproducibility of replicate experiments and for modeling variation between different cellular contexts.

Results: We introduce a concordance measure called GenomeDISCO (DIfferences between Smoothed COntact maps) for assessing the similarity of a pair of contact maps obtained from chromosome conformation capture experiments. The key idea is to smooth contact maps using random walks on the contact map graph before estimating concordance. We use simulated datasets to benchmark GenomeDISCO's sensitivity to different types of noise that affect chromatin contact maps. When applied to a large collection of Hi-C datasets, GenomeDISCO accurately distinguishes biological replicates from samples obtained from different cell types. GenomeDISCO also generalizes to other chromosome conformation capture assays, such as HiChIP.

Availability: Software implementing GenomeDISCO is available at https://github.com/kundajelab/[email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.
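
The idea of smoothing with random walks before comparing can be illustrated with a minimal sketch. This is not the GenomeDISCO implementation: the function names, the default of three walk steps, and the mean-absolute-difference summary are all assumptions chosen for illustration, and the real tool's normalization and scoring may differ.

```python
def row_normalize(matrix):
    """Turn a contact matrix into a row-stochastic transition matrix."""
    normalized = []
    for row in matrix:
        total = sum(row)
        # Rows with no contacts stay all-zero instead of dividing by zero.
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized


def matmul(a, b):
    """Plain-Python product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def random_walk_smooth(contacts, steps):
    """Smooth a contact map by taking `steps` (>= 1) random-walk steps."""
    transition = row_normalize(contacts)
    smoothed = transition
    for _ in range(steps - 1):
        smoothed = matmul(smoothed, transition)
    return smoothed


def smoothed_difference(contacts_a, contacts_b, steps=3):
    """Mean absolute difference of two smoothed maps (hypothetical score).

    A value of 0 means the smoothed maps are identical.
    """
    sa = random_walk_smooth(contacts_a, steps)
    sb = random_walk_smooth(contacts_b, steps)
    n = len(sa)
    total = sum(abs(sa[i][j] - sb[i][j]) for i in range(n) for j in range(n))
    return total / (n * n)


# Toy 3-bin contact maps (made-up counts, for illustration only).
map_a = [[0, 5, 1], [5, 0, 4], [1, 4, 0]]
map_b = [[0, 1, 5], [1, 0, 4], [5, 4, 0]]
print(smoothed_difference(map_a, map_a))
print(smoothed_difference(map_a, map_b))
```

Taking more walk steps spreads each bin's contacts over a larger neighbourhood, which dampens sparse sampling noise at the cost of blurring fine-scale differences.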

HiC-GNN: A Generalizable Model for 3D Chromosome Reconstruction Using Graph Convolutional Neural Networks

10.1101/2021.11.29.470405 ◽ 2021

Author(s): Van Hovenga ◽ Oluwatosin Oluwadare ◽ Jugal Kalita

Keyword(s): Structure Prediction ◽ 3D Structure ◽ Three Dimensional ◽ Restriction Enzymes ◽ Superior Performance ◽ Data Sets ◽ Reconstruction Accuracy ◽ Chromosome Conformation ◽ Contact Data ◽ Multiple Cell

Chromosome conformation capture (3C) is a method of measuring chromosome topology in terms of interactions between loci. The Hi-C method is a derivative of 3C that allows for genome-wide quantification of chromosome interactions. From such interaction data, it is possible to infer the three-dimensional (3D) structure of the underlying chromosome. In this paper, we use a node embedding algorithm and a graph neural network to predict the 3D coordinates of each genomic locus from the corresponding Hi-C contact data. Unlike other chromosome structure prediction methods, our method can generalize a single model across Hi-C resolutions, multiple restriction enzymes, and multiple cell populations while maintaining reconstruction accuracy. We derive these results using three separate Hi-C data sets from the GM12878, GM06990, and K562 cell lines. We also compare the reconstruction accuracy of our method to four other existing methods and show that our method yields superior performance. Our algorithm outperforms the state-of-the-art methods in prediction accuracy and introduces a novel method for 3D structure prediction from Hi-C data.

Simultaneous Robot–World and Hand–Eye Calibration without a Calibration Object

Sensors ◽ 10.3390/s18113949 ◽ 2018 ◽ Vol 18 (11) ◽ pp. 3949 ◽ Cited By ~ 3

Author(s): Wei Li ◽ Mingli Dong ◽ Naiguang Lu ◽ Xiaoping Lou ◽ Peng Sun

Keyword(s): Three Dimensional ◽ Real Data ◽ Medical Robotics ◽ Calibration Method ◽ Bundle Adjustment ◽ Data Sets ◽ Rigid Transformation ◽ Validation Experiment ◽ Dimensional Reconstruction ◽ Sparse Bundle Adjustment

An extended robot–world and hand–eye calibration method is proposed in this paper to evaluate the transformation relationship between the camera and robot device. This approach can be used for mobile or medical robotics applications, where precise, expensive, or unsterile calibration objects, or enough movement space, cannot be made available at the work site. Firstly, a mathematical model is established to formulate the robot-gripper-to-camera rigid transformation and robot-base-to-world rigid transformation using the Kronecker product. Subsequently, a sparse bundle adjustment is introduced for the optimization of robot–world and hand–eye calibration, as well as reconstruction results. Finally, a validation experiment including two kinds of real data sets is designed to demonstrate the effectiveness and accuracy of the proposed approach. The translation relative error of rigid transformation is less than 8/10,000 with a Denso robot in a movement range of 1.3 m × 1.3 m × 1.2 m. The distance measurement mean error after three-dimensional reconstruction is 0.13 mm.
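
The abstract does not spell out how the Kronecker product enters the model. A common route, assumed here, is the identity vec(AXB) = (Bᵀ ⊗ A) vec(X), with vec stacking columns, which turns a matrix equation in an unknown X into a linear system in vec(X). The sketch below only checks that identity on small integer matrices; the helper names are hypothetical and nothing here reproduces the paper's calibration pipeline.

```python
def transpose(m):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain-Python matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def kron(a, b):
    """Kronecker product: block (i, j) of the result is a[i][j] * b."""
    rows_b, cols_b = len(b), len(b[0])
    return [[a[r // rows_b][c // cols_b] * b[r % rows_b][c % cols_b]
             for c in range(len(a[0]) * cols_b)]
            for r in range(len(a) * rows_b)]


def vec(m):
    """Column-major vectorization: stack the columns of m."""
    return [m[i][j] for j in range(len(m[0])) for i in range(len(m))]


def matvec(m, v):
    """Matrix-vector product."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def vec_identity_holds(a, x, b):
    """Check vec(A X B) == (B^T kron A) vec(X)."""
    lhs = vec(matmul(matmul(a, x), b))
    rhs = matvec(kron(transpose(b), a), vec(x))
    return lhs == rhs


# Arbitrary integer matrices, so the comparison is exact.
A = [[1, 2], [3, 4]]
X = [[5, 6], [7, 8]]
B = [[1, 0], [2, 3]]
print(vec_identity_holds(A, X, B))
```

Under this assumption, the rotation part of a constraint of the form AX = ZB can be written as (I ⊗ R_A) vec(R_X) − (R_Bᵀ ⊗ I) vec(R_Z) = 0, which is linear in the unknown rotation entries.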

A Note on the Properties of Generalised Separable Spatial Autoregressive Process

Journal of Probability and Statistics ◽ 10.1155/2009/847830 ◽ 2009 ◽ Vol 2009 ◽ pp. 1-11 ◽ Cited By ~ 1

Author(s): Mahendran Shitan ◽ Shelton Peiris

Keyword(s): Three Dimensional ◽ Real Data ◽ Autoregressive Process ◽ Spatial Modelling ◽ Spectral Density Function ◽ Data Sets ◽ Additional Parameter ◽ New Class ◽ Spatial Autoregressive ◽ Modelling And Forecasting

Spatial modelling has applications in many fields, such as geology, agriculture, meteorology, and geography. In time series analysis, a class of models known as Generalised Autoregressive (GAR) was introduced by Peiris (2003) that includes an index parameter δ. It has been shown that the inclusion of this additional parameter aids in modelling and forecasting many real data sets. This paper studies the properties of a new class of spatial autoregressive process of order 1 with an index. We will call this a Generalised Separable Spatial Autoregressive (GENSSAR) model. The spectral density function (SDF), the autocovariance function (ACVF), and the autocorrelation function (ACF) are derived. The theoretical ACF and SDF plots are presented as three-dimensional figures.

A Growth Model for Multilevel Ordinal Data

Journal of Educational and Behavioral Statistics ◽ 10.3102/10769986030004369 ◽ 2005 ◽ Vol 30 (4) ◽ pp. 369-396 ◽ Cited By ~ 8

Author(s): Eisuke Segawa

Keyword(s): Latent Variable ◽ Ordinal Data ◽ Linear Models ◽ Growth Models ◽ Simulated Data ◽ Real Data ◽ Analytic Structure ◽ Data Sets ◽ Data Set ◽ Time Points

Multi-indicator growth models were formulated as special three-level hierarchical generalized linear models to analyze growth of a trait latent variable measured by ordinal items. Items are nested within a time point, and time points are nested within subject. These models are special because they include factor analytic structure. They can analyze not only data with item- and time-level missing observations, but also data with time points freely specified over subjects. Furthermore, features useful for longitudinal analyses, namely an "autoregressive error degree one" structure for the trait residuals and estimated time-scores, were included. The approach is Bayesian with Markov chain Monte Carlo, and the models are implemented in WinBUGS. They are illustrated with two simulated data sets and one real data set with planned missing items within a scale.

Data mining and machine learning methods for chromosome conformation data analysis

10.32469/10355/70076 ◽ 2019

Author(s): Oluwatosin Oluwadare

Keyword(s): Data Analysis ◽ Human Genome ◽ Structure Prediction ◽ Learning Algorithm ◽ 3D Structure ◽ Three Dimensional ◽ Gene Clusters ◽ Research Field ◽ Chromosome Conformation ◽ Graphical Tool

Sixteen years after the sequencing of the human genome by the Human Genome Project (HGP), and 17 years after the introduction of Chromosome Conformation Capture (3C) technologies, three-dimensional (3D) inference and big data remain problematic in the field of genomics, and specifically in the field of 3C data analysis. Three-dimensional inference involves the reconstruction of a genome's 3D structure or, in some cases, an ensemble of structures from contact interaction frequencies extracted from a variant of the 3C technology called Hi-C. Further questions remain about chromosome topology and structure; enhancer-promoter interactions; the location of genes, gene clusters, and transcription factors; the relationship between gene expression and epigenetics; and chromosome visualization at a higher scale, among others. In this dissertation, four major contributions are described: first, 3DMax, a tool for chromosome and genome 3D structure prediction from Hi-C data using an optimization algorithm; second, GSDB, a comprehensive and common repository that contains 3D structures for Hi-C datasets from novel 3D structure reconstruction tools developed over the years; third, ClusterTAD, a method for topologically associating domain (TAD) extraction from Hi-C data using an unsupervised learning algorithm; and finally, GenomeFlow, a comprehensive graphical tool to facilitate the entire process of modeling and analysis of 3D genome organization. It is worth noting that GenomeFlow and GSDB are the first of their kind in the 3D chromosome and genome research field. All the methods are available as software tools that are freely available to the scientific community.

Deducing high-accuracy protein contact-maps from a triplet of coevolutionary matrices through deep residual convolutional networks

PLoS Computational Biology ◽ 10.1371/journal.pcbi.1008865 ◽ 2021 ◽ Vol 17 (3) ◽ pp. e1008865

Author(s): Yang Li ◽ Chengxin Zhang ◽ Eric W. Bell ◽ Wei Zheng ◽ Xiaogen Zhou ◽ ...

Keyword(s): Long Range ◽ 3D Structure ◽ High Accuracy ◽ Large Set ◽ Contact Map ◽ Homologous Proteins ◽ Convolutional Networks ◽ Contact Maps ◽ Model Training ◽ Protein Contact Maps

The topology of protein folds can be specified by the inter-residue contact-maps and accurate contact-map prediction can help ab initio structure folding. We developed TripletRes to deduce protein contact-maps from discretized distance profiles by end-to-end training of deep residual neural-networks. Compared to previous approaches, the major advantage of TripletRes is in its ability to learn and directly fuse a triplet of coevolutionary matrices extracted from the whole-genome and metagenome databases and therefore minimize the information loss during the course of contact model training. TripletRes was tested on a large set of 245 non-homologous proteins from CASP 11&12 and CAMEO experiments and outperformed other top methods from CASP12 by at least 58.4% for the CASP 11&12 targets and 44.4% for the CAMEO targets in the top-L long-range contact precision. On the 31 FM targets from the latest CASP13 challenge, TripletRes achieved the highest precision (71.6%) for the top-L/5 long-range contact predictions. It was also shown that a simple re-training of the TripletRes model with more proteins can lead to further improvement with precisions comparable to state-of-the-art methods developed after CASP13. These results demonstrate a novel efficient approach to extend the power of deep convolutional networks for high-accuracy medium- and long-range protein contact-map predictions starting from primary sequences, which are critical for constructing 3D structure of proteins that lack homologous templates in the PDB library.
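
The top-L and top-L/5 long-range precision figures quoted above follow a standard evaluation recipe, sketched below under stated assumptions: L is the sequence length, "long-range" is taken to mean a sequence separation of at least 24 residues (the usual CASP convention, not stated in the abstract), and the function name and data layout are hypothetical.

```python
def top_l_precision(scores, true_contacts, seq_len, divisor=1, min_separation=24):
    """Precision of the top L/divisor predicted long-range contacts.

    scores: dict mapping residue pairs (i, j), i < j, to predicted scores.
    true_contacts: set of (i, j) pairs in contact in the native structure.
    min_separation: assumed long-range cutoff on |i - j|.
    """
    # Keep only pairs that are far apart along the sequence.
    long_range = [(pair, score) for pair, score in scores.items()
                  if pair[1] - pair[0] >= min_separation]
    # Rank by predicted score, highest first.
    long_range.sort(key=lambda item: item[1], reverse=True)
    k = max(1, seq_len // divisor)
    selected = long_range[:k]
    if not selected:
        return 0.0
    hits = sum(1 for pair, _ in selected if pair in true_contacts)
    # If fewer than k long-range pairs exist, divide by the number selected.
    return hits / len(selected)


# Toy example: a 4-residue "protein" with the cutoff lowered to 2
# so that some pairs qualify as long-range.
toy_scores = {(0, 1): 0.99, (0, 2): 0.9, (0, 3): 0.8, (1, 3): 0.7}
toy_truth = {(0, 2), (1, 3)}
print(top_l_precision(toy_scores, toy_truth, seq_len=4, divisor=2, min_separation=2))
```

With `divisor=5` the same function gives the top-L/5 variant, which scores only the most confident predictions and therefore tends to report higher precision than top-L.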

JL-GFDN: A Novel Gabor Filter-Based Deep Network Using Joint Spectral-Spatial Local Binary Pattern for Hyperspectral Image Classification

Remote Sensing ◽ 10.3390/rs12122016 ◽ 2020 ◽ Vol 12 (12) ◽ pp. 2016 ◽ Cited By ~ 2

Author(s): Tao Zhang ◽ Puzhao Zhang ◽ Weilin Zhong ◽ Zhen Yang ◽ Fan Yang

Keyword(s): Local Binary Pattern ◽ Hyperspectral Image ◽ Gabor Filter ◽ Spectral Characteristics ◽ Three Dimensional ◽ Real Data ◽ Hyperspectral Data ◽ Data Cube ◽ Data Sets ◽ Deep Network

The traditional local binary pattern (LBP, hereinafter also called the two-dimensional local binary pattern, 2D-LBP) is unable to depict the spectral characteristics of a hyperspectral image (HSI). To address this deficiency, this paper develops a joint spectral-spatial 2D-LBP feature (J2D-LBP) by averaging three different 2D-LBP features in a three-dimensional hyperspectral data cube. Subsequently, J2D-LBP is added into the Gabor filter-based deep network (GFDN), and a novel classification method, JL-GFDN, is proposed. Unlike the original GFDN framework, JL-GFDN further fuses the spectral and spatial features together for HSI classification. Three real data sets are adopted to evaluate the effectiveness of JL-GFDN, and the experimental results verify that (i) JL-GFDN has a better classification accuracy than the original GFDN; (ii) J2D-LBP is more effective in HSI classification in comparison with the traditional 2D-LBP.
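
For readers unfamiliar with the traditional 2D-LBP, a minimal sketch of the basic 8-neighbour operator follows. The neighbour ordering and bit assignment are one common convention and are assumptions here. The sketch covers only the classic single-plane operator; how the paper selects and averages its three 2D-LBP features within the data cube is not described in the abstract and is not reproduced.

```python
def lbp_code(image, row, col):
    """8-neighbour local binary pattern code of one interior pixel.

    Each neighbour contributes one bit: 1 if it is >= the centre value.
    """
    center = image[row][col]
    # Neighbours visited clockwise, starting at the top-left pixel
    # (the starting point and direction are a convention, assumed here).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (d_row, d_col) in enumerate(offsets):
        if image[row + d_row][col + d_col] >= center:
            code |= 1 << bit
    return code


def lbp_map(image):
    """LBP codes for all interior pixels (the border is skipped)."""
    return [[lbp_code(image, r, c) for c in range(1, len(image[0]) - 1)]
            for r in range(1, len(image) - 1)]


# Toy 3x3 patch with made-up intensities.
patch = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
print(lbp_code(patch, 1, 1))  # 120: neighbours 6, 9, 8, 7 set bits 3 to 6
```

Because the code depends only on the sign of each neighbour-minus-centre difference, it is unchanged by any monotonic brightness change, which is one reason LBP is popular as a texture descriptor.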

Selfish: discovery of differential chromatin interactions via a self-similarity measure

Bioinformatics ◽ 10.1093/bioinformatics/btz362 ◽ 2019 ◽ Vol 35 (14) ◽ pp. i145-i153 ◽ Cited By ~ 11

Author(s): Abbas Roayaei Ardakany ◽ Ferhat Ay ◽ Stefano Lonardi

Keyword(s): Comparative Analysis ◽ Similarity Measure ◽ Fundamental Problem ◽ Three Dimensional ◽ Biological Significance ◽ Real Data ◽ Dimensional Structure ◽ Self Similarity ◽ Contact Maps ◽ Chromatin Interactions

Motivation: High-throughput conformation capture experiments, such as Hi-C, provide genome-wide maps of chromatin interactions, enabling life scientists to investigate the role of the three-dimensional structure of genomes in gene regulation and other essential cellular functions. A fundamental problem in the analysis of Hi-C data is how to compare two contact maps derived from Hi-C experiments. Detecting similarities and differences between contact maps is critical in evaluating the reproducibility of replicate experiments and for identifying differential genomic regions with biological significance. Due to the complexity of chromatin conformations and the presence of technology-driven and sequence-specific biases, the comparative analysis of Hi-C data is analytically and computationally challenging.

Results: We present a novel method called Selfish for the comparative analysis of Hi-C data that takes advantage of the structural self-similarity in contact maps. We define a novel self-similarity measure to design algorithms for (i) measuring reproducibility for Hi-C replicate experiments and (ii) finding differential chromatin interactions between two contact maps. Extensive experimental results on simulated and real data show that Selfish is more accurate and robust than state-of-the-art methods.

Availability and implementation: https://github.com/ucrbioinfo/Selfish

Data mining in manufacturing: Significance analysis of process parameters

Proceedings of the Institution of Mechanical Engineers Part B Journal of Engineering Manufacture ◽ 10.1243/09544054jem1182 ◽ 2008 ◽ Vol 222 (11) ◽ pp. 1503-1516 ◽ Cited By ~ 18

Author(s): M Perzyk ◽ R Biernacki ◽ J Kozlowski

Keyword(s): Data Mining ◽ Neural Networks ◽ Process Parameters ◽ Regression Models ◽ Simulated Data ◽ Real Data ◽ Support Vector ◽ Data Sets ◽ Interaction Factors

Determining the most significant manufacturing process parameters from collected past data can be very helpful in solving important industrial problems, such as the detection of root causes of deteriorating product quality, the selection of the most efficient parameters to control the process, and the prediction of breakdowns of machines, equipment, etc. A methodology for determining the relative significance of process variables and possible interactions between them, based on interrogations of generalized regression models, is proposed and tested. The performance of several types of data mining tools, such as artificial neural networks, support vector machines, regression trees, classification trees, and a naïve Bayesian classifier, is compared. Also, some simple non-parametric statistical methods, based on an analysis of variance (ANOVA) and contingency tables, are evaluated for comparison purposes. The tests were performed using simulated data sets, with assumed hidden relationships, as well as on real data collected in the foundry industry. It was found that the performance of significance and interaction factors obtained from regression models, and, in particular, neural networks, is satisfactory, while the other methods appeared to be less accurate and/or less reliable.
