Similarity Detection for Higher-Order Structure of DNA Sequences

Author(s):  
Nguyen Thi Ngoc Anh ◽  
Ho Phan Hieu ◽  
Tran Anh Kiet ◽  
Vo Trung Hung

With advances in data collection and storage capabilities, large amounts of multidimensional data, known as higher-order data representations, have recently been generated in bioinformatics applications, especially in DNA sequence recognition. This paper therefore proposes a mathematical model capable of handling the multidimensional problem of DNA similarity detection with high accuracy and reliability. To this end, the paper addresses the central issues of multidimensional DNA gene expression data: (1) formulating multidimensional DNA data as a higher-order representation; (2) recovering missing values; and (3) decomposing higher-order DNA data directly from their tensorial representation to extract useful information for classification. A novel type of third-order microarray expression data, termed gene-sample-time (GST), is then explored for biological sample classification. The contributions follow two main thrusts: latent modeling for imputing missing values based on the High-Order Kalman Filter, and feature extraction based on Tensor Discriminative Feature Extraction. Experimental results on a real dataset of DNA sequences corroborate the advantages of the proposed approaches over matrix-based algorithms and recent tensor-based discriminant-decomposition methods in terms of missing-value completion, classification accuracy, and computation time.
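
The first step named in this abstract, formulating the data as a third-order gene-sample-time (GST) tensor, can be illustrated with a short sketch. It is not the authors' High-Order Kalman Filter or Tensor Discriminative Feature Extraction; it only shows mode-n unfolding, the reshaping that most tensor decompositions operate on, plus a simple row-mean imputation of the kind used by matrix-based baselines. The nested-list layout `tensor[gene][sample][time]`, the use of `None` for missing entries, and the function names `mode_n_unfold` and `impute_row_mean` are assumptions made for illustration.

```python
from typing import List, Optional

Value = Optional[float]  # None marks a missing measurement (assumption for this sketch)


def mode_n_unfold(tensor: List[List[List[Value]]], mode: int) -> List[List[Value]]:
    """Unfold a rectangular third-order tensor, indexed tensor[gene][sample][time], along `mode`.

    Each row corresponds to one index of the chosen mode; the columns enumerate
    the two remaining modes with the last remaining index varying fastest.
    (Other references order the columns differently; this is one convention.)
    """
    dims = (len(tensor), len(tensor[0]), len(tensor[0][0]))
    first, second = [m for m in range(3) if m != mode]
    rows = []
    for i in range(dims[mode]):
        row = []
        for j in range(dims[first]):
            for k in range(dims[second]):
                idx = [0, 0, 0]
                idx[mode], idx[first], idx[second] = i, j, k
                row.append(tensor[idx[0]][idx[1]][idx[2]])
        rows.append(row)
    return rows


def impute_row_mean(matrix: List[List[Value]]) -> List[List[float]]:
    """Replace each None with the mean of the observed entries in its row (a simple matrix baseline)."""
    filled = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        mean = sum(observed) / len(observed) if observed else 0.0
        filled.append([mean if v is None else v for v in row])
    return filled


# Toy GST tensor: 2 genes x 2 samples x 3 time points, with one missing value.
gst = [
    [[1.0, 2.0, 3.0], [4.0, None, 6.0]],
    [[7.0, 8.0, 9.0], [10.0, 11.0, 12.0]],
]
gene_mode = mode_n_unfold(gst, 0)  # 2 rows (genes) x 6 columns (sample, time)
time_mode = mode_n_unfold(gst, 2)  # 3 rows (time points) x 4 columns (gene, sample)
print(impute_row_mean(gene_mode)[0])  # -> [1.0, 2.0, 3.0, 4.0, 3.2, 6.0]
```

Row-mean imputation ignores the sample and time structure entirely, which is precisely the limitation that tensor-aware imputation methods such as the one described in the abstract aim to overcome.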

2013 ◽  
Vol 24 (18) ◽  
pp. 2807-2819 ◽  
Author(s):  
Laura S. Burrack ◽  
Shelly E. Applen Clancey ◽  
Jeremy M. Chacón ◽  
Melissa K. Gardner ◽  
Judith Berman

The establishment and maintenance of higher-order structure at centromeres is essential for accurate chromosome segregation. The monopolin complex is thought to cross-link multiple kinetochore complexes to prevent merotelic attachments that result in chromosome missegregation. This model is based on structural analysis and the requirement that monopolin execute mitotic and meiotic chromosome segregation in Schizosaccharomyces pombe, which has more than one kinetochore–microtubule attachment/centromere, and co-orient sister chromatids in meiosis I in Saccharomyces cerevisiae. Recent data from S. pombe suggest an alternative possibility: that the recruitment of condensin is the primary function of monopolin. Here we test these models using the yeast Candida albicans. C. albicans cells lacking monopolin exhibit defects in chromosome segregation, increased distance between centromeres, and decreased stability of several types of repeat DNA. Of note, changing kinetochore–microtubule copy number from one to more than one kinetochore–microtubule/centromere does not alter the requirement for monopolin. Furthermore, monopolin recruits condensin to C. albicans centromeres, and overexpression of condensin suppresses chromosome segregation defects in strains lacking monopolin. We propose that the key function of monopolin is to recruit condensin in order to promote the assembly of higher-order structure at centromeres and repetitive DNA.


2019 ◽  
Author(s):  
Zacharias Kinney ◽  
Viraj Kirinda ◽  
Scott Hartley

<p>Higher-order structure in abiotic foldamer systems represents an important but largely unrealized goal. As one approach to this challenge, covalent assembly can be used to assemble macrocycles with foldamer subunits in well-defined spatial relationships. Such systems have previously been shown to exhibit self-sorting, new folding motifs, and dynamic stereoisomerism, yet there remain important questions about the interplay between folding and macrocyclization and the effect of structural confinement on folding behavior. Here, we explore the dynamic covalent assembly of extended <i>ortho</i>-phenylenes (hexamer and decamer) with rod-shaped linkers. Characteristic <sup>1</sup>H chemical shift differences between cyclic and acyclic systems can be compared with computational conformer libraries to determine the folding states of the macrocycles. We show that the bite angle provides a measure of the fit of an <i>o</i>-phenylene conformer within a shape-persistent macrocycle, affecting both assembly and ultimate folding behavior. For the <i>o</i>-phenylene hexamer, the bite angle and conformer stability work synergistically to direct assembly toward triangular [3+3] macrocycles of well-folded oligomers. For the decamer, the energetic accessibility of conformers with small bite angles allows [2+2] macrocycles to be formed as the predominant species. In these systems, the <i>o</i>-phenylenes are forced into unusual folding states, preferentially adopting a backbone geometry with distinct helical blocks of opposite handedness. The results show that simple geometric restrictions can be used to direct foldamers toward increasingly complex geometries.</p>


2019 ◽  
Vol 26 (1) ◽  
pp. 35-43 ◽  
Author(s):  
Natalie K. Garcia ◽  
Galahad Deperalta ◽  
Aaron T. Wecksler

Background: Biotherapeutics, particularly monoclonal antibodies (mAbs), are a maturing class of drugs capable of treating a wide range of diseases. Therapeutic function and solution stability are linked to the proper three-dimensional organization of the primary sequence into Higher Order Structure (HOS) as well as the timescales of protein motions (dynamics). Methods that directly monitor protein HOS and dynamics are important for mapping therapeutically relevant protein-protein interactions and assessing properly folded structures. Irreversible covalent protein footprinting Mass Spectrometry (MS) tools, such as site-specific amino acid labeling and hydroxyl radical footprinting, are analytical techniques capable of monitoring the side chain solvent accessibility influenced by tertiary and quaternary structure. Here we discuss the methodology, examples of biotherapeutic applications, and the future directions of irreversible covalent protein footprinting MS in biotherapeutic research and development. Conclusion: Bottom-up mass spectrometry using irreversible labeling techniques provides valuable information for characterizing solution-phase protein structure. Examples range from epitope mapping and protein-ligand interactions to probing challenging structures of membrane proteins. By pairing these techniques with hydrogen-deuterium exchange, spectroscopic analysis, or static-phase structural data such as crystallography or electron microscopy, a comprehensive understanding of protein structure can be obtained.


2020 ◽  
Vol 36 (Supplement_2) ◽  
pp. i857-i865
Author(s):  
Derrick Blakely ◽  
Eamon Collins ◽  
Ritambhara Singh ◽  
Andrew Norton ◽  
Jack Lanchantin ◽  
...  

Motivation: Gapped k-mer kernels with support vector machines (gkm-SVMs) have achieved strong predictive performance on regulatory DNA sequences on modestly sized training sets. However, existing gkm-SVM algorithms suffer from slow kernel computation time, as they depend exponentially on the sub-sequence feature length, number of mismatch positions, and the task’s alphabet size. Results: In this work, we introduce a fast and scalable algorithm for calculating gapped k-mer string kernels. Our method, named FastSK, uses a simplified kernel formulation that decomposes the kernel calculation into a set of independent counting operations over the possible mismatch positions. This simplified decomposition allows us to devise a fast Monte Carlo approximation that rapidly converges. FastSK can scale to much greater feature lengths, allows us to consider more mismatches, and is performant on a variety of sequence analysis tasks. On multiple DNA transcription factor binding site prediction datasets, FastSK consistently matches or outperforms the state-of-the-art gkmSVM-2.0 algorithms in area under the ROC curve, while achieving average speedups in kernel computation of ∼100× and speedups of ∼800× for large feature lengths. We further show that FastSK outperforms character-level recurrent and convolutional neural networks while achieving low variance. We then extend FastSK to 7 English-language medical named entity recognition datasets and 10 protein remote homology detection datasets. FastSK consistently matches or outperforms these baselines. Availability and implementation: Our algorithm is available as a Python package and as C++ source code at https://github.com/QData/FastSK. Supplementary information: Supplementary data are available at Bioinformatics online.
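
The decomposition described in this abstract, in which the kernel is a sum of independent counting operations over the possible mismatch positions, can be sketched in a few lines of Python. This is a simplified illustration, not the FastSK package or its API: the parameter names `g` (feature length) and `m` (number of mismatch positions), all function names, and the choice to treat masked positions as deleted wildcards without any kernel normalization are assumptions. The sampled variant shows the general idea of a Monte Carlo approximation: draw position sets uniformly at random, average their terms, and rescale by the number of possible sets.

```python
import math
import random
from collections import Counter
from itertools import combinations
from typing import FrozenSet


def masked_counts(seq: str, g: int, masked: FrozenSet[int]) -> Counter:
    """Count every length-g window of `seq` after deleting the masked (mismatch) positions."""
    keep = [i for i in range(g) if i not in masked]
    counts: Counter = Counter()
    for start in range(len(seq) - g + 1):
        window = seq[start:start + g]
        counts["".join(window[i] for i in keep)] += 1
    return counts


def position_term(x: str, y: str, g: int, masked: FrozenSet[int]) -> int:
    """Dot product of the masked g-mer count vectors of x and y for one set of mismatch positions."""
    cx, cy = masked_counts(x, g, masked), masked_counts(y, g, masked)
    return sum(cx[key] * cy[key] for key in cx.keys() & cy.keys())


def gapped_kmer_kernel(x: str, y: str, g: int, m: int) -> int:
    """Exact kernel: sum the independent terms over all C(g, m) choices of mismatch positions."""
    return sum(position_term(x, y, g, frozenset(p)) for p in combinations(range(g), m))


def sampled_kernel(x: str, y: str, g: int, m: int, n_samples: int, seed: int = 0) -> float:
    """Monte Carlo estimate: average uniformly drawn terms, then rescale by C(g, m)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_samples):
        masked = frozenset(rng.sample(range(g), m))  # uniform random m-subset of positions
        total += position_term(x, y, g, masked)
    return math.comb(g, m) * total / n_samples


print(gapped_kmer_kernel("ACGT", "ACGT", g=2, m=1))  # -> 6
print(sampled_kernel("ACGTACGT", "ACGGACGT", g=4, m=2, n_samples=20))
```

Because each position set contributes an independent term, the exact sum and the sampled estimate share the same `position_term` helper; drawing with replacement keeps the rescaled average an unbiased estimate of the exact kernel.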


2020 ◽  
pp. 1-9
Author(s):  
Ivar Snorrason ◽  
Courtney Beard ◽  
Andrew D. Peckham ◽  
Thröstur Björgvinsson

Background: Hierarchical structural models of psychopathology rarely extend to obsessive-compulsive spectrum disorders. The current study sought to examine the higher-order structure of the obsessive-compulsive and related disorders (OCRDs) in DSM-5: obsessive-compulsive disorder (OCD), hoarding disorder (HD), body dysmorphic disorder (BDD), trichotillomania (hair-pulling disorder; HPD) and excoriation (skin-picking) disorder (SPD). Methods: Adult patients in a partial hospital program (N = 532) completed a dimensional measure of the five OCRDs. We used confirmatory factor analysis to identify the optimal model of the comorbidity structure. We then examined the associations between the transdiagnostic factors and internalizing and externalizing symptoms (i.e. depression, generalized anxiety, neuroticism, and drug/alcohol cravings). Results: The best-fitting model included two correlated higher-order factors: an obsessions-compulsions (OC) factor (OCD, BDD, and HD), and a body-focused repetitive behavior (BFRB) factor (HPD and SPD). The OC factor, not the BFRB factor, had unique associations with internalizing symptoms (standardized effects = 0.42–0.66), and the BFRB factor, not the OC factor, had a small, marginally significant unique association with drug/alcohol cravings (standardized effect = 0.22, p = 0.088). Conclusions: The results mirror findings from twin research and indicate that OCD, BDD, and HD share liability that is significantly associated with internalizing symptoms, but this liability may be relatively less important for BFRBs. Further research is needed to better examine the associations between BFRBs and addictive disorders.


2001 ◽  
Vol 09 (04) ◽  
pp. 1259-1286 ◽  
Author(s):  
MIGUEL R. VISBAL ◽  
DATTA V. GAITONDE

A high-order compact-differencing and filtering algorithm, coupled with the classical fourth-order Runge–Kutta scheme, is developed and implemented to simulate aeroacoustic phenomena on curvilinear geometries. Several issues pertinent to the use of such schemes are addressed. The impact of mesh stretching on the generation of high-frequency spurious modes is examined, and the need for a discriminating higher-order filter procedure is established and resolved. The incorporation of these filtering techniques also permits a robust treatment of the outflow radiation condition by taking advantage of the energy transfer to high frequencies caused by rapid mesh stretching. For conditions on the scatterer, higher-order one-sided filter treatments are shown to be superior in accuracy and stability to standard explicit variations. Computations demonstrate that these algorithmic components are also crucial to the success of interface treatments used in multi-domain and domain-decomposition strategies. For three-dimensional computations, special metric relations are employed to ensure the fidelity of the scheme on highly curvilinear meshes. A variety of problems, including several benchmark computations, demonstrate the success of the overall computational strategy.
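
To illustrate how a compact (implicit) spatial derivative pairs with the classical fourth-order Runge–Kutta scheme, the sketch below advects a sine wave once around a periodic domain. It is a deliberately reduced example under stated assumptions: it uses the fourth-order Padé compact stencil, which is not necessarily the stencil used in this paper; it omits the higher-order filter, curvilinear metrics, and all boundary and interface treatments; and it solves the small cyclic system with a precomputed dense inverse rather than a tridiagonal solver. The function names are hypothetical.

```python
import math
from typing import Callable, List

Vector = List[float]


def invert(matrix: List[Vector]) -> List[Vector]:
    """Gauss-Jordan inverse with partial pivoting; adequate for the small grids used here."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            factor = aug[r][col]
            if r != col and factor != 0.0:
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def compact_derivative(n: int, h: float) -> Callable[[Vector], Vector]:
    """Periodic fourth-order Pade scheme:
    (1/4) f'_{i-1} + f'_i + (1/4) f'_{i+1} = (3/2) (f_{i+1} - f_{i-1}) / (2h)
    """
    lhs = [[0.0] * n for _ in range(n)]
    for i in range(n):
        lhs[i][i] = 1.0
        lhs[i][(i - 1) % n] = 0.25
        lhs[i][(i + 1) % n] = 0.25
    lhs_inv = invert(lhs)  # the implicit left-hand side couples every grid point

    def ddx(f: Vector) -> Vector:
        rhs = [1.5 * (f[(i + 1) % n] - f[(i - 1) % n]) / (2.0 * h) for i in range(n)]
        return [sum(a * b for a, b in zip(row, rhs)) for row in lhs_inv]

    return ddx


def rk4_step(u: Vector, dt: float, rate: Callable[[Vector], Vector]) -> Vector:
    """One classical fourth-order Runge-Kutta step for du/dt = rate(u)."""
    k1 = rate(u)
    k2 = rate([a + 0.5 * dt * b for a, b in zip(u, k1)])
    k3 = rate([a + 0.5 * dt * b for a, b in zip(u, k2)])
    k4 = rate([a + dt * b for a, b in zip(u, k3)])
    return [a + dt / 6.0 * (b1 + 2.0 * b2 + 2.0 * b3 + b4)
            for a, b1, b2, b3, b4 in zip(u, k1, k2, k3, k4)]


def advect_one_period(n: int = 32, c: float = 1.0, n_steps: int = 64):
    """Solve u_t + c u_x = 0 for sin(x) on [0, 2*pi) over one period; return (initial, final)."""
    h = 2.0 * math.pi / n
    ddx = compact_derivative(n, h)
    u0 = [math.sin(i * h) for i in range(n)]
    dt = (2.0 * math.pi / c) / n_steps
    u = u0
    for _ in range(n_steps):
        u = rk4_step(u, dt, lambda v: [-c * d for d in ddx(v)])
    return u0, u


u0, u = advect_one_period()
print(max(abs(a - b) for a, b in zip(u0, u)))  # small, since the exact solution returns to u0
```

On larger grids the dense inverse would normally be replaced by a periodic tridiagonal solve (for example, the Thomas algorithm combined with a Sherman–Morrison correction), which keeps the cost of each derivative evaluation linear in the number of points.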


Sign in / Sign up

Export Citation Format

Share Document