Short exon prediction based on multiscale products of a genomic-inspired multiscale bilateral filtering

Mapping Intimacies ◽

10.1101/423053 ◽

2018 ◽

Author(s):

Xiaolei Zhang ◽

Weijun Pan

Keyword(s):

Dna Sequences ◽

Data Sets ◽

Bilateral Filtering ◽

Short Exon ◽

Exon Prediction ◽

High Prediction ◽

The Difference ◽

Energy Compaction ◽

Processing Techniques ◽

Weighting Coefficients

Abstract: Multiscale signal processing techniques such as wavelet filtering have proved particularly successful in predicting exon sequences. The traditional wavelet predictor performs domain filtering and enforces exon features by weighting nucleotide values with coefficients. Such filtering is linear and is not suited to preserving short coding exons and exon-intron boundaries. This paper describes a short exon prediction framework that processes DNA sequences non-linearly while achieving high prediction rates. There are two key contributions. The first is the introduction of a genomic-inspired multiscale bilateral filtering (MSBF), which exploits both weighting coefficients in the spatial domain and nucleotide similarity in the range. Like the wavelet transform, the MSBF is defined as a weighted sum of nucleotides. The difference is that the MSBF takes into account the variation of nucleotides at a specific codon position. The second contribution is the exploitation of inter-scale correlation in the MSBF domain to find the inter-scale dependency on the differences between the exon signal and the background noise. This favourable property is used to sharpen the important structures while weakening noise. Three benchmark data sets were used to evaluate the methods considered. Compared with two existing techniques, the proposed method shows improvements of at least 50.5%, 36.7%, 12.8%, 17.8%, 17.7%, 11.5% and 12.2% for exons of length 1-49, 50-74, 75-99, 100-124, 125-149, 150-174 and 175-199, respectively. Owing to its nonlinear nature, the MSBF is good at energy compaction, which makes it capable of locating the sharp variations around short exons. The direct multiplication of coefficients at several adjacent scales clearly enhanced exon features while suppressing noise.
We show that the non-linear nature and correlation-based property of the proposed predictor give it an advantage over traditional filtering, which leads to better exon prediction performance. The predictor has other possible applications: its good localization and preservation of sharp variations should make it suitable for fault diagnosis of aero-engines.
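
To illustrate the contrast the abstract draws between domain-only (linear) filtering and bilateral filtering, here is a minimal sketch of a generic 1-D bilateral filter on a numeric signal. It is not the authors' MSBF: the numeric encoding of nucleotides, the codon-position handling and the multiplication across scales are all omitted, and the function name and parameters (`half_width`, `sigma_s`, `sigma_r`) are assumptions made for this example.

```python
import math


def bilateral_filter_1d(signal, half_width=3, sigma_s=2.0, sigma_r=0.1):
    """Generic 1-D bilateral filter (illustrative only, not the paper's MSBF).

    Each output sample is a weighted mean of its neighbours. The weight is
    the product of a domain term (distance in position) and a range term
    (difference in value), so samples across a sharp edge contribute little.
    """
    n = len(signal)
    out = []
    for i in range(n):
        num = 0.0
        den = 0.0
        for j in range(max(0, i - half_width), min(n, i + half_width + 1)):
            w_domain = math.exp(-((i - j) ** 2) / (2.0 * sigma_s ** 2))
            w_range = math.exp(-((signal[i] - signal[j]) ** 2) / (2.0 * sigma_r ** 2))
            weight = w_domain * w_range
            num += weight * signal[j]
            den += weight
        out.append(num / den)
    return out


if __name__ == "__main__":
    # Hypothetical step signal standing in for a sharp exon-intron boundary.
    step = [0.0] * 10 + [1.0] * 10
    print(bilateral_filter_1d(step)[8:12])                        # edge kept
    print(bilateral_filter_1d(step, sigma_r=float("inf"))[8:12])  # edge blurred
```

Setting `sigma_r` to infinity switches the range term off and reduces the filter to ordinary Gaussian smoothing, which is the linear, domain-only behaviour the abstract argues is unsuitable near boundaries.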


Efficient Adaptive Exon Prediction for DNA study using Proportionate LMS Variants

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.17.11721 ◽

2018 ◽

Vol 7 (2.17) ◽

pp. 116 ◽

Cited By ~ 1

Author(s):

Srinivasareddy Putluri ◽

Md Zia Ur Rahman

Keyword(s):

Signal Processing ◽

Dna Sequences ◽

Genomic Sequence ◽

Adaptive Signal Processing ◽

Protein Coding ◽

Exon Prediction ◽

Gene Database ◽

Processing Techniques ◽

Adaptive Signal ◽

Normalized Lms

In bioinformatics, locating the exon fragments in a deoxyribonucleic acid (DNA) sequence is an important task. The study of protein coding regions is widely used in the identification of diseases and the design of drugs. The regions of DNA that carry protein coding information are termed exons; hence identifying the exon segments in a genomic sequence is a crucial job in bioinformatics. The three base periodicity (TBP) observed in these regions of DNA sequences can be easily determined by applying signal processing methods. Adaptive signal processing techniques are found to be more useful than other available methods, owing to their unique capability to alter weight coefficients based on the genomic sequence. Based on these considerations, we propose efficient adaptive exon predictors (AEPs) using the Proportionate Normalized LMS (PNLMS) algorithm and the Maximum Proportionate Normalized LMS (MPNLMS) algorithm to improve exon locating ability and achieve better convergence. To ease the complexity of computations in the denominator during the filtering process, the proposed AEPs using PNLMS and its maximum variants are combined with signature algorithms. Hybrid variants of the proposed AEPs include the PNLMS, DCPNLMS, ECPNLMS, SSPNLMS, MPNLMS, MDCPNLMS, MECPNLMS and MSSPNLMS algorithms. The AEP based on MDCPNLMS is shown to be superior for exon identification in terms of performance measures, with sensitivity 0.7346, specificity 0.7483 and precision 0.7325 for a genomic sequence with accession AF009962 at a threshold of 0.8. Finally, the capability of several AEPs in predicting exon locations is verified using different DNA sequences found in the National Center for Biotechnology Information (NCBI) gene database.
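
The three base periodicity mentioned above can be made concrete with a small sketch. It computes the spectral power at frequency 1/3, summed over the four base indicator sequences, which is a common textbook way to expose TBP. It is not the adaptive PNLMS predictor the abstract proposes, and the function name `tbp_power` and the test sequences are assumptions for illustration.

```python
import cmath


def tbp_power(sequence):
    """Spectral power at frequency 1/3, summed over the A, C, G, T indicators.

    For each base, the positions where it occurs form a 0/1 indicator
    sequence; its Fourier coefficient at frequency 1/3 is large when the
    base prefers one codon position, which is the TBP signature.
    """
    seq = sequence.upper()
    total = 0.0
    for base in "ACGT":
        coeff = sum(
            cmath.exp(-2j * cmath.pi * pos / 3.0)
            for pos, symbol in enumerate(seq)
            if symbol == base
        )
        total += abs(coeff) ** 2
    return total


if __name__ == "__main__":
    # Hypothetical toy sequences: a perfectly periodic one and a homopolymer.
    print(tbp_power("ATG" * 20))
    print(tbp_power("A" * 60))
```

In practice this measure would be evaluated in a sliding window along the sequence, and windows with high power would be flagged as candidate coding regions.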


The Age of Nonsynonymous and Synonymous Mutations in Animal mtDNA and Implications for the Mildly Deleterious Theory

Genetics ◽

10.1093/genetics/153.1.497 ◽

1999 ◽

Vol 153 (1) ◽

pp. 497-506 ◽

Cited By ~ 4

Author(s):

Rasmus Nielsen ◽

Daniel M Weinreich

Keyword(s):

Dna Sequences ◽

Purifying Selection ◽

Data Sets ◽

Deleterious Mutations ◽

Synonymous Mutations ◽

Weak Evidence ◽

Mitochondrial Data ◽

The Mean ◽

Excess Number ◽

Neutral Mutations

Abstract: McDonald/Kreitman tests performed on animal mtDNA consistently reveal significant deviations from strict neutrality in the direction of an excess number of polymorphic nonsynonymous sites, which is consistent with purifying selection acting on nonsynonymous sites. We show that under models of recurrent neutral and deleterious mutations, the mean age of segregating neutral mutations is greater than the mean age of segregating selected mutations, even in the absence of recombination. We develop a test of the hypothesis that the mean age of segregating synonymous mutations equals the mean age of segregating nonsynonymous mutations in a sample of DNA sequences. The power of this age-of-mutation test and the power of the McDonald/Kreitman test are explored by computer simulations. We apply the new test to 25 previously published mitochondrial data sets and find weak evidence for selection against nonsynonymous mutations.
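
As a reminder of what a McDonald/Kreitman test computes, the sketch below evaluates the 2x2 table of polymorphic versus fixed differences at nonsynonymous and synonymous sites, using the neutrality index and a two-sided Fisher exact test. It does not implement the age-of-mutation test developed in the paper; the function names and the counts in the demo are hypothetical.

```python
from math import comb


def neutrality_index(pn, ps, dn, ds):
    """NI = (Pn / Ps) / (Dn / Ds); NI > 1 means excess nonsynonymous polymorphism."""
    return (pn / ps) / (dn / ds)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


if __name__ == "__main__":
    # Hypothetical counts: polymorphic (Pn, Ps) and fixed (Dn, Ds) differences.
    pn, ps, dn, ds = 6, 3, 2, 4
    print(neutrality_index(pn, ps, dn, ds))
    print(fisher_exact_two_sided(pn, ps, dn, ds))
```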


POSTPARTUM AMENORRHOEA IN RURAL EASTERN UTTAR PRADESH, INDIA

Journal of Biosocial Science ◽

10.1017/s0021932098002272 ◽

1998 ◽

Vol 30 (2) ◽

pp. 227-243

Author(s):

K. N. S. YADAVA ◽

S. K. JAIN

Keyword(s):

Higher Education ◽

Uttar Pradesh ◽

Current Status Data ◽

North India ◽

Current Status ◽

Data Sets ◽

Survival Status ◽

The Mean ◽

The Difference ◽

Using Data

This paper calculates the mean duration of postpartum amenorrhoea (PPA) and examines its demographic and socioeconomic correlates in rural north India, using data collected through 'retrospective' (last but one child) as well as 'current status' (last child) reporting of the duration of PPA. The mean duration of PPA was higher in the current status than in the retrospective data, the difference being statistically significant. However, for the same mothers who gave PPA information in both the data sets, the difference in mean duration of PPA was not statistically significant. The correlates were identical in both the data sets. The current status data were more complete in terms of the coverage, and perhaps less distorted by reporting errors caused by recall lapse. A positive relationship of the mean duration of PPA was found with longer breast-feeding, higher parity and age of mother at the birth of the child, and the survival status of the child. An inverse relationship was found with higher education of a woman, higher education of her husband and higher socioeconomic status of her household, these variables possibly acting as proxies for women's better nutritional status.


Investigating the Temporal Effect of User Preferences with Application in Movie Recommendation

Mobile Information Systems ◽

10.1155/2017/8940709 ◽

2017 ◽

Vol 2017 ◽

pp. 1-10 ◽

Cited By ~ 3

Author(s):

Wen-Jun Li ◽

Qiang Dong ◽

Yan Fu

Keyword(s):

Rapid Development ◽

User Preferences ◽

Mobile Internet ◽

Smart Devices ◽

Data Sets ◽

Rating Data ◽

Temporal Effect ◽

Proposed Model ◽

Target User ◽

The Difference

With the rapid development of the mobile Internet and smart devices, more and more online content providers are beginning to collect the preferences of their customers through various apps on mobile devices. These preferences are largely reflected by the ratings of online items with explicit scores. Both positive and negative ratings help recommender systems provide relevant items to a target user. Based on the empirical analysis of three real-world movie-rating data sets, we observe that users’ rating criteria change over time, and that past positive and negative ratings have different influences on users’ future preferences. Given this, we propose a recommendation model on a session-based temporal graph, considering the difference between long- and short-term preferences and the different temporal effects of positive and negative ratings. Extensive experimental results validate the significant accuracy improvement of our proposed model compared with state-of-the-art methods.


A Novel Active Contours Model for Environmental Change Detection from Multitemporal Synthetic Aperture Radar Images

Remote Sensing ◽

10.3390/rs12111746 ◽

2020 ◽

Vol 12 (11) ◽

pp. 1746

Author(s):

Salman Ahmadi ◽

Saeid Homayouni

Keyword(s):

Synthetic Aperture Radar ◽

Change Detection ◽

Active Contours ◽

Training Data ◽

Synthetic Aperture ◽

Data Sets ◽

Difference Image ◽

Proposed Model ◽

The Difference ◽

Aperture Radar

In this paper, we propose a novel approach based on the active contours model for change detection from synthetic aperture radar (SAR) images. In order to increase the accuracy of the proposed approach, a new operator was introduced to generate a difference image from the before-change and after-change images. Then, a new model of active contours was developed for accurately detecting changed regions from the difference image. The proposed model extracts the changed areas as a target feature from the difference image based on training data from changed and unchanged regions. In this research, we used the Otsu histogram thresholding method to produce the training data automatically. In addition, the training data were updated in the process of minimizing the energy function of the model. To evaluate the accuracy of the model, we applied the proposed method to three benchmark SAR data sets. The proposed model obtains Kappa coefficients of 84.65%, 87.07%, and 96.26% for the Yellow River Estuary, Bern, and Ottawa sample data sets, respectively. These results demonstrate the effectiveness of the proposed approach compared to other methods. Another advantage of the proposed model is its high speed in comparison to conventional methods.
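
The Otsu step mentioned above can be sketched as follows. This is the standard histogram-based Otsu threshold on integer pixel values, shown only to indicate how changed and unchanged training samples might be separated automatically; the paper's difference operator and active contours model are not reproduced, and the function name and sample values are assumptions.

```python
def otsu_threshold(values, levels=256):
    """Return the threshold t that maximizes between-class variance.

    `values` are integer intensities in [0, levels). Pixels with value <= t
    form one class and the rest form the other.
    """
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_between = 0, -1.0
    w0 = 0      # number of pixels in the lower class so far
    sum0 = 0    # intensity sum of the lower class so far
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_between:
            best_between, best_t = between, t
    return best_t


if __name__ == "__main__":
    # Hypothetical difference-image intensities: low = unchanged, high = changed.
    pixels = [20, 22, 25, 180, 185, 190]
    t = otsu_threshold(pixels)
    print(t, [p > t for p in pixels])
```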


Weak factor automata: the failure of failure factor oracles?

South African Computer Journal ◽

10.18489/sacj.v53i0.199 ◽

2014 ◽

Vol 53 ◽

Author(s):

Loek Cleophas ◽

Derrick G. Kourie ◽

Bruce W. Watson

Keyword(s):

Pattern Matching ◽

Dna Sequences ◽

Finite Automata ◽

Compact Representation ◽

Data Sets ◽

Matching Algorithm ◽

Weak Factor ◽

Ex Post ◽

Ex Post Facto ◽

Matching Performance

In indexing of, and pattern matching on, DNA and text sequences, it is often important to represent all factors of a sequence. One efficient, compact representation is the factor oracle (FO). At the same time, any classical deterministic finite automaton (DFA) can be transformed to a so-called failure one (FDFA), which may use failure transitions to replace multiple symbol transitions, potentially yielding a more compact representation. We combine the two ideas and directly construct a failure factor oracle (FFO) from a given sequence, in contrast to ex post facto transformation to an FDFA. The algorithm is suitable for both short and long sequences. We empirically compared the resulting FFOs and FOs on number of transitions for many DNA sequences of lengths 4 to 512, showing gains of up to 10% in total number of transitions, with failure transitions also taking up less space than symbol transitions. The resulting FFOs can be used for indexing, as well as in a variant of the FO-using backward oracle matching algorithm. We discuss and classify this pattern matching algorithm in terms of the keyword pattern matching taxonomies of Watson, Cleophas and Zwaan. We also empirically compared the use of FOs and FFOs in such backward reading pattern matching algorithms, using both DNA and natural language (English) data sets. The results indicate that the decrease in pattern matching performance of an algorithm using an FFO instead of an FO may outweigh the gain in representation space by using an FFO instead of an FO.
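
For readers unfamiliar with the factor oracle, the sketch below shows the well-known online construction of a plain FO (one state per prefix, plus a supply link per state). It does not construct the failure factor oracle proposed in the paper; the function names and the example strings are assumptions for illustration.

```python
def build_factor_oracle(text):
    """Build a factor oracle for `text` with the standard online algorithm.

    Returns a list of transition dicts, one per state 0..len(text).
    State i corresponds to the prefix of length i.
    """
    transitions = [{}]
    supply = [-1]  # supply link of the initial state is undefined (-1)
    for i, symbol in enumerate(text, start=1):
        transitions.append({})
        transitions[i - 1][symbol] = i          # internal transition
        k = supply[i - 1]
        while k > -1 and symbol not in transitions[k]:
            transitions[k][symbol] = i          # external transition
            k = supply[k]
        supply.append(0 if k == -1 else transitions[k][symbol])
    return transitions


def accepts(transitions, word):
    """Return True if `word` can be read from the initial state."""
    state = 0
    for symbol in word:
        if symbol not in transitions[state]:
            return False
        state = transitions[state][symbol]
    return True


if __name__ == "__main__":
    oracle = build_factor_oracle("GATTACA")
    print(accepts(oracle, "TTAC"), accepts(oracle, "CAT"))
```

The oracle accepts every factor of the input, but it may also accept some strings that are not factors, which is why matching algorithms built on it verify candidate matches.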


Quantitative analysis questions the role of MeCP2 as a global regulator of alternative splicing

10.1101/2020.05.25.115154 ◽

2020 ◽

Author(s):

Kashyap Chhatbar ◽

Justyna Cholewa-Waclaw ◽

Ruth Shah ◽

Adrian Bird ◽

Guido Sanguinetti

Keyword(s):

Alternative Splicing ◽

Dna Sequences ◽

Molecular Mechanisms ◽

Data Sets ◽

Learning Approaches ◽

Global Regulator ◽

Major Function ◽

Quantitative Analyses ◽

Different Levels

Abstract: MeCP2 is an abundant protein in mature nerve cells, where it binds to DNA sequences containing methylated cytosine. Mutations in the MECP2 gene cause the severe neurological disorder Rett syndrome (RTT), provoking intensive study of the underlying molecular mechanisms. Multiple functions have been proposed, one of which involves a regulatory role in splicing. Here we leverage the recent availability of high-quality transcriptomic data sets to probe quantitatively the potential influence of MeCP2 on alternative splicing. Using a variety of machine learning approaches that can capture both linear and non-linear associations, we show that widely different levels of MeCP2 have a minimal effect on alternative splicing in three different systems. Alternative splicing was also apparently indifferent to developmental changes in DNA methylation levels. Our results suggest that regulation of splicing is not a major function of MeCP2. They also highlight the importance of multi-variate quantitative analyses in the formulation of biological hypotheses.


Comparison of aerosol properties from the Indian Himalayas and the Indo-Gangetic plains

Atmospheric Chemistry and Physics Discussions ◽

10.5194/acpd-11-11417-2011 ◽

2011 ◽

Vol 11 (4) ◽

pp. 11417-11453 ◽

Cited By ~ 10

Author(s):

T. Raatikainen ◽

A.-P. Hyvärinen ◽

J. Hatakka ◽

T. S. Panwar ◽

R. K. Hooda ◽

...

Keyword(s):

Boundary Layer ◽

Monsoon Season ◽

Data Sets ◽

New Delhi ◽

Gangetic Plains ◽

Diurnal Cycles ◽

Measurement Site ◽

Background Measurement ◽

The Difference ◽

Boundary Layer Dynamics

Abstract. Gual Pahari is a polluted semi-urban background measurement site on the Indo-Gangetic plains close to New Delhi, and Mukteshwar is a relatively clean background measurement site in the foothills of the Himalayas, about 270 km NE of Gual Pahari and about 2 km above the nearby plains. Two-year-long data sets including aerosol and meteorological parameters, as well as modeled backward trajectories and boundary layer heights, were compared. The purpose was to see how aerosol concentrations vary between clean and polluted sites not very far from each other. Specifically, we explored the effect of boundary layer evolution on aerosol concentrations. The measurements showed that, especially during the coldest winter months, aerosol concentrations are significantly lower in Mukteshwar. On the other hand, from April to October the difference is smaller and the concentration trends are quite similar. With the exception of the monsoon season, when rains affect aerosol concentrations, clear but practically opposite diurnal cycles are observed: when the lowest daily aerosol concentrations are seen during afternoon hours in Gual Pahari, there is a peak in Mukteshwar aerosol concentrations. In addition to local sources and long-range transport of dust, boundary layer dynamics can explain the observed differences and similarities. When mixing of air masses is limited during the relatively cool winter months, aerosol pollution accumulates over the plains, but Mukteshwar is above the pollution layer. When mixing increases in the spring, aerosol concentrations increase in Mukteshwar and decrease in Gual Pahari. The effect of mixing is also clear in the diurnal concentration cycles: daytime mixing decreases aerosol concentrations in Gual Pahari and increases them in Mukteshwar.


Phylogenetic analysis of large molecular data sets

Botanical Sciences ◽

10.17129/botsci.1509 ◽

2017 ◽

pp. 99

Author(s):

Pamela S. Soltis ◽

Douglas E. Soltis

Keyword(s):

Large Data ◽

Molecular Data ◽

Global Optimum ◽

Large Data Sets ◽

Phylogeny Reconstruction ◽

Local Optimum ◽

Data Sets ◽

Additional Consideration ◽

Technological Advances ◽

The Difference

Technological advances in molecular biology have greatly increased the speed and efficiency of DNA sequencing, making it possible to construct large molecular data sets for phylogeny reconstruction relatively quickly. Despite their potential for improving our understanding of phylogeny, these large data sets also pose many challenges. In this paper, we discuss several of these challenges, including 1) the failure of a search to find the most parsimonious trees (the local optimum) in a reasonable amount of time, 2) the difference between a local optimum and the global optimum, and 3) the existence of multiple classes (islands) of most parsimonious trees. We also discuss possible strategies to improve the likelihood of finding the most parsimonious tree(s) and present two examples from our work on angiosperm phylogeny. We conclude with a discussion of two alternatives to analyses of entire large data sets, the exemplar approach and compartmentalization, and suggest that additional consideration must be given to issues of data analysis for large data sets, whether morphological or molecular.


A Procedure for the Automatic Determination of Filter Cutoff Frequency for the Processing of Biomechanical Data

Journal of Applied Biomechanics ◽

10.1123/jab.15.3.303 ◽

1999 ◽

Vol 15 (3) ◽

pp. 303-317 ◽

Cited By ~ 42

Author(s):

John H. Challis

Keyword(s):

White Noise ◽

Autocorrelation Function ◽

Cutoff Frequency ◽

Human Movement ◽

Data Sets ◽

Mathematical Functions ◽

The Best Approximation ◽

Second Derivatives ◽

Low Pass ◽

The Difference

This article presents and evaluates a new procedure that automatically determines the cutoff frequency for the low-pass filtering of biomechanical data. The cutoff frequency was estimated by exploiting the properties of the autocorrelation function of white noise. The new procedure systematically varies the cutoff frequency of a Butterworth filter until the signal representing the difference between the filtered and unfiltered data is the best approximation to white noise as assessed using the autocorrelation function. The procedure was evaluated using signals generated from mathematical functions. Noise was added to these signals so that they approximated signals arising from the analysis of human movement. The optimal cutoff frequency was computed by finding the cutoff frequency that gave the smallest difference between the estimated and true signal values. The new procedure produced similar cutoff frequencies and root mean square differences to the optimal values, for the zeroth, first and second derivatives of the signals. On the data sets investigated, this new procedure performed very similarly to the generalized cross-validated quintic spline.
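
The whiteness criterion described above can be illustrated with a small sketch that scores a residual by the sum of its squared autocorrelations over the first few lags; a residual close to white noise scores near zero. This is one plausible scoring, not necessarily the exact statistic used in the article, and the Butterworth filtering and cutoff sweep are omitted. The function names, `max_lag`, and the test signals are assumptions.

```python
import math
import random


def autocorrelation(x, lag):
    """Normalized sample autocorrelation of `x` at the given positive lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return cov / var


def whiteness_score(residual, max_lag=10):
    """Sum of squared autocorrelations for lags 1..max_lag (lower = whiter)."""
    return sum(autocorrelation(residual, lag) ** 2 for lag in range(1, max_lag + 1))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical residuals: one noise-like, one still containing slow signal.
    white = [rng.gauss(0.0, 1.0) for _ in range(500)]
    structured = [math.sin(2 * math.pi * i / 100) for i in range(500)]
    print(whiteness_score(white), whiteness_score(structured))
```

In a full procedure, this score would be evaluated on the difference between unfiltered and filtered data for each candidate cutoff, and the cutoff with the lowest score would be kept.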
