iSeg: an efficient algorithm for segmentation of genomic and epigenomic data

2017 ◽  
Author(s):  
S.B. Girimurugan ◽  
Yuhang Liu ◽  
Pei-Yau Lung ◽  
Daniel L. Vera ◽  
Jonathan H. Dennis ◽  
...  

Abstract

Background: Identification of functional elements of a genome often requires dividing a sequence of measurements along a genome into segments where adjacent segments have different properties, such as different mean values. This problem is often called the segmentation problem in the field of genomics, and the change-point problem in other scientific disciplines. Despite dozens of algorithms developed to address this problem in genomics research, methods with improved accuracy and speed are still needed to effectively tackle both existing and emerging genomic and epigenomic segmentation problems.

Results: We designed an efficient algorithm, called iSeg, for segmentation of genomic and epigenomic profiles. iSeg first utilizes dynamic programming to identify candidate segments and test for significance. It then uses a novel data structure based on two coupled balanced binary trees to detect overlapping significant segments and update them simultaneously during searching and refinement stages. Refinement and merging of significant segments are performed at the end to generate the final set of segments. By using an objective function based on the p-values of the segments, the algorithm can serve as a general computational framework to be combined with different assumptions on the distributions of the data. As a general segmentation method, it can segment different types of genomic and epigenomic data, such as DNA copy number variation, nucleosome occupancy, nuclease sensitivity, and differential nuclease sensitivity data. Using simple t-tests to compute p-values across multiple datasets of different types, we evaluate iSeg using both simulated and experimental datasets and show that it performs satisfactorily when compared with some other popular methods, which often employ more sophisticated statistical models. Implemented in C++, iSeg is also very computationally efficient, well suited for large numbers of input profiles and data with very long sequences.

Conclusions: We have developed an effective and efficient general-purpose segmentation tool for sequential data and illustrated its use in segmentation of genomic and epigenomic profiles.
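The idea of scoring a candidate segment by a t-test against the rest of the profile can be illustrated with a minimal Python sketch. This is not the iSeg implementation: the function names (`welch_t`, `approx_two_sided_p`, `score_window`), the toy profile, and the choice of Welch's statistic are all assumptions made for illustration, and the p-value uses a normal approximation instead of the exact t distribution so that only the standard library is needed.

```python
import math
from statistics import NormalDist, fmean, variance


def welch_t(segment, background):
    """Welch's t statistic for the difference in means (hypothetical helper)."""
    m1, m2 = fmean(segment), fmean(background)
    v1, v2 = variance(segment), variance(background)  # sample variances
    se = math.sqrt(v1 / len(segment) + v2 / len(background))
    return (m1 - m2) / se


def approx_two_sided_p(t):
    """Two-sided p-value under a normal approximation (an assumption:
    a real t-test would use the t distribution with estimated degrees of freedom)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


def score_window(values, start, end):
    """Score the window values[start:end] against all remaining positions."""
    segment = values[start:end]
    background = values[:start] + values[end:]
    t = welch_t(segment, background)
    return t, approx_two_sided_p(t)


# Toy profile: baseline near 0 with an elevated block at positions 5..9.
profile = [0.1, -0.2, 0.0, 0.2, -0.1,
           2.1, 1.9, 2.2, 2.0, 1.8,
           0.1, -0.1, 0.0, 0.2, -0.2]
t_stat, p_value = score_window(profile, 5, 10)
print(f"t = {t_stat:.2f}, approximate p = {p_value:.3g}")
```

A segmentation method built on p-values would evaluate many such windows and keep the most significant ones; how candidates are enumerated, refined and merged is specific to each algorithm and is not shown here.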

Chicken meat is widely consumed because it is high in protein and contains a healthier, unsaturated type of fat. Chicken burger is a chicken product that consumers find palatable. Both chicken and its products are liable to different types of contamination during preparation and processing. Contamination by S. aureus and its enterotoxins poses a major public health hazard to consumers of chicken meat. In this study, 100 samples of chicken fillet, deboned thigh, wing, mechanically deboned meat (MDM) and chicken burger (20 each) were collected from the market and examined for their S. aureus count and for the ability of the isolated strains to produce enterotoxins, using conventional plating and isolation techniques as well as the SET-RPLA toxin detection kit. The results revealed that the mean S. aureus counts in all samples exceeded the permissible limits, making the samples unacceptable. MDM isolates produced staphylococcal enterotoxins (SEs) of three types: SEA, SEC and SED. Chicken burger S. aureus isolates produced only SEA and SEC enterotoxins, while S. aureus isolated from chicken fillet and deboned thigh did not exhibit any enterotoxin production. It is recommended to follow hygienic practices during the different processing stages to avoid the risk of S. aureus and its enterotoxins.


AGROFOR ◽  
2018 ◽  
Vol 2 (2) ◽  
Author(s):  
Snežana JOVANOVIĆ ◽  
Goran TODOROVIĆ ◽  
Nikola GRČIĆ ◽  
Ratibor ŠTRBANOVIĆ ◽  
Rade STANISAVLJEVIĆ ◽  
...  

The aim of the present study was to determine the effects of both different types of cytoplasm (cms-C, cms-S and fertile) and environmental factors on the kernel row number of 12 maize inbred lines. The trial with inbred lines was set up in two locations (Zemun Polje-Selection field and Zemun Polje-Školsko dobro) in 2013 and 2014. Moreover, the three-replicate trials were set up according to the randomised complete block design within each type of cytoplasm. Each plot within the replicate consisted of four rows. Fertile versions of inbred lines were sown in two border rows and they were pollinators for their sterile counterparts. Statistical-biometric data processing was based on mean values per replicate and included the analysis of variance. According to this analysis, significant differences in the kernel row number were established among inbred lines depending on the type of cytoplasm, year and the location. The average kernel row number ranged from 10.3 (L9) to 15.8 (L5 and L7). The variation of the kernel row number, related to the source of cytoplasm, was very significant. Differences (Lsd0.01) in the kernel row number were not determined in inbred lines L5, L8, L10 and L12 in regard to the type of cytoplasm: cms-C, cms-S and fertile. The average kernel row number significantly (P1%) varied in regard to the year of investigation. A higher average value (13.75) was established in 2014 than in 2013 (13.31). The kernel row number per year very significantly varied (Lsd0.01) in all inbreds, but the differences were not significant in the inbreds L2, L3, L8, L9 and L12. The results obtained point to effects of different types of cytoplasm on the kernel row number.


2016 ◽  
Vol 59 (12) ◽  
pp. 2355-2378
Author(s):  
BaiSuo Jin ◽  
GuangMing Pan ◽  
Qing Yang ◽  
Wang Zhou

2021 ◽  
pp. 096228022110326
Author(s):  
Kristine Gierz ◽  
Kayoung Park ◽  
Peihua Qiu

In general, the change point problem considers inference of a change in distribution for a set of time-ordered observations. This has applications in a large variety of fields, and can also apply to survival data. In survival analysis, most existing methods compare two treatment groups for the entirety of the study period. Some treatments may take a length of time to show effects in subjects. This has been called the time-lag effect in the literature, and in cases where time-lag effect is considerable, such methods may not be appropriate to detect significant differences between two groups. In this paper, we propose a novel non-parametric approach for estimating the point of treatment time-lag effect by using an empirical divergence measure. Theoretical properties of the estimator are studied. The results from the simulated data and the applications to real data examples support our proposed method.


2005 ◽  
Vol 08 (04) ◽  
pp. 433-449 ◽  
Author(s):  
FERNANDO A. QUINTANA ◽  
PILAR L. IGLESIAS ◽  
HELENO BOLFARINE

The problem of outlier and change-point identification has received considerable attention in traditional linear regression models from both classical and Bayesian standpoints. In contrast, for the case of regression models with measurement errors, also known as error-in-variables models, the corresponding literature is scarce and largely focused on classical solutions for the normal case. The main object of this paper is to propose clustering algorithms for outlier detection and change-point identification in scale mixture of error-in-variables models. We propose an approach based on product partition models (PPMs) which allows one to study clustering for the models under consideration. This includes the change-point problem and outlier detection as special cases. The outlier identification problem is approached by adapting the algorithms developed by Quintana and Iglesias [32] for simple linear regression models. A special algorithm is developed for the change-point problem which can be applied in a more general setup. The methods are illustrated with two applications: (i) outlier identification in a problem involving the relationship between two methods for measuring serum kanamycin in blood samples from babies, and (ii) change-point identification in the relationship between the monthly dollar volume of sales on the Boston Stock Exchange and the combined monthly dollar volumes for the New York and American Stock Exchanges.


2013 ◽  
Vol 2013 ◽  
pp. 1-13 ◽  
Author(s):  
Zhanchao Li ◽  
Chongshi Gu ◽  
Zhongru Wu

Diagnosing abnormal crack behavior in concrete has long been both a focus and a difficulty in the safety monitoring of hydraulic structures. Based on how abnormal crack behavior of a concrete dam manifests in parametric and nonparametric statistical models, the internal relation between such abnormality and statistical change point theory is analyzed in depth, in terms of the structural instability of the parametric statistical model and the change in the sequence distribution law of the nonparametric statistical model. On this basis, through reduction of the change point problem, establishment of a basic nonparametric change point model, and asymptotic analysis of the test method for the basic change point problem, a nonparametric change point diagnosis method for abnormal crack behavior of concrete dams is developed, taking into account that in practice the crack behavior may exhibit more than one abnormality point. The method is applied to an actual project, which demonstrates its effectiveness and soundness. The method also has a complete theoretical basis and strong practicality, with broad prospects for application in actual projects.


2018 ◽  
Vol 22 ◽  
pp. 210-235
Author(s):  
Victor-Emmanuel Brunel

We address the problem of detection and estimation of one or two change-points in the mean of a series of random variables. We use the formalism of set estimation in regression: to each point of a design is attached a binary label that indicates whether that point belongs to an unknown segment and this label is contaminated with noise. The endpoints of the unknown segment are the change-points. We study the minimal size of the segment which allows statistical detection in different scenarios, including when the endpoints are separated from the boundary of the domain of the design, or when they are separated from one another. We compare this minimal size with the minimax rates of convergence for estimation of the segment under the same scenarios. The aim of this extensive study of a simple yet fundamental version of the change-point problem is two-fold: understanding the impact of the location and the separation of the change points on detection and estimation and bringing insights about the estimation and detection of convex bodies in higher dimensions.
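For readers unfamiliar with the basic setting, a single change in the mean can be located with a classical least-squares split. The Python sketch below is a generic textbook illustration, not the set-estimation procedure studied in the paper; the function name `estimate_change_point` and the toy series are assumptions introduced here.

```python
def estimate_change_point(xs):
    """Return the split index k (1 <= k < n) that minimises the within-segment
    sum of squared errors when xs[:k] and xs[k:] each get their own mean."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    best_k, best_cost = None, float("inf")
    left = 0.0
    for k in range(1, n):
        left += xs[k - 1]          # running sum of xs[:k]
        right = total - left       # sum of xs[k:]
        # SSE of a two-piece constant fit, using sum(x^2) - (sum x)^2 / count
        cost = total_sq - left * left / k - right * right / (n - k)
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


# Toy series: mean near 0 for the first four points, near 1 afterwards.
series = [0.0, 0.1, -0.1, 0.05, 1.0, 1.1, 0.9, 1.05]
k_hat = estimate_change_point(series)
print("estimated change point index:", k_hat)
```

The running sums keep the scan linear in the series length. Deciding whether the detected change is statistically real, which is the detection question the abstract discusses, requires a separate test and is not covered by this sketch.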

