Clique Selection and its Effect on Paraclique Enrichment: An Experimental Study

10.29007/3sdd ◽  
2020 ◽  
Author(s):  
Yuping Lu ◽  
Charles Phillips ◽  
Elissa Chesler ◽  
Michael Langston

The paraclique algorithm provides an effective means for biological data clustering. It satisfies the mathematical quest for density, while fulfilling the pragmatic need for noise abatement on real data. Given a finite, simple, edge-weighted and thresholded graph, the paraclique method first finds a maximum clique, then incorporates additional vertices in a controlled manner, and finally extracts the subgraph thereby defined. When more than one maximum clique is present, however, deciding which to employ is usually left unspecified. In practice, this frequently and quite naturally reduces to using the first maximum clique found. In this paper, maximum clique selection is studied in the context of well-annotated transcriptomic data, with ontological classification used as a proxy for cluster quality. Enrichment p-values are compared using maximum cliques chosen in a variety of ways. The most appealing and intuitive option is almost surely to start with the maximum clique having the highest average edge weight. Although there is of course no guarantee that such a strategy is any better than random choice, results derived from a large collection of experiments indicate that, in general, this approach produces a small but statistically significant improvement in overall cluster quality. Such an improvement, though modest, may be well worth pursuing in light of the time, expense and expertise often required to generate timely, high quality, high throughput biological data.
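The selection rule described above (among all maximum cliques, start from the one with the highest average edge weight) can be illustrated with a minimal sketch. It is not the authors' implementation: the edge-weight dictionary format, the function names, and the use of a basic Bron-Kerbosch enumeration are assumptions made for illustration, and the later paraclique step that incorporates additional vertices is not shown.

```python
from itertools import combinations


def maximal_cliques(adj):
    """Yield every maximal clique using basic Bron-Kerbosch (no pivoting)."""
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)
            return
        for v in list(p):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(adj), set())


def average_edge_weight(clique, weight):
    """Mean weight over all vertex pairs of a clique with at least two vertices."""
    pairs = list(combinations(sorted(clique), 2))
    return sum(weight[frozenset(pair)] for pair in pairs) / len(pairs)


def select_maximum_clique(weight):
    """Pick the maximum clique with the highest average edge weight.

    `weight` (hypothetical format) maps frozenset({u, v}) to the weight of
    each edge that survived thresholding in a simple graph.
    """
    adj = {}
    for edge in weight:
        u, v = tuple(edge)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    cliques = list(maximal_cliques(adj))
    size = max(len(c) for c in cliques)
    maximum = [c for c in cliques if len(c) == size]
    return max(maximum, key=lambda c: average_edge_weight(c, weight))


# Toy graph: two triangles sharing vertex "c".
edges = {("a", "b"): 0.9, ("a", "c"): 0.8, ("b", "c"): 0.7,
         ("c", "d"): 0.95, ("c", "e"): 0.9, ("d", "e"): 0.85}
weight = {frozenset(e): w for e, w in edges.items()}
print(sorted(select_maximum_clique(weight)))
```

In this toy graph both triangles are maximum cliques, and the rule picks the triangle on c, d and e because its average weight (0.9) exceeds that of the other (0.8). Exhaustive enumeration is only practical on small graphs.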

2018 ◽  
Vol 24 (3) ◽  
pp. 629-669 ◽  
Author(s):  
Fuat C. Beylunioğlu ◽  
M. Ege Yazgan ◽  
Thanasis Stengos

The convergence hypothesis, which was developed in the context of growth economics, asserts that income differences across countries are transitory and that developing countries will eventually attain the income level of developed ones. The convergence clubs hypothesis, on the other hand, claims that convergence can be realized only across groups of countries that share some common characteristics. In this study, we propose a new method to find convergence clubs that combines a pairwise method of testing convergence with maximum clique and maximal clique algorithms. Unlike many of those already developed in the literature, this new method aims to find convergence clubs endogenously without depending on a priori classifications. In a Monte Carlo simulation study, the success of the method in finding convergence clubs is compared with that of a similar algorithm. The simulation results indicate that the proposed method performs better than the compared algorithm in most cases. In addition to the Monte Carlo study, new empirical evidence on the existence of convergence clubs is presented in the context of real data applications.


Author(s):  
Parisa Torkaman

The generalized inverted exponential distribution is introduced as a lifetime model with good statistical properties. This paper considers the estimation of the probability density function and the cumulative distribution function of this distribution using five different estimation methods: uniformly minimum variance unbiased (UMVU), maximum likelihood (ML), least squares (LS), weighted least squares (WLS) and percentile (PC) estimators. The performance of these estimation procedures is compared by numerical simulation on the basis of the mean squared error (MSE). The simulation studies indicate that the UMVU estimator performs better than the others and that, when the sample size is large enough, the ML and UMVU estimators are almost equivalent and more efficient than LS, WLS and PC. Finally, a real data set is analyzed.
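The kind of simulation comparison described above can be sketched for the ML estimator alone. The abstract does not state the parameterization, so this sketch assumes the commonly cited two-parameter form with CDF F(x) = 1 - (1 - exp(-λ/x))^α for x > 0, and it further assumes that λ is known so that the ML estimate of α has a closed form. The function names are hypothetical, and the other four estimators are not shown.

```python
import math
import random


def gied_cdf(x, alpha, lam):
    """Assumed CDF: F(x) = 1 - (1 - exp(-lam / x)) ** alpha, for x > 0."""
    return 1.0 - (1.0 - math.exp(-lam / x)) ** alpha


def gied_pdf(x, alpha, lam):
    """Derivative of the assumed CDF with respect to x."""
    e = math.exp(-lam / x)
    return alpha * lam / x ** 2 * e * (1.0 - e) ** (alpha - 1.0)


def gied_sample(alpha, lam, rng):
    """Draw one value by inverting the assumed CDF (not hardened for extreme alpha)."""
    u = rng.random()
    while u == 0.0:  # u = 0 would map to x = 0, outside the support
        u = rng.random()
    inner = -math.expm1(math.log1p(-u) / alpha)  # 1 - (1 - u) ** (1 / alpha)
    return -lam / math.log(inner)


def ml_alpha(data, lam):
    """Closed-form ML estimate of alpha when lam is treated as known (assumption)."""
    return -len(data) / sum(math.log1p(-math.exp(-lam / x)) for x in data)


def mse_of_ml_cdf(x0, alpha, lam, n, reps, seed=0):
    """Monte Carlo MSE of the plug-in ML estimate of F(x0)."""
    rng = random.Random(seed)
    truth = gied_cdf(x0, alpha, lam)
    total = 0.0
    for _ in range(reps):
        data = [gied_sample(alpha, lam, rng) for _ in range(n)]
        total += (gied_cdf(x0, ml_alpha(data, lam), lam) - truth) ** 2
    return total / reps


for n in (20, 200):
    print(n, mse_of_ml_cdf(x0=1.0, alpha=2.0, lam=1.0, n=n, reps=200))
```

The same loop could be repeated with other estimators plugged into `gied_cdf` or `gied_pdf` to compare their MSEs at a given sample size.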


Sensors ◽  
2021 ◽  
Vol 21 (13) ◽  
pp. 4618
Author(s):  
Francisco Oliveira ◽  
Miguel Luís ◽  
Susana Sargento

Unmanned Aerial Vehicle (UAV) networks are an emerging technology, useful not only for the military, but also for public and civil purposes. Their versatility provides advantages in situations where an existing network cannot support all requirements of its users, either because of an exceptionally large number of users, or because of the failure of one or more ground base stations. Networks of UAVs can reinforce these cellular networks where needed, redirecting the traffic to available ground stations. Using machine learning algorithms to predict overloaded traffic areas, we propose a UAV positioning algorithm responsible for determining suitable positions for the UAVs, with the objective of a more balanced redistribution of traffic, to avoid saturated base stations and decrease the number of users without a connection. The tests performed with real data of user connections through base stations show that, in less restrictive network conditions, the algorithm to dynamically place the UAVs performs significantly better than in more restrictive conditions, significantly reducing the number of users without a connection. We also conclude that the accuracy of the prediction is a very important factor, not only in the reduction of users without a connection, but also in the number of UAVs deployed.


Symmetry ◽  
2021 ◽  
Vol 13 (6) ◽  
pp. 936
Author(s):  
Dan Wang

In this paper, a ratio test based on bootstrap approximation is proposed to detect persistence change in heavy-tailed observations. This paper focuses on the symmetric testing problems of I(1)-to-I(0) and I(0)-to-I(1). On the basis of residual CUSUM, the test statistic is constructed in a ratio form. I derive the null distribution of the test statistic. The consistency under the alternative hypothesis is also discussed. However, the null distribution of the test statistic contains an unknown tail index. To address this challenge, I present a bootstrap approximation method for determining the rejection region of this test. Simulation studies on artificial data are conducted to assess the finite sample performance, and they show that the proposed method is better than the kernel method in all listed cases. The analysis of real data also demonstrates the excellent performance of this method.


2011 ◽  
Vol 143-144 ◽  
pp. 770-774 ◽  
Author(s):  
Shou Lei Lu ◽  
Long Zhao ◽  
Chang Yun Zhang

In order to address the sensitivity of the traditional Tercom to speed error and yaw angle error, an improved Tercom approach using a fading factor is introduced. The basic idea of this approach is to estimate the navigation position with a novel correlation function, which is calculated from weighted historical measurements. Experimental results with real data show that this approach performs better than the traditional Tercom in overcoming velocity error and yaw angle error.


F1000Research ◽  
2020 ◽  
Vol 8 ◽  
pp. 2024
Author(s):  
Joshua P. Zitovsky ◽  
Michael I. Love

Allelic imbalance occurs when the two alleles of a gene are differentially expressed within a diploid organism and can indicate important differences in cis-regulation and epigenetic state across the two chromosomes. Because of this, the ability to accurately quantify the proportion at which each allele of a gene is expressed is of great interest to researchers. This becomes challenging in the presence of small read counts and/or sample sizes, which can cause estimators for allelic expression proportions to have high variance. Investigators have traditionally dealt with this problem by filtering out genes with small counts and samples. However, this may inadvertently remove important genes that have truly large allelic imbalances. Another option is to use pseudocounts or Bayesian estimators to reduce the variance. To this end, we evaluated the accuracy of four different estimators, the latter two of which are Bayesian shrinkage estimators: maximum likelihood, adding a pseudocount to each allele, approximate posterior estimation of GLM coefficients (apeglm) and adaptive shrinkage (ash). We also wrote C++ code to quickly calculate ML and apeglm estimates and integrated it into the apeglm package. The four methods were evaluated on two simulations and one real data set. Apeglm consistently performed better than ML according to a variety of criteria, and generally outperformed use of pseudocounts as well. Ash also performed better than ML in one of the simulations, but in the other performance was more mixed. Finally, when compared to five other packages that also fit beta-binomial models, the apeglm package was substantially faster and more numerically reliable, making our package useful for quick and reliable analyses of allelic imbalance. Apeglm is available as an R/Bioconductor package at http://bioconductor.org/packages/apeglm.
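The first two estimators named above, maximum likelihood and adding a pseudocount to each allele, are simple enough to illustrate directly. The sketch below uses plain binomial proportions with hypothetical function names and a default pseudocount of 1. It is not the apeglm or ash procedure and does not use the apeglm package API.

```python
def ml_proportion(alt, ref):
    """Maximum likelihood estimate of the allelic proportion alt / (alt + ref)."""
    total = alt + ref
    if total == 0:
        return float("nan")  # no reads, so the proportion is undefined
    return alt / total


def pseudocount_proportion(alt, ref, pseudo=1.0):
    """Estimate after adding `pseudo` reads to each allele (shrinks toward 0.5)."""
    return (alt + pseudo) / (alt + ref + 2 * pseudo)


# The same observed ratio at small and large read depth (toy counts).
for alt, ref in [(3, 0), (300, 0)]:
    print(alt, ref, ml_proportion(alt, ref), pseudocount_proportion(alt, ref))
```

With three reads the ML estimate jumps to 1.0 while the pseudocount estimate stays at 0.8. With 300 reads the two nearly agree, which shows how a pseudocount tempers high-variance estimates from small counts without filtering the gene out.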


Author(s):  
V. P. Yahodzinskyi ◽  
O. M. Kisilyuk ◽  
D. P. Polozenko ◽  
O. A. Desyatka ◽  
V. O. Novytskyi ◽  
...  

Power sports have always been very popular among servicemen. This has been especially evident in recent times, as power sports such as armsport, crossfit, kettlebell lifting and powerlifting have developed further. The article presents a comparative analysis of the power capabilities of cadets who specialize in various power sports. The study involved fourth-year male cadets from various faculties (n=53) of the Military Academy (Odessa) who, in the course of their training, were engaged in power sports: armsport (n=12), crossfit (n=16), kettlebell lifting (n=15) and powerlifting (n=10). In addition, the results of the cadet-athletes were compared with those of cadets of the faculty of training specialists of airborne assault troops and military intelligence and special purpose who did not attend the sports sections of the academy (n=60). The cadets' power was assessed with the following exercises: pull-up, lifting by coup, lifting by strength on the crossbar, 24 kg kettlebell snatch, push-up on the parallel bars, and push-up. The research showed a high level of development of power qualities among cadets of all the academy's power sports sections. There was no significant difference between the results of cadets of different power sports in most exercises. The power capabilities of cadets who are engaged in various power sports were found to be significantly better than those of cadets who are not engaged in sports. This suggests that any of the studied power sports can be an effective means of developing and improving the power capabilities of future officers of the Armed Forces of Ukraine.


2016 ◽  
Vol 15 (2) ◽  
pp. ar17 ◽  
Author(s):  
Patricia Zagallo ◽  
Shanice Meddleton ◽  
Molly S. Bolger

We present our design for a cell biology course to integrate content with scientific practices, specifically data interpretation and model-based reasoning. A 2-yr research project within this course allowed us to understand how students interpret authentic biological data in this setting. Through analysis of written work, we measured the extent to which students’ data interpretations were valid and/or generative. By analyzing small-group audio recordings during in-class activities, we demonstrated how students used instructor-provided models to build and refine data interpretations. Often, students used models to broaden the scope of data interpretations, tying conclusions to a biological significance. Coding analysis revealed several strategies and challenges that were common among students in this collaborative setting. Spontaneous argumentation was present in 82% of transcripts, suggesting that data interpretation using models may be a way to elicit this important disciplinary practice. Argumentation dialogue included frequent co-construction of claims backed by evidence from data. Other common strategies included collaborative decoding of data representations and noticing data patterns before making interpretive claims. Focusing on irrelevant data patterns was the most common challenge. Our findings provide evidence to support the feasibility of supporting students’ data-interpretation skills within a large lecture course.


1984 ◽  
Vol 38 (5) ◽  
pp. 663-668 ◽  
Author(s):  
Lesia L. Tyson ◽  
Yong-Chien Ling ◽  
Charles K. Mann

Two data-handling techniques, least-squares fitting and cross-correlation, have been used for three-component analysis under comparable conditions with the use of both simulated and real data. Factors considered are the effect of variation in degree of peak overlap, signal-to-noise ratio, the effect of peak width variations when peak maxima occur at the same position, and the effect of varying peak intensities. A series of lipid mixtures was analyzed by each method with the use of infrared absorption. This permits comparison of these results with earlier reports. Both least-squares and cross-correlation can be used with samples that are outside the applicable range of the earlier work. In this comparison, the least-squares results are somewhat better than those from cross-correlation.
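Least-squares fitting for a three-component mixture can be illustrated with a minimal sketch. It assumes the mixture spectrum is a linear combination of three known reference spectra. The Gaussian peak shapes, peak positions, grid size and function names are invented for illustration and are not taken from the study, and the cross-correlation method is not shown.

```python
import math


def gaussian_peak(center, width, n=100):
    """Synthetic reference spectrum: one Gaussian band on an n-point grid."""
    return [math.exp(-0.5 * ((i - center) / width) ** 2) for i in range(n)]


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def least_squares_fit(references, mixture):
    """Component amounts minimizing the squared residual, via the normal equations."""
    k = len(references)
    gram = [[sum(p * q for p, q in zip(references[i], references[j]))
             for j in range(k)] for i in range(k)]
    rhs = [sum(p * y for p, y in zip(references[i], mixture)) for i in range(k)]
    return solve_linear(gram, rhs)


# Three strongly overlapping bands mixed in known proportions (noise-free toy data).
refs = [gaussian_peak(40, 8), gaussian_peak(50, 8), gaussian_peak(60, 8)]
true_amounts = [0.5, 0.3, 0.2]
mixture = [sum(c * r[i] for c, r in zip(true_amounts, refs)) for i in range(100)]
print(least_squares_fit(refs, mixture))
```

Adding noise to `mixture`, or moving the peak centers closer together, is one way to explore how signal-to-noise ratio and peak overlap affect the recovered amounts.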


2020 ◽  
Vol 2020 ◽  
pp. 1-14
Author(s):  
Ahmed Ibrahim Shawky ◽  
Muhammad Aslam ◽  
Khushnoor Khan

In this paper, a control chart scheme is introduced for monitoring the mean using the gamma distribution for belief statistics, with multiple dependent (deferred) state sampling under neutrosophic statistics. The coefficients of the control chart and the neutrosophic average run lengths have been estimated for specific false alarm probabilities under various process conditions. The proposed chart has been compared with the existing classical chart through simulation and real data. From the comparison, it is concluded that the proposed chart performs better than the existing chart in terms of average run length in an uncertain environment. The proposed chart is able to detect a shift more quickly than the existing chart. It has been observed that the proposed chart is efficient in quickly monitoring an out-of-control process and is a valuable addition to the toolkit of quality control personnel.

