Stable Tensor Principal Component Pursuit: Error Bounds and Efficient Algorithms

Sensors ◽  
2019 ◽  
Vol 19 (23) ◽  
pp. 5335 ◽  
Author(s):  
Wei Fang ◽  
Dongxu Wei ◽  
Ran Zhang

The rapid development of sensor technology gives rise to huge amounts of tensor (i.e., multi-dimensional array) data. For various reasons, such as sensor failures and communication loss, tensor data may be corrupted not only by small noise but also by gross corruptions. This paper studies Stable Tensor Principal Component Pursuit (STPCP), which aims to recover a tensor from its corrupted observations. Specifically, we propose a STPCP model based on the recently proposed tubal nuclear norm (TNN), which has shown superior performance in comparison with other tensor nuclear norms. Theoretically, we rigorously prove that under tensor incoherence conditions, the underlying tensor and the sparse corruption tensor can be stably recovered. Algorithmically, we first develop an ADMM algorithm and then accelerate it by designing a new algorithm based on orthogonal tensor factorization. The superiority and efficiency of the proposed algorithms are demonstrated through experiments on both synthetic and real data sets.
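The tubal nuclear norm underlying the STPCP model is defined through the t-SVD: transform the tensor to the Fourier domain along the third mode and sum the singular values of the frontal slices. A minimal NumPy sketch (the 1/n3 normalization follows a common convention; the paper's exact scaling may differ):

```python
import numpy as np

def tubal_nuclear_norm(T):
    """Tubal nuclear norm via the t-SVD: FFT along the third mode,
    then sum the singular values of every frontal slice, scaled by 1/n3."""
    n3 = T.shape[2]
    That = np.fft.fft(T, axis=2)  # frontal slices in the Fourier domain
    total = 0.0
    for k in range(n3):
        total += np.linalg.svd(That[:, :, k], compute_uv=False).sum()
    return total / n3
```

For a tensor whose only nonzero frontal slice is the identity, every Fourier-domain slice is the identity, so the norm reduces to the matrix trace norm of that slice.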

Entropy ◽  
2020 ◽  
Vol 23 (1) ◽  
pp. 62
Author(s):  
Zhengwei Liu ◽  
Fukang Zhu

Thinning operators play an important role in the analysis of integer-valued autoregressive models, and the most widely used is binomial thinning. Inspired by the theory of extended Pascal triangles, a new thinning operator named extended binomial thinning is introduced, which generalizes binomial thinning. Compared to the binomial thinning operator, the extended binomial thinning operator has two parameters and is more flexible in modeling. Based on the proposed operator, a new integer-valued autoregressive model is introduced, which can accurately and flexibly capture the dispersion features of count time series. Two-step conditional least squares (CLS) estimation is investigated for the innovation-free case, and conditional maximum likelihood estimation is also discussed. We also obtain the asymptotic property of the two-step CLS estimator. Finally, three overdispersed or underdispersed real data sets are considered to illustrate the superior performance of the proposed model.
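For context, the classical binomial thinning that the extended operator generalizes acts as α∘X = ΣᵢBernoulli(α) over X trials, and drives the INAR(1) recursion X_t = α∘X_{t−1} + ε_t. A small simulation sketch with Poisson innovations (the parameter values are illustrative, and the extended two-parameter operator itself is not reproduced here):

```python
import numpy as np

def binomial_thinning(alpha, x, rng):
    # alpha ∘ x: the sum of x i.i.d. Bernoulli(alpha) variables
    return rng.binomial(x, alpha)

def simulate_inar1(alpha, lam, n, seed=0):
    """Simulate INAR(1): X_t = alpha ∘ X_{t-1} + eps_t, eps_t ~ Poisson(lam).
    The stationary mean is lam / (1 - alpha)."""
    rng = np.random.default_rng(seed)
    x = np.empty(n, dtype=int)
    x[0] = rng.poisson(lam / (1 - alpha))  # start near the stationary mean
    for t in range(1, n):
        x[t] = binomial_thinning(alpha, x[t - 1], rng) + rng.poisson(lam)
    return x
```

With alpha = 0.5 and lam = 2, the simulated series should hover around the stationary mean of 4.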


2014 ◽  
Vol 926-930 ◽  
pp. 3629-3632
Author(s):  
Ya Nan Li ◽  
Zhuang Zhi Han

Modern warfare imposes greater demands on radar anti-jamming capability, and how to test this capability objectively has become an important problem demanding prompt solution from both radar developers and users. This paper presents a jamming evaluation system based on an improved grey correlation method. First, the correlation of each index in the system is calculated, using the Delphi method to process expert scoring, and the result is taken as the importance score of each index. The correlation degree and importance of each index are then combined to filter out an initial set of indicators. Finally, by analyzing the correlations among the primary indicators, an effective indicator system is obtained. Experimental results on real data sets show that the improved grey correlation analysis method is superior to principal component analysis.
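Grey correlation (grey relational) analysis scores each indicator sequence by its pointwise closeness to a reference sequence. A minimal sketch of the standard grey relational grade (the paper's improvement and the Delphi weighting step are not reproduced here):

```python
import numpy as np

def grey_relational_grade(reference, candidates, rho=0.5):
    """Grey relational grade of each candidate sequence w.r.t. a reference.
    rho is the distinguishing coefficient (conventionally 0.5)."""
    ref = np.asarray(reference, float)
    cand = np.asarray(candidates, float)
    delta = np.abs(cand - ref)              # absolute differences per point
    dmin, dmax = delta.min(), delta.max()   # global extremes over all candidates
    coef = (dmin + rho * dmax) / (delta + rho * dmax)
    return coef.mean(axis=1)                # average coefficient = relational grade
```

A candidate identical to the reference attains the maximal grade of 1; less similar candidates score lower.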


2016 ◽  
Author(s):  
Suleiman A. Khan ◽  
Muhammad Ammad-ud-din

Abstract With recent advancements in measurement technologies, many multi-way and tensor datasets have started to emerge. Exploiting the natural tensor structure in the data has been shown to be advantageous for both explorative and predictive studies in several application areas of bioinformatics and computational biology. There has therefore arisen a need for robust and flexible tools for effectively analyzing tensor data sets. We present the R package tensorBF, the first R package providing Bayesian factorization of a tensor. Our package implements a generative model that automatically identifies the number of factors needed to explain the tensor, overcoming a key limitation of traditional tensor factorizations. We also recommend best practices for using tensor factorizations in both explorative and predictive analysis, with an example application on a drug response dataset. The package also implements tools for data normalization, informative noise priors, and visualization. Availability: The package is available at https://cran.r-project.org/package=tensorBF.
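tensorBF itself is an R package, but the core operation, factorizing a three-way tensor into rank-R components, can be illustrated with a plain alternating-least-squares CP decomposition in Python. This is only a deterministic sketch; the Bayesian machinery that selects the number of factors automatically is not reproduced:

```python
import numpy as np

def unfold(T, mode):
    # mode-n unfolding: move the chosen axis to the front and flatten the rest
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(A, B):
    # column-wise Kronecker product, shape (I*J, R)
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, A.shape[1])

def cp_als(T, rank, iters=200, seed=0):
    """Fit T ~ sum_r a_r outer b_r outer c_r by alternating least squares."""
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((n, rank)) for n in T.shape)
    for _ in range(iters):
        A = unfold(T, 0) @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = unfold(T, 1) @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = unfold(T, 2) @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C
```

On an exactly low-rank tensor, the reconstruction `np.einsum('ir,jr,kr->ijk', A, B, C)` typically recovers the input to high accuracy.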


2021 ◽  
Vol 13 (9) ◽  
pp. 220
Author(s):  
Xujian Zhao ◽  
Wei Li

Owing to the rapid development of the Internet and the rise of the big data era, microblogs have become a main means for people to spread and obtain information. Accurate prediction of the development trend of a microblog event is of great significance for the government in supervising network events, carrying out public relations activities, and reasonably guiding the development of microblog events during a network crisis. This paper presents effective solutions for predicting the popularity trend of microblog events. Firstly, the popularity of a microblog event is modeled by selecting influence factors and quantifying the weight of each factor with an information entropy algorithm. Secondly, singular spectrum analysis is carried out to decompose and reconstruct the time series of microblog event popularity. The box chart method is then used to divide the popularity into trend spaces. In addition, this paper exploits a Bi-LSTM model to predict the trend as a sequence-to-label task. Finally, a comparative experimental analysis is carried out on two real data sets crawled from the Sina Weibo platform. The results show that our proposal improves F1-score by up to 39% over three comparative methods.
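The singular spectrum analysis step can be sketched in a few lines: embed the series in a Hankel trajectory matrix, take its SVD, and map each rank-1 term back to a series by anti-diagonal averaging. A minimal version (the window length L is a tuning choice, and the paper's grouping/reconstruction rules are not reproduced):

```python
import numpy as np

def ssa_decompose(x, L):
    """Basic singular spectrum analysis: embed, SVD, then diagonal-average
    each rank-1 component back into a series of the original length."""
    x = np.asarray(x, float)
    N = len(x)
    K = N - L + 1
    X = np.column_stack([x[i:i + L] for i in range(K)])  # L x K trajectory matrix
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    comps = []
    for r in range(len(s)):
        Xr = s[r] * np.outer(U[:, r], Vt[r])
        # anti-diagonals of Xr share a time index; average each one
        comp = np.array([Xr[::-1].diagonal(k).mean() for k in range(-(L - 1), K)])
        comps.append(comp)
    return np.array(comps)
```

Because the SVD terms sum exactly to the trajectory matrix and diagonal averaging is linear, the components sum back to the original series.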


2006 ◽  
Vol 63 (3) ◽  
pp. 576-596 ◽  
Author(s):  
Jerome Pella ◽  
Michele Masuda

Although population mixtures often include contributions from novel populations as well as from baseline populations previously sampled, unlabeled mixture individuals can be separated to their sources from genetic data. A Gibbs and split–merge Markov chain Monte Carlo sampler is described for successively partitioning a genetic mixture sample into plausible subsets of individuals from each of the baseline and extra-baseline populations present. The subsets are selected to satisfy the Hardy–Weinberg and linkage equilibrium conditions expected for large, panmictic populations. The number of populations present can be inferred from the distribution for counts of subsets per partition drawn by the sampler. To further summarize the sampler's output, co-assignment probabilities of mixture individuals to the same subsets are computed from the partitions and are used to construct a binary tree of their relatedness. The tree graphically displays the clusters of mixture individuals together with a quantitative measure of the evidence supporting their various separate and common sources. The methodology is applied to several simulated and real data sets to illustrate its use and demonstrate the sampler's superior performance.
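The co-assignment probabilities used to summarize the sampler's output can be computed directly from the sampled partitions: for each pair of individuals, count the fraction of partitions that place them in the same subset. A minimal sketch:

```python
import numpy as np

def coassignment_matrix(partitions):
    """Fraction of sampled partitions in which each pair of individuals
    lands in the same subset. partitions: list of integer label vectors,
    one vector per MCMC draw."""
    P = np.asarray(partitions)
    n = P.shape[1]
    M = np.zeros((n, n))
    for labels in P:
        M += labels[:, None] == labels[None, :]  # 1 where pair shares a subset
    return M / len(P)
```

The resulting matrix can then feed a hierarchical clustering to build the binary relatedness tree the abstract describes.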


2007 ◽  
Vol 19 (2) ◽  
pp. 513-545 ◽  
Author(s):  
Inge Koch ◽  
Kanta Naito

This letter is concerned with the problem of selecting the best or most informative dimension for dimension reduction and feature extraction in high-dimensional data. The dimension of the data is reduced by principal component analysis; subsequent application of independent component analysis to the principal component scores determines the most nongaussian directions in the lower-dimensional space. A criterion for choosing the optimal dimension based on bias-adjusted skewness and kurtosis is proposed. This new dimension selector is applied to real data sets and compared to existing methods. Simulation studies for a range of densities show that the proposed method performs well and is more appropriate for nongaussian data than existing methods.
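The pipeline can be sketched as: reduce with PCA, then score directions by skewness and kurtosis. The sketch below uses plain (not bias-adjusted) sample moments, unlike the paper's criterion, and a crude sum of absolute moments as the non-Gaussianity score:

```python
import numpy as np

def pca_scores(X, d):
    """Project centered data onto its top-d principal directions."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T

def nongaussianity(scores):
    """Per-component |skewness| + |excess kurtosis| as a rough
    non-Gaussianity measure (a Gaussian component scores near 0)."""
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    skew = (z ** 3).mean(axis=0)
    kurt = (z ** 4).mean(axis=0) - 3.0
    return np.abs(skew) + np.abs(kurt)
```

An exponential variable (skewness 2, excess kurtosis 6) scores far above a standard normal one.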


2016 ◽  
Vol 16 (6) ◽  
pp. 146-159 ◽  
Author(s):  
Zhijun Zhang ◽  
Huali Pan ◽  
Gongwen Xu ◽  
Yongkang Wang ◽  
Pengfei Zhang

Abstract With the rapid development of social networks, location-based social networks have gradually risen. To retrieve a user's most preferred attractions from a large amount of tourism information, personalized recommendation algorithms based on geographic location have received wide attention in academia and industry. Aiming at the low accuracy of personalized tourism recommendation systems, this paper presents a personalized tourist attraction recommendation algorithm, RecUFG, which combines user-based collaborative filtering with friend trust relationships and geographic context. The algorithm fully exploits the social relations and trust friendships between users and, using the geographic information relating users to attraction locations, recommends the attractions of greatest interest to each user. Experimental results on real data sets demonstrate the feasibility and effectiveness of the algorithm: compared with existing recommendation algorithms, it achieves higher prediction accuracy and customer satisfaction.
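A simplified sketch of the kind of blending such a recommender performs: mix rating-based cosine similarity with an explicit trust matrix, then predict by similarity-weighted averaging. This is illustrative only; RecUFG's actual formulation, and its geographic weighting, are not reproduced, and the blend weight `alpha` is an assumption:

```python
import numpy as np

def cosine_sim(R):
    """Cosine similarity between users (rows of the rating matrix)."""
    norms = np.linalg.norm(R, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    Rn = R / norms
    return Rn @ Rn.T

def predict(R, trust, alpha=0.7):
    """Blend rating similarity with a trust matrix, then predict each
    user's scores as a weighted average of other users' ratings."""
    S = alpha * cosine_sim(R) + (1 - alpha) * trust
    np.fill_diagonal(S, 0.0)                 # exclude self-similarity
    denom = np.abs(S).sum(axis=1, keepdims=True)
    denom[denom == 0] = 1.0
    return (S @ R) / denom
```

A user who has not rated an attraction inherits a positive predicted score when a similar, trusted user rated it highly.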


Author(s):  
Li Chen ◽  
Lala Aicha Coulibaly

Data science and big data analytics are still at the center of computer science and information technology. Students and researchers outside computer science often find real data analytics difficult with programming languages such as Python and Scala, especially when attempting to use Apache Spark in cloud computing environments (Spark Scala and PySpark). At the same time, students in information technology can find it difficult to deal with the mathematical background of data science algorithms. To overcome these difficulties, this chapter provides a practical guideline for different users in this area. The authors cover the main algorithms for data science and machine learning, including principal component analysis (PCA), support vector machines (SVM), k-means, k-nearest neighbors (kNN), regression, neural networks, and decision trees. A brief description of these algorithms is given, and related code is selected to fit both simple and real data sets. Some visualization methods, including 2D and 3D displays, are also presented in this chapter.
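As a taste of the algorithm-level material such a chapter covers, here is a compact k-nearest-neighbors classifier in plain NumPy (the chapter itself works through Spark Scala and PySpark versions; this sketch is independent of both):

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Plain k-nearest-neighbors classifier: majority vote among
    the k closest training points by Euclidean distance."""
    # pairwise distances, shape (n_test, n_train)
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :k]       # indices of the k nearest neighbors
    votes = y_train[idx]                     # their class labels
    return np.array([np.bincount(v).argmax() for v in votes])
```

On two well-separated clusters, test points are assigned to the cluster they sit in.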


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Shahin Mohammadi ◽  
Jose Davila-Velderrain ◽  
Manolis Kellis

Abstract Dissecting the cellular heterogeneity embedded in single-cell transcriptomic data is challenging. Although many methods and approaches exist, identifying cell states and their underlying topology is still a major challenge. Here, we introduce the concept of multiresolution cell-state decomposition as a practical approach to simultaneously capture both fine- and coarse-grain patterns of variability. We implement this concept in ACTIONet, a comprehensive framework that combines archetypal analysis and manifold learning to provide a ready-to-use analytical approach for multiresolution single-cell state characterization. ACTIONet provides a robust, reproducible, and highly interpretable single-cell analysis platform that couples dominant pattern discovery with a corresponding structural representation of the cell state landscape. Using multiple synthetic and real data sets, we demonstrate ACTIONet’s superior performance relative to existing alternatives. We use ACTIONet to integrate and annotate cells across three human cortex data sets. Through integrative comparative analysis, we define a consensus vocabulary and a consistent set of gene signatures discriminating the transcriptomic cell types and subtypes of the human prefrontal cortex.


Genetics ◽  
2002 ◽  
Vol 162 (1) ◽  
pp. 473-484
Author(s):  
K Moazami-Goudarzi ◽  
D Laloë

Abstract To determine the relationships among closely related populations or species, two methods are commonly used in the literature: phylogenetic reconstruction or multivariate analysis. The aim of this article is to assess the reliability of multivariate analysis. We describe a method that is based on principal component analysis and Mantel correlations, using a two-step process: The first step consists of a single-marker analysis and the second step tests if each marker reveals the same typology concerning population differentiation. We conclude that if single markers are not congruent, the compromise structure is not meaningful. Our model is not based on any particular mutation process and it can be applied to most of the commonly used genetic markers. This method is also useful to determine the contribution of each marker to the typology of populations. We test whether our method is efficient with two real data sets based on microsatellite markers. Our analysis suggests that for closely related populations, it is not always possible to accept the hypothesis that an increase in the number of markers will increase the reliability of the typology analysis.
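The Mantel-correlation step can be sketched as the Pearson correlation between the off-diagonal entries of two distance (or differentiation) matrices, with significance assessed by permuting one matrix's rows and columns. A minimal illustrative version (the permutation count and one-sided p-value convention are standard choices, not the paper's exact procedure):

```python
import numpy as np

def mantel_r(D1, D2):
    """Pearson correlation between the upper triangles of two square matrices."""
    iu = np.triu_indices_from(D1, k=1)
    a, b = D1[iu].astype(float), D2[iu].astype(float)
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return float(np.mean(a * b))

def mantel_test(D1, D2, n_perm=999, seed=0):
    """One-sided permutation p-value: permute D1's rows/columns jointly."""
    rng = np.random.default_rng(seed)
    r_obs = mantel_r(D1, D2)
    count = 0
    for _ in range(n_perm):
        p = rng.permutation(len(D1))
        if mantel_r(D1[np.ix_(p, p)], D2) >= r_obs:
            count += 1
    return r_obs, (count + 1) / (n_perm + 1)
```

Comparing the typologies revealed by two single markers then amounts to a Mantel test between their per-marker distance matrices.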

