Stable Tensor Principal Component Pursuit: Error Bounds and Efficient Algorithms

Sensors ◽  
2019 ◽  
Vol 19 (23) ◽  
pp. 5335 ◽  
Author(s):  
Wei Fang ◽  
Dongxu Wei ◽  
Ran Zhang

The rapid development of sensor technology gives rise to huge amounts of tensor (i.e., multi-dimensional array) data. For various reasons, such as sensor failures and communication loss, tensor data may be corrupted not only by small noise but also by gross corruptions. This paper studies Stable Tensor Principal Component Pursuit (STPCP), which aims to recover a tensor from its corrupted observations. Specifically, we propose a STPCP model based on the recently proposed tubal nuclear norm (TNN), which has shown superior performance in comparison with other tensor nuclear norms. Theoretically, we rigorously prove that under tensor incoherence conditions, the underlying tensor and the sparse corruption tensor can be stably recovered. Algorithmically, we first develop an ADMM algorithm and then accelerate it by designing a new algorithm based on orthogonal tensor factorization. The superiority and efficiency of the proposed algorithms are demonstrated through experiments on both synthetic and real data sets.
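The tubal nuclear norm underlying the STPCP model is defined through the t-SVD: transform the tensor to the Fourier domain along the third mode and sum the singular values of the frontal slices. A minimal NumPy sketch (the 1/n3 normalization follows a common convention; the paper's exact scaling may differ):

```python
import numpy as np

def tubal_nuclear_norm(T):
    """Tubal nuclear norm via the t-SVD: FFT along the third mode,
    then sum the singular values of every frontal slice, scaled by 1/n3."""
    n3 = T.shape[2]
    That = np.fft.fft(T, axis=2)  # frontal slices in the Fourier domain
    total = 0.0
    for k in range(n3):
        total += np.linalg.svd(That[:, :, k], compute_uv=False).sum()
    return total / n3
```

For a tensor whose only nonzero frontal slice is the identity, every Fourier-domain slice is the identity, so the norm reduces to the matrix trace norm of that slice.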

Entropy ◽  
2020 ◽  
Vol 23 (1) ◽  
pp. 62
Author(s):  
Zhengwei Liu ◽  
Fukang Zhu

Thinning operators play an important role in the analysis of integer-valued autoregressive models, and the most widely used is binomial thinning. Inspired by the theory of extended Pascal triangles, a new thinning operator named extended binomial thinning is introduced, which generalizes binomial thinning. Compared to the binomial thinning operator, the extended binomial thinning operator has two parameters and is more flexible in modeling. Based on the proposed operator, a new integer-valued autoregressive model is introduced, which can accurately and flexibly capture the dispersion features of count time series. Two-step conditional least squares (CLS) estimation is investigated for the innovation-free case, and conditional maximum likelihood estimation is also discussed. We also obtain the asymptotic property of the two-step CLS estimator. Finally, three overdispersed or underdispersed real data sets are considered to illustrate the superior performance of the proposed model.
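For context, the classical binomial thinning that the extended operator generalizes acts as α∘X = ΣᵢBernoulli(α) over X trials, and drives the INAR(1) recursion X_t = α∘X_{t−1} + ε_t. A small simulation sketch with Poisson innovations (the parameter values are illustrative, and the extended two-parameter operator itself is not reproduced here):

```python
import numpy as np

def binomial_thinning(alpha, x, rng):
    # alpha ∘ x: the sum of x i.i.d. Bernoulli(alpha) variables
    return rng.binomial(x, alpha)

def simulate_inar1(alpha, lam, n, seed=0):
    """Simulate INAR(1): X_t = alpha ∘ X_{t-1} + eps_t, eps_t ~ Poisson(lam).
    The stationary mean is lam / (1 - alpha)."""
    rng = np.random.default_rng(seed)
    x = np.empty(n, dtype=int)
    x[0] = rng.poisson(lam / (1 - alpha))  # start near the stationary mean
    for t in range(1, n):
        x[t] = binomial_thinning(alpha, x[t - 1], rng) + rng.poisson(lam)
    return x
```

With alpha = 0.5 and lam = 2, the simulated series should hover around the stationary mean of 4.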


2014 ◽  
Vol 926-930 ◽  
pp. 3629-3632
Author(s):  
Ya Nan Li ◽  
Zhuang Zhi Han

Modern warfare imposes greater demands on radar anti-jamming capability, and how to test this capability objectively has become an important problem demanding prompt solution from both radar developers and users. This paper presents a jamming evaluation system based on an improved grey correlation method. First, the correlation of each index in the system is calculated, using the Delphi method to process expert scoring, and the result is taken as the importance score of each index. The correlation degree and importance of each index are then combined to filter out an initial set of indicators. Finally, by analyzing the correlations among the primary indicators, an effective indicator system is obtained. Experimental results on real data sets show that the improved grey correlation analysis method is superior to principal component analysis.
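Grey correlation (grey relational) analysis scores each indicator sequence by its pointwise closeness to a reference sequence. A minimal sketch of the standard grey relational grade (the paper's improvement and the Delphi weighting step are not reproduced here):

```python
import numpy as np

def grey_relational_grade(reference, candidates, rho=0.5):
    """Grey relational grade of each candidate sequence w.r.t. a reference.
    rho is the distinguishing coefficient (conventionally 0.5)."""
    ref = np.asarray(reference, float)
    cand = np.asarray(candidates, float)
    delta = np.abs(cand - ref)              # absolute differences per point
    dmin, dmax = delta.min(), delta.max()   # global extremes over all candidates
    coef = (dmin + rho * dmax) / (delta + rho * dmax)
    return coef.mean(axis=1)                # average coefficient = relational grade
```

A candidate identical to the reference attains the maximal grade of 1; less similar candidates score lower.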


2016 ◽  
Author(s):  
Suleiman A. Khan ◽  
Muhammad Ammad-ud-din

Abstract With recent advancements in measurement technologies, many multi-way and tensor datasets have started to emerge. Exploiting the natural tensor structure in the data has been shown to be advantageous for both explorative and predictive studies in several application areas of bioinformatics and computational biology. There has therefore arisen a need for robust and flexible tools for effectively analyzing tensor data sets. We present the R package tensorBF, the first R package providing Bayesian factorization of a tensor. Our package implements a generative model that automatically identifies the number of factors needed to explain the tensor, overcoming a key limitation of traditional tensor factorizations. We also recommend best practices for using tensor factorizations in both explorative and predictive analysis, with an example application on a drug response dataset. The package also implements tools for data normalization, informative noise priors, and visualization. Availability: The package is available at https://cran.r-project.org/package=tensorBF.
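tensorBF itself is an R package, but the core operation, factorizing a three-way tensor into rank-R components, can be illustrated with a plain alternating-least-squares CP decomposition in Python. This is only a deterministic sketch; the Bayesian machinery that selects the number of factors automatically is not reproduced:

```python
import numpy as np

def unfold(T, mode):
    # mode-n unfolding: move the chosen axis to the front and flatten the rest
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(A, B):
    # column-wise Kronecker product, shape (I*J, R)
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, A.shape[1])

def cp_als(T, rank, iters=200, seed=0):
    """Fit T ~ sum_r a_r outer b_r outer c_r by alternating least squares."""
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((n, rank)) for n in T.shape)
    for _ in range(iters):
        A = unfold(T, 0) @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = unfold(T, 1) @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = unfold(T, 2) @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C
```

On an exactly low-rank tensor, the reconstruction `np.einsum('ir,jr,kr->ijk', A, B, C)` typically recovers the input to high accuracy.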


2021 ◽  
Vol 13 (9) ◽  
pp. 220
Author(s):  
Xujian Zhao ◽  
Wei Li

Owing to the rapid development of the Internet and the rise of the big data era, microblogs have become a main means for people to spread and obtain information. Accurate prediction of the development trend of a microblog event is of great significance for the government in supervising network events, carrying out public relations activities, and reasonably guiding the development of microblog events during a network crisis. This paper presents effective solutions for predicting the popularity trend of microblog events. Firstly, the popularity of a microblog event is modeled by selecting influence factors and quantifying the weight of each factor with an information entropy algorithm. Secondly, singular spectrum analysis is carried out to decompose and reconstruct the time series of microblog event popularity. The box chart method is then used to divide the popularity into trend spaces. In addition, this paper exploits a Bi-LSTM model to predict the trend as a sequence-to-label task. Finally, a comparative experimental analysis is carried out on two real data sets crawled from the Sina Weibo platform. The results show that our proposal improves F1-score by up to 39% over three comparative methods.
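The singular spectrum analysis step can be sketched in a few lines: embed the series in a Hankel trajectory matrix, take its SVD, and map each rank-1 term back to a series by anti-diagonal averaging. A minimal version (the window length L is a tuning choice, and the paper's grouping/reconstruction rules are not reproduced):

```python
import numpy as np

def ssa_decompose(x, L):
    """Basic singular spectrum analysis: embed, SVD, then diagonal-average
    each rank-1 component back into a series of the original length."""
    x = np.asarray(x, float)
    N = len(x)
    K = N - L + 1
    X = np.column_stack([x[i:i + L] for i in range(K)])  # L x K trajectory matrix
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    comps = []
    for r in range(len(s)):
        Xr = s[r] * np.outer(U[:, r], Vt[r])
        # anti-diagonals of Xr share a time index; average each one
        comp = np.array([Xr[::-1].diagonal(k).mean() for k in range(-(L - 1), K)])
        comps.append(comp)
    return np.array(comps)
```

Because the SVD terms sum exactly to the trajectory matrix and diagonal averaging is linear, the components sum back to the original series.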


2006 ◽  
Vol 63 (3) ◽  
pp. 576-596 ◽  
Author(s):  
Jerome Pella ◽  
Michele Masuda

Although population mixtures often include contributions from novel populations as well as from baseline populations previously sampled, unlabeled mixture individuals can be separated to their sources from genetic data. A Gibbs and split–merge Markov chain Monte Carlo sampler is described for successively partitioning a genetic mixture sample into plausible subsets of individuals from each of the baseline and extra-baseline populations present. The subsets are selected to satisfy the Hardy–Weinberg and linkage equilibrium conditions expected for large, panmictic populations. The number of populations present can be inferred from the distribution for counts of subsets per partition drawn by the sampler. To further summarize the sampler's output, co-assignment probabilities of mixture individuals to the same subsets are computed from the partitions and are used to construct a binary tree of their relatedness. The tree graphically displays the clusters of mixture individuals together with a quantitative measure of the evidence supporting their various separate and common sources. The methodology is applied to several simulated and real data sets to illustrate its use and demonstrate the sampler's superior performance.
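The co-assignment probabilities used to summarize the sampler's output can be computed directly from the sampled partitions: for each pair of individuals, count the fraction of partitions that place them in the same subset. A minimal sketch:

```python
import numpy as np

def coassignment_matrix(partitions):
    """Fraction of sampled partitions in which each pair of individuals
    lands in the same subset. partitions: list of integer label vectors,
    one vector per MCMC draw."""
    P = np.asarray(partitions)
    n = P.shape[1]
    M = np.zeros((n, n))
    for labels in P:
        M += labels[:, None] == labels[None, :]  # 1 where pair shares a subset
    return M / len(P)
```

The resulting matrix can then feed a hierarchical clustering to build the binary relatedness tree the abstract describes.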


2007 ◽  
Vol 19 (2) ◽  
pp. 513-545 ◽  
Author(s):  
Inge Koch ◽  
Kanta Naito

This letter is concerned with the problem of selecting the best or most informative dimension for dimension reduction and feature extraction in high-dimensional data. The dimension of the data is reduced by principal component analysis; subsequent application of independent component analysis to the principal component scores determines the most nongaussian directions in the lower-dimensional space. A criterion for choosing the optimal dimension based on bias-adjusted skewness and kurtosis is proposed. This new dimension selector is applied to real data sets and compared to existing methods. Simulation studies for a range of densities show that the proposed method performs well and is more appropriate for nongaussian data than existing methods.
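The pipeline can be sketched as: reduce with PCA, then score directions by skewness and kurtosis. The sketch below uses plain (not bias-adjusted) sample moments, unlike the paper's criterion, and a crude sum of absolute moments as the non-Gaussianity score:

```python
import numpy as np

def pca_scores(X, d):
    """Project centered data onto its top-d principal directions."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T

def nongaussianity(scores):
    """Per-component |skewness| + |excess kurtosis| as a rough
    non-Gaussianity measure (a Gaussian component scores near 0)."""
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    skew = (z ** 3).mean(axis=0)
    kurt = (z ** 4).mean(axis=0) - 3.0
    return np.abs(skew) + np.abs(kurt)
```

An exponential variable (skewness 2, excess kurtosis 6) scores far above a standard normal one.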


2016 ◽  
Vol 16 (6) ◽  
pp. 146-159 ◽  
Author(s):  
Zhijun Zhang ◽  
Huali Pan ◽  
Gongwen Xu ◽  
Yongkang Wang ◽  
Pengfei Zhang

Abstract With the rapid development of social networks, location-based social networks have gradually risen. To retrieve a user's most preferred attractions from a large amount of tourism information, personalized recommendation algorithms based on geographic location have received wide attention in academia and industry. Aiming at the low accuracy of personalized tourism recommendation systems, this paper presents a personalized tourist attraction recommendation algorithm, RecUFG, which combines user-based collaborative filtering with friend trust relationships and geographic context. The algorithm fully exploits the social relations and trust friendships between users and, using the geographic information relating users to attraction locations, recommends the attractions of greatest interest to each user. Experimental results on real data sets demonstrate the feasibility and effectiveness of the algorithm: compared with existing recommendation algorithms, it achieves higher prediction accuracy and customer satisfaction.
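A simplified sketch of the kind of blending such a recommender performs: mix rating-based cosine similarity with an explicit trust matrix, then predict by similarity-weighted averaging. This is illustrative only; RecUFG's actual formulation, and its geographic weighting, are not reproduced, and the blend weight `alpha` is an assumption:

```python
import numpy as np

def cosine_sim(R):
    """Cosine similarity between users (rows of the rating matrix)."""
    norms = np.linalg.norm(R, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    Rn = R / norms
    return Rn @ Rn.T

def predict(R, trust, alpha=0.7):
    """Blend rating similarity with a trust matrix, then predict each
    user's scores as a weighted average of other users' ratings."""
    S = alpha * cosine_sim(R) + (1 - alpha) * trust
    np.fill_diagonal(S, 0.0)                 # exclude self-similarity
    denom = np.abs(S).sum(axis=1, keepdims=True)
    denom[denom == 0] = 1.0
    return (S @ R) / denom
```

A user who has not rated an attraction inherits a positive predicted score when a similar, trusted user rated it highly.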


Author(s):  
Li Chen ◽  
Lala Aicha Coulibaly

Data science and big data analytics are still at the center of computer science and information technology. Students and researchers outside computer science often find real data analytics difficult with programming languages such as Python and Scala, especially when attempting to use Apache Spark in cloud computing environments (Spark Scala and PySpark). At the same time, students in information technology can find it difficult to deal with the mathematical background of data science algorithms. To overcome these difficulties, this chapter provides a practical guideline for different users in this area. The authors cover the main algorithms for data science and machine learning, including principal component analysis (PCA), support vector machines (SVM), k-means, k-nearest neighbors (kNN), regression, neural networks, and decision trees. A brief description of these algorithms is given, and related code is selected to fit both simple and real data sets. Some visualization methods, including 2D and 3D displays, are also presented in this chapter.
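As a taste of the algorithm-level material such a chapter covers, here is a compact k-nearest-neighbors classifier in plain NumPy (the chapter itself works through Spark Scala and PySpark versions; this sketch is independent of both):

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Plain k-nearest-neighbors classifier: majority vote among
    the k closest training points by Euclidean distance."""
    # pairwise distances, shape (n_test, n_train)
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :k]       # indices of the k nearest neighbors
    votes = y_train[idx]                     # their class labels
    return np.array([np.bincount(v).argmax() for v in votes])
```

On two well-separated clusters, test points are assigned to the cluster they sit in.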


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Shahin Mohammadi ◽  
Jose Davila-Velderrain ◽  
Manolis Kellis

Abstract Dissecting the cellular heterogeneity embedded in single-cell transcriptomic data is challenging. Although many methods and approaches exist, identifying cell states and their underlying topology is still a major challenge. Here, we introduce the concept of multiresolution cell-state decomposition as a practical approach to simultaneously capture both fine- and coarse-grain patterns of variability. We implement this concept in ACTIONet, a comprehensive framework that combines archetypal analysis and manifold learning to provide a ready-to-use analytical approach for multiresolution single-cell state characterization. ACTIONet provides a robust, reproducible, and highly interpretable single-cell analysis platform that couples dominant pattern discovery with a corresponding structural representation of the cell state landscape. Using multiple synthetic and real data sets, we demonstrate ACTIONet’s superior performance relative to existing alternatives. We use ACTIONet to integrate and annotate cells across three human cortex data sets. Through integrative comparative analysis, we define a consensus vocabulary and a consistent set of gene signatures discriminating the transcriptomic cell types and subtypes of the human prefrontal cortex.


Genetics ◽  
2002 ◽  
Vol 162 (1) ◽  
pp. 473-484
Author(s):  
K Moazami-Goudarzi ◽  
D Laloë

Abstract To determine the relationships among closely related populations or species, two methods are commonly used in the literature: phylogenetic reconstruction or multivariate analysis. The aim of this article is to assess the reliability of multivariate analysis. We describe a method that is based on principal component analysis and Mantel correlations, using a two-step process: The first step consists of a single-marker analysis and the second step tests if each marker reveals the same typology concerning population differentiation. We conclude that if single markers are not congruent, the compromise structure is not meaningful. Our model is not based on any particular mutation process and it can be applied to most of the commonly used genetic markers. This method is also useful to determine the contribution of each marker to the typology of populations. We test whether our method is efficient with two real data sets based on microsatellite markers. Our analysis suggests that for closely related populations, it is not always possible to accept the hypothesis that an increase in the number of markers will increase the reliability of the typology analysis.
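The Mantel-correlation step can be sketched as the Pearson correlation between the off-diagonal entries of two distance (or differentiation) matrices, with significance assessed by permuting one matrix's rows and columns. A minimal illustrative version (the permutation count and one-sided p-value convention are standard choices, not the paper's exact procedure):

```python
import numpy as np

def mantel_r(D1, D2):
    """Pearson correlation between the upper triangles of two square matrices."""
    iu = np.triu_indices_from(D1, k=1)
    a, b = D1[iu].astype(float), D2[iu].astype(float)
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return float(np.mean(a * b))

def mantel_test(D1, D2, n_perm=999, seed=0):
    """One-sided permutation p-value: permute D1's rows/columns jointly."""
    rng = np.random.default_rng(seed)
    r_obs = mantel_r(D1, D2)
    count = 0
    for _ in range(n_perm):
        p = rng.permutation(len(D1))
        if mantel_r(D1[np.ix_(p, p)], D2) >= r_obs:
            count += 1
    return r_obs, (count + 1) / (n_perm + 1)
```

Comparing the typologies revealed by two single markers then amounts to a Mantel test between their per-marker distance matrices.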

