Adaptive multiscale principal components analysis for online monitoring of wastewater treatment

2002 ◽  
Vol 45 (4-5) ◽  
pp. 227-235 ◽  
Author(s):  
J. Lennox ◽  
C. Rosen

Fault detection and isolation (FDI) are important steps in the monitoring and supervision of industrial processes. Biological wastewater treatment (WWT) plants are difficult to model, and hence to monitor, because of the complexity of the biological reactions and because plant influent and disturbances are highly variable and/or unmeasured. Multivariate statistical models have been developed for a wide variety of situations over the past few decades, proving successful in many applications. In this paper we develop a new monitoring algorithm based on Principal Components Analysis (PCA). It can be seen equivalently as making Multiscale PCA (MSPCA) adaptive, or as a multiscale decomposition of adaptive PCA. Adaptive Multiscale PCA (AdMSPCA) exploits the changing multivariate relationships between variables at different time-scales. Adaptation of scale PCA models over time permits them to follow the evolution of the process, inputs or disturbances. Performance of AdMSPCA and adaptive PCA on a real WWT data set is compared and contrasted. The most significant difference observed was the ability of AdMSPCA to adapt to a much wider range of changes. This was mainly due to the flexibility afforded by allowing each scale model to adapt whenever it did not signal an abnormal event at that scale. Relative detection speeds were examined only summarily, but seemed to depend on the characteristics of the faults/disturbances. The results of the algorithms were similar for sudden changes, but AdMSPCA appeared more sensitive to slower changes.
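The multiscale idea described above — decompose each variable across time-scales, fit a separate PCA model per scale, and flag an abnormal event when a scale's residual grows — can be sketched in a few lines. This is a minimal illustration, not the authors' AdMSPCA implementation: it assumes a single-level Haar decomposition, one principal component per scale model, no adaptation step, and toy data with an even number of rows.

```python
import numpy as np

def haar_decompose(X):
    """One-level Haar transform along time: approximation and detail
    coefficients for each column of X (rows = time, cols = variables)."""
    even, odd = X[0::2], X[1::2]
    approx = (even + odd) / np.sqrt(2.0)
    detail = (even - odd) / np.sqrt(2.0)
    return approx, detail

def fit_scale_pca(X, n_components):
    """Fit a PCA model (mean + leading loadings) to one scale."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:n_components]            # rows of Vt are loadings

def spe(X, mu, P):
    """Squared prediction error (residual) of each row under the model."""
    R = (X - mu) - (X - mu) @ P.T @ P
    return np.sum(R**2, axis=1)

rng = np.random.default_rng(0)
train = rng.normal(size=(256, 4))                      # normal operation
train[:, 1] = 0.9 * train[:, 0] + 0.1 * train[:, 1]    # correlated variables

# One PCA model per scale, as in the multiscale scheme
models = {name: fit_scale_pca(S, 1)
          for name, S in zip(("approx", "detail"), haar_decompose(train))}

test = rng.normal(size=(64, 4))
test[:, 1] = -0.9 * test[:, 0]            # broken correlation: a "fault"
for name, S in zip(("approx", "detail"), haar_decompose(test)):
    mu, P = models[name]
    print(name, "mean SPE:", spe(S, mu, P).mean())
```

The faulty data violate the correlation structure the scale models were trained on, so their residual (SPE) rises above the normal-operation level — the detection signal a monitoring chart would threshold.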

2006 ◽  
Vol 53 (4) ◽  
pp. 427-437 ◽  
Author(s):  
Mirko Savic

Over the last decade, the employment structure has been one of the fastest-changing aspects of Eastern European economies. This paper explores the best methodology for comparing employment situations across the countries of this region. Multivariate statistical analyses are well suited to portraying the full picture of the problem. Principal components analysis is one of the simplest multivariate methods, and it can produce very useful information about Eastern European employment in an easy and understandable way.


2013 ◽  
Vol 17 (7) ◽  
pp. 1476-1485 ◽  
Author(s):  
Kate Northstone ◽  
Andrew DAC Smith ◽  
Victoria L Cribb ◽  
Pauline M Emmett

Abstract
Objective: To derive dietary patterns using principal components analysis from separate FFQs completed by mothers and their teenagers, and to assess associations with nutrient intakes and sociodemographic variables.
Design: Two distinct FFQs were completed by 13-year-olds and their mothers, with some overlap in the foods covered. A combined data set was obtained.
Setting: Avon Longitudinal Study of Parents and Children (ALSPAC), Bristol, UK.
Subjects: Teenagers (n 5334) with adequate dietary data.
Results: Four patterns were obtained using principal components analysis: a ‘Traditional/health-conscious’ pattern, a ‘Processed’ pattern, a ‘Snacks/sugared drinks’ pattern and a ‘Vegetarian’ pattern. The ‘Traditional/health-conscious’ pattern was the most nutrient-rich, having high positive correlations with many nutrients. The ‘Processed’ and ‘Snacks/sugared drinks’ patterns showed little association with important nutrients but were positively associated with energy, fats and sugars. There were clear gender and sociodemographic differences across the patterns: scores on the ‘Traditional/health-conscious’ and ‘Vegetarian’ patterns were lower in boys and in those with younger and less educated mothers, and higher in girls and in those whose mothers had higher levels of education.
Conclusions: It is important to establish healthy eating patterns by the teenage years. However, this is a time when it is difficult to establish dietary intake accurately from a single source, since teenagers consume increasing amounts of foods outside the home. Further dietary pattern studies should focus on teenagers, and the source of dietary data collection merits consideration.


1984 ◽  
Vol 18 (11) ◽  
pp. 2471-2478 ◽  
Author(s):  
J. Smeyers-Verbeke ◽  
J.C. Den Hartog ◽  
W.H. Dehker ◽  
D. Coomans ◽  
L. Buydens ◽  
...  

2006 ◽  
Vol 23 (3) ◽  
pp. 106-118 ◽  
Author(s):  
Gordon E. Sarty ◽  
Kinwah Wu

Abstract
The ratios of hydrogen Balmer emission line intensities in cataclysmic variables are signatures of the physical processes that produce them. To quantify those signatures relative to classifications of cataclysmic variable types, we applied the multivariate statistical analysis methods of principal components analysis and discriminant function analysis to the spectroscopic emission data set of Williams (1983). The two analysis methods reveal two different sources of variation in the ratios of the emission lines. The source of variation seen in the principal components analysis was shown to be correlated with the binary orbital period. The source of variation seen in the discriminant function analysis was shown to be correlated with the equivalent width of the Hβ line. Comparison of the data scatterplot with scatterplots of theoretical models shows that Balmer line emission from T CrB systems is consistent with the photoionization of a surrounding nebula. Otherwise, models that we considered do not reproduce the wide range of Balmer decrements, including ‘inverted’ decrements, seen in the data.


2019 ◽  
Author(s):  
Fred L. Bookstein

Abstract
Good empirical applications of geometric morphometrics (GMM) typically involve several times more variables than specimens, a situation the statistician refers to as “high p/n,” where p is the count of variables and n the count of specimens. This note calls your attention to two predictable catastrophic failures of one particular multivariate statistical technique, between-groups principal components analysis (bgPCA), in this high-p/n setting. The more obvious pathology is this: when applied to the patternless (null) model of p identically distributed Gaussians over groups of the same size, both bgPCA and its algebraic equivalent, partial least squares (PLS) analysis against group, necessarily generate the appearance of huge equilateral group separations that are actually fictitious (absent from the statistical model). When specimen counts by group vary greatly or when any group includes fewer than about ten specimens, an even worse failure of the technique obtains: the smaller the group, the more likely a bgPCA is to fictitiously identify that group as the end-member of one of its derived axes. For these two reasons, when used in GMM and other high-p/n settings the bgPCA method very often leads to invalid or insecure bioscientific inferences. This paper demonstrates and quantifies these and other pathological outcomes both for patternless models and for models with one or two valid factors, then offers suggestions for how GMM practitioners should protect themselves against the consequences for inference of these lamentably predictable misrepresentations. The bgPCA method should never be used unskeptically (it is never authoritative), and whenever it appears in partial support of any biological inference it must be accompanied by a wide range of diagnostic plots and other challenges, many of which are presented here for the first time.
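The first pathology is easy to reproduce numerically. Below is a minimal sketch under the null model the abstract describes (equal group sizes, identically distributed Gaussian variables, p much larger than n); the particular sizes and the seed are illustrative choices, not figures from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n_per_group, groups = 500, 20, 3      # many more variables than specimens
X = rng.normal(size=(groups * n_per_group, p))   # patternless null model
labels = np.repeat(np.arange(groups), n_per_group)

# between-groups PCA: eigen-analysis of the group means, then project everyone
means = np.array([X[labels == g].mean(axis=0) for g in range(groups)])
grand = means.mean(axis=0)
_, _, Vt = np.linalg.svd(means - grand, full_matrices=False)
scores = (X - grand) @ Vt[:2].T          # specimens on the first two bgPC axes

# "Separation" between two group centroids in score space, relative to
# within-group spread -- large despite there being no group structure at all
centroids = np.array([scores[labels == g].mean(axis=0) for g in range(groups)])
between = np.linalg.norm(centroids[0] - centroids[1])
within = scores[labels == 0].std()
print(f"centroid separation / within-group spread = {between / within:.2f}")
```

Because the bgPC axes are built from the group means themselves, random mean differences of order sqrt(p/n) are amplified into apparently decisive, roughly equilateral clusters — exactly the fictitious separation the abstract warns about.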


1983 ◽  
Vol 40 (10) ◽  
pp. 1752-1760 ◽  
Author(s):  
Michael A. Gates ◽  
Ann P. Zimmerman ◽  
W. Gary Sprules ◽  
Roy Knoechel

We introduce a method, based on principal components analysis, for studying temporal changes in biomass allocation among 16 size–category compartments of lake plankton. Applied to data from a series of 12 Ontario lakes over three sampling seasons, the technique provides a simple means of visualizing shifts in patterns of biomass allocation, and it allows comparative analyses of biomass fluctuations in different lakes. Each of the primary component axes is interpretable. Furthermore, a large proportion of the variance in both the mean position of a lake and its movement along these axes is interpreted as a function of lake physicochemistry. The analysis also provides weighted scores for use in hypothesis testing which are an improvement over mean biomass values alone, because they take into account the structure of variation in the data set.


2013 ◽  
Vol 7 (1) ◽  
pp. 19-24
Author(s):  
Kevin Blighe

Elaborate downstream methods are required to analyse large microarray data sets. At times, where the end goal is to look for relationships between (or patterns within) different subgroups or even individual samples, large data sets must first be filtered using statistical thresholds in order to reduce their overall volume. As an example, in anthropological microarray studies, such ‘dimension reduction’ techniques are essential to elucidate any links between polymorphisms and phenotypes for given populations. In such large data sets, a subset can first be taken to represent the larger data set, just as polling results taken during elections are used to infer the opinions of the population at large. However, what is the best and easiest method of capturing a subset of variation in a data set that can represent the overall portrait of variation? In this article, principal components analysis (PCA) is discussed in detail, including its history, the mathematics behind the process, and the ways in which it can be applied to modern large-scale biological data sets. New methods of analysis using PCA are also suggested, with tentative results outlined.
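The dimension-reduction step discussed above can be illustrated in a few lines with a generic SVD-based PCA (a textbook sketch, not the author's pipeline; the toy matrix stands in for a filtered microarray data set):

```python
import numpy as np

def pca(X, k):
    """Project X (samples x features) onto its first k principal components."""
    Xc = X - X.mean(axis=0)                # centre each feature
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)        # variance fraction per component
    return Xc @ Vt[:k].T, explained[:k]

# Toy stand-in for a filtered microarray matrix: 100 samples, 1000 probes,
# with one dominant axis of variation shared across probes
rng = np.random.default_rng(2)
signal = rng.normal(size=(100, 1))
X = signal @ rng.normal(size=(1, 1000)) + 0.1 * rng.normal(size=(100, 1000))

scores, explained = pca(X, 2)
print("variance explained by PC1, PC2:", np.round(explained, 3))
```

Each sample is reduced from 1000 correlated measurements to two scores, and the explained-variance fractions show how much of the overall portrait of variation that subset captures — here PC1 absorbs nearly all of it, because the toy data contain a single dominant factor.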


2020 ◽  
Vol 25 (1-2) ◽  
pp. 35-56
Author(s):  
Vasil Simeonov

Abstract
The present introductory course of lectures summarizes the principles and algorithms of several widely used multivariate statistical methods — cluster analysis, principal components analysis, principal components regression, N-way principal components analysis, partial least squares regression and self-organizing maps — with respect to their possible application in the intelligent analysis, classification, modelling and interpretation of environmental monitoring data. The target group of possible users is master's programme students (environmental chemistry, analytical chemistry, environmental modelling and risk assessment, etc.).


2019 ◽  
Vol 5 ◽  
pp. 237802311881872 ◽  
Author(s):  
Ryan Compton

Sociological research typically involves exploring theoretical relationships, but the emergence of “big data” enables alternative approaches. This work shows the promise of data-driven machine-learning techniques involving feature engineering and predictive model optimization to address a sociological data challenge. The author’s group develops improved generalizable models to identify at-risk families. Principal-components analysis and decision tree modeling are used to predict six main dependent variables in the Fragile Families Challenge, successfully modeling one binary variable but no continuous dependent variables in the diagnostic data set. This indicates that some binary dependent variables are more predictable using a reduced set of uncorrelated independent variables, and continuous dependent variables demand more complexity.
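The two-step recipe the abstract describes — PCA to obtain a reduced set of uncorrelated features, then a decision tree on those features — can be sketched on toy data. Everything below is an illustrative stand-in, not the Fragile Families pipeline: hypothetical data, a five-component cut-off, and a depth-1 tree (stump) in place of a full decision tree.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 400, 50
f = rng.normal(size=(n, 1))                            # one latent factor
X = f @ rng.normal(size=(1, p)) + 0.5 * rng.normal(size=(n, p))
y = (f[:, 0] > 0).astype(int)                          # binary outcome

# Step 1: PCA to a small set of uncorrelated features
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:5].T

# Step 2: a depth-1 decision tree (stump) on the reduced features
def fit_stump(Z, y):
    """Best single-feature threshold split, allowing either label polarity."""
    best = None
    for j in range(Z.shape[1]):
        for t in np.quantile(Z[:, j], [0.25, 0.5, 0.75]):
            pred = (Z[:, j] > t).astype(int)
            acc = max((pred == y).mean(), (pred != y).mean())
            if best is None or acc > best[0]:
                best = (acc, j, t)
    return best

acc, feature, threshold = fit_stump(Z, y)
print(f"stump on PC{feature + 1}: training accuracy {acc:.2f}")
```

When a binary outcome is driven by a dominant latent factor, that factor surfaces as a leading principal component and even a one-split tree recovers it — consistent with the abstract's finding that some binary variables are predictable from a reduced, uncorrelated feature set while continuous outcomes demand more model complexity.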

