Encoding Prior Knowledge with Eigenword Embeddings

2016 ◽  
Vol 4 ◽  
pp. 417-430 ◽  
Author(s):  
Dominique Osborne ◽  
Shashi Narayan ◽  
Shay B. Cohen

Canonical correlation analysis (CCA) is a method for reducing the dimension of data represented using two views. It has been previously used to derive word embeddings, where one view indicates a word, and the other view indicates its context. We describe a way to incorporate prior knowledge into CCA, give a theoretical justification for it, and test it by deriving word embeddings and evaluating them on a myriad of datasets.
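As a rough illustration of the two-view setup described above, the following is a minimal sketch of plain CCA computed by whitening each view and taking an SVD of the whitened cross-covariance. It does not include the prior-knowledge extension the abstract describes. The function name `cca`, the small ridge term `reg`, and the synthetic data are all assumptions made for this example.

```python
import numpy as np

def cca(X, Y, k, reg=1e-8):
    """Plain CCA sketch: return projections A, B and the top-k canonical correlations."""
    # Center both views.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Covariance blocks; `reg` is a tiny ridge (an assumption) for numerical stability.
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n

    def inv_sqrt(C):
        # Inverse matrix square root of a symmetric positive definite matrix.
        w, V = np.linalg.eigh(C)
        return V @ np.diag(w ** -0.5) @ V.T

    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy)
    A = Wx @ U[:, :k]
    B = Wy @ Vt.T[:, :k]
    return A, B, s[:k]

# Synthetic two-view data sharing one latent signal z (purely illustrative).
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
X = np.hstack([z + 0.1 * rng.normal(size=(500, 1)), rng.normal(size=(500, 2))])
Y = np.hstack([rng.normal(size=(500, 2)), z + 0.1 * rng.normal(size=(500, 1))])

A, B, s = cca(X, Y, k=2)
u = (X - X.mean(axis=0)) @ A[:, 0]  # first canonical variate of view 1
v = (Y - Y.mean(axis=0)) @ B[:, 0]  # first canonical variate of view 2
```

In a word-embedding setting, the rows of `X` would encode words and the rows of `Y` their contexts, and the projected rows would serve as low-dimensional representations; the dense random data here only stands in for that.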

Author(s):  
Sabrina Morilhas Simões ◽  
Antonio Leão Castilho ◽  
Adilson Fransozo ◽  
Maria Lúcia Negreiros-Fransozo ◽  
Rogerio Caetano da Costa

The abundance and ecological distribution of Acetes americanus and Peisos petrunkevitchi were investigated from July 2006 to June 2007, in Ubatuba, Brazil. Eight transects were identified and sampled monthly: six of these transects were located in Ubatuba bay, with depths reaching 21 m, and the other two transects were in estuarine environments. A total of 33,888 A. americanus shrimp were captured, with the majority coming from the shallower transects (up to 10 m). Conversely, 6,173 of the P. petrunkevitchi shrimps were captured in deeper areas (from 9 to 21 m). No individuals from either species were found in the estuary. The highest abundances obtained for both species were sampled during the summer. Canonical correlation analysis resulted in a coefficient value of 0.68 (P = 0.00). The abundance of both species was strongly correlated with depth. Variations in temperature and salinity values were also informative in predicting the seasonal presence of P. petrunkevitchi in deeper areas and A. americanus in the shallower areas of the bay. It is conceivable that the shrimp adjust their ecological distribution according to their intrinsic physiological limitations.


1984 ◽  
Vol 62 (11) ◽  
pp. 2317-2327 ◽  
Author(s):  
P. Legendre ◽  
D. Planas ◽  
M.-J. Auclair

This paper compares the succession of gastropods in two environments that are adjacent in space but differ as to their eutrophic level. One is hypereutrophic (du Sud River), the other is mesotrophic (Richelieu River). Canonical correlation analysis brings out the main differences between these two stations, while principal component analysis is used to describe the succession of species within each community. These analyses indicate that the occurrence of gastropod species, as well as their development cycles, may be adapted to the particular synecological evolution of each environment. Thus, the species would not react directly to nutrient concentrations but indirectly, through the effects of these concentrations on oxygen content, plant cover, and predators. In these two environments, some benthic species seem to be good indicators of the eutrophic level of the ecosystem.


2016 ◽  
Vol 13 (2) ◽  
Author(s):  
Glòria Mateu-Figueras ◽  
Josep Daunis-i-Estadella ◽  
Germà Coenders ◽  
Berta Ferrer-Rosell ◽  
Ricard Serlavós ◽  
...  

The aim of this article is to describe a method for relating two compositions which combines compositional data analysis and canonical correlation analysis (CCA), and to examine its main statistical properties. We use additive log-ratio (alr) transformation on both compositions and apply standard CCA to the transformed data. We show that canonical variates are themselves log-ratios and log-contrasts. The first pair of canonical variates can be interpreted as the log-contrast of a composition that has the maximum correlation with a log-contrast of the other composition. The second pair can be interpreted as the log-contrast of a composition that has the maximum correlation with a log-contrast of the other composition, under the restriction that they are uncorrelated with the first pair, and so on. Using properties from changes of basis, we prove that both canonical correlations and canonical variates are invariant to the choice of divisors in alr transformation. We show how to implement the analysis and interpret the results by means of an illustration from the social sciences field using data from Kolb's Learning Style Inventory and Boyatzis' Philosophical Orientation Questionnaire, which distribute a fixed total score among several learning modes and philosophical orientations.
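The invariance claim can be checked numerically. The sketch below applies the alr transformation to two synthetic compositions and computes canonical correlations with two different choices of divisor. The helper names `alr` and `canonical_correlations`, the QR-based computation, and the simulated data are assumptions for illustration and are not taken from the article.

```python
import numpy as np

def alr(P, divisor):
    """Additive log-ratio: log of each part over the divisor part (divisor column dropped)."""
    logP = np.log(P)
    others = np.delete(logP, divisor, axis=1)
    return others - logP[:, [divisor]]

def canonical_correlations(X, Y):
    """Canonical correlations as singular values of Qx^T Qy (orthonormal bases of centered views)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(X)
    Qy, _ = np.linalg.qr(Y)
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)

# Two synthetic compositions (rows sum to 1) driven by a shared latent variable.
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 2))
raw1 = np.exp(latent @ rng.normal(size=(2, 4)) + 0.5 * rng.normal(size=(200, 4)))
raw2 = np.exp(latent @ rng.normal(size=(2, 3)) + 0.5 * rng.normal(size=(200, 3)))
P1 = raw1 / raw1.sum(axis=1, keepdims=True)
P2 = raw2 / raw2.sum(axis=1, keepdims=True)

# Same analysis with two different divisor choices in each composition.
rho_a = canonical_correlations(alr(P1, 3), alr(P2, 2))
rho_b = canonical_correlations(alr(P1, 0), alr(P2, 1))
same = np.allclose(rho_a, rho_b)
```

Changing the divisor amounts to an invertible linear change of basis on the alr coordinates, so both choices span the same subspace and `rho_a` and `rho_b` agree up to floating-point error.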


Author(s):  
SHILIANG SUN ◽  
FENG JIN

Co-training is a multiview semi-supervised learning algorithm to learn from both labeled and unlabeled data, which iteratively adopts a classifier trained on one view to teach the other view using some confident predictions given on unlabeled examples. However, as it does not examine the reliability of the labels provided by classifiers on either view, co-training might be problematic. Even very few inaccurately labeled examples can deteriorate the performance of learned classifiers to a large extent. In this paper, a new method named robust co-training is proposed, which integrates canonical correlation analysis (CCA) to inspect the predictions of co-training on those unlabeled training examples. CCA is applied to obtain a low-dimensional and closely correlated representation of the original multiview data. Based on this representation the similarities between an unlabeled example and the original labeled examples are determined. Only those examples whose predicted labels are consistent with the outcome of CCA examination are eligible to augment the original labeled data. The performance of robust co-training is evaluated on several different classification problems where encouraging experimental results are observed.
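One possible reading of the consistency check is sketched below. It assumes the examples have already been projected into a CCA space, and it uses a nearest-class-centroid rule as the similarity measure. Both the rule and the function name `cca_consistent_mask` are assumptions for illustration; the abstract does not specify how similarity is computed.

```python
import numpy as np

def cca_consistent_mask(Z_lab, y_lab, Z_unl, y_pred):
    """Keep an unlabeled example only if its predicted label matches the class
    of the nearest labeled centroid in the (assumed precomputed) CCA space."""
    classes = np.unique(y_lab)
    # One centroid per class, computed from the original labeled examples.
    centroids = np.stack([Z_lab[y_lab == c].mean(axis=0) for c in classes])
    # Euclidean distance from every unlabeled example to every centroid.
    d = np.linalg.norm(Z_unl[:, None, :] - centroids[None, :, :], axis=2)
    nearest = classes[d.argmin(axis=1)]
    return nearest == y_pred

# Toy projected data: two well-separated classes (hypothetical values).
Z_lab = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.0, 5.2]])
y_lab = np.array([0, 0, 1, 1])
Z_unl = np.array([[0.1, 0.1], [4.9, 5.1], [0.0, 0.3]])
y_pred = np.array([0, 1, 1])  # labels proposed by the co-training classifiers

mask = cca_consistent_mask(Z_lab, y_lab, Z_unl, y_pred)
```

Here the third unlabeled example lies next to class 0 but was predicted as class 1, so it is filtered out and would not be added to the labeled pool.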


1985 ◽  
Vol 24 (02) ◽  
pp. 91-100 ◽  
Author(s):  
W. van Pelt ◽  
Ph. H. Quanjer ◽  
M. E. Wise ◽  
E. van der Burg ◽  
R. van der Lende

Summary: As part of a population study on chronic lung disease in the Netherlands, an investigation is made of the relationship of both age and sex with indices describing the maximum expiratory flow-volume (MEFV) curve. To determine the relationship, non-linear canonical correlation was used as realized in the computer program CANALS, a combination of ordinary canonical correlation analysis (CCA) and non-linear transformations of the variables. This method enhances the generality of the relationship to be found and has the advantage of showing the relative importance of categories or ranges within a variable with respect to that relationship. The above is exemplified by describing the relationship of age and sex with variables concerning respiratory symptoms and smoking habits. The analysis of age and sex with MEFV curve indices shows that non-linear canonical correlation analysis is an efficient tool in analysing size and shape of the MEFV curve and can be used to derive parameters concerning the whole curve.

