Analyzing and Controlling Inter-Head Diversity in Multi-Head Attention

2021, Vol 11 (4), pp. 1548
Author(s): Hyeongu Yun, Taegwan Kang, Kyomin Jung

Multi-head attention, a powerful strategy for the Transformer, is assumed to draw on information from diverse representation subspaces. However, measuring the diversity between heads' representations, or exploiting that diversity, has rarely been studied. In this paper, we quantitatively analyze the inter-head diversity of multi-head attention by applying two recently developed similarity measures between deep representations: Singular Vector Canonical Correlation Analysis (SVCCA) and Centered Kernel Alignment (CKA). By doing so, we empirically show that multi-head attention does diversify the representation subspaces of its heads as the number of heads increases. Based on this analysis, we hypothesize that there exists an optimal inter-head diversity with which a model can achieve better performance. To examine this hypothesis, we closely inspect three techniques for controlling inter-head diversity: (1) a Hilbert-Schmidt Independence Criterion (HSIC) regularizer among representation subspaces, (2) an orthogonality regularizer, and (3) DropHead, which randomly zeroes out each head at every training step. In experiments on various machine translation and language modeling tasks, we show that controlling inter-head diversity yields the best performance among the compared baselines.
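
The diversity analysis and one of the control techniques described above can be sketched with the linear-kernel variant of CKA and a DropHead-style head mask. This is a minimal sketch under stated assumptions, not the authors' implementation: each head's output over a batch of n tokens is assumed to be an (n, d_head) matrix, the helper names (`linear_cka`, `mean_inter_head_cka`, `drophead`) are hypothetical, SVCCA and the HSIC and orthogonality regularizers are not shown, and rescaling kept heads by 1/(1 - p) is an assumption borrowed from standard dropout rather than something the abstract specifies.

```python
import numpy as np


def linear_cka(x, y):
    """Linear-kernel CKA between two representations of the same n tokens.

    x: (n, d1), y: (n, d2). Returns a value in [0, 1]; 1 means the two
    representations match up to rotation and isotropic scaling.
    """
    x = x - x.mean(axis=0, keepdims=True)  # center each feature
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, "fro") ** 2
    self_x = np.linalg.norm(x.T @ x, "fro")
    self_y = np.linalg.norm(y.T @ y, "fro")
    return float(cross / (self_x * self_y))


def mean_inter_head_cka(heads):
    """Average pairwise linear CKA over a list of (n, d_head) head outputs.

    Requires at least two heads. Lower values indicate more diverse heads.
    """
    sims = [
        linear_cka(heads[i], heads[j])
        for i in range(len(heads))
        for j in range(i + 1, len(heads))
    ]
    return float(np.mean(sims))


def drophead(head_outputs, p, rng, training=True):
    """DropHead-style masking: zero out whole heads at random during training.

    head_outputs: (num_heads, n, d_head). The 1 / (1 - p) rescaling of kept
    heads is an assumption (inverted-dropout convention).
    """
    if not training or p == 0.0:
        return head_outputs
    if not 0.0 < p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep = rng.random(head_outputs.shape[0]) >= p  # one Bernoulli draw per head
    return head_outputs * keep[:, None, None] / (1.0 - p)
```

Applied to the per-head outputs of one layer, a `mean_inter_head_cka` value near 1 means the heads learn nearly the same representation, while lower values indicate more diverse subspaces; `drophead` is meant to be called only in training steps.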

1985, Vol 24 (02), pp. 91-100
Author(s): W. van Pelt, Ph. H. Quanjer, M. E. Wise, E. van der Burg, R. van der Lende

Summary: As part of a population study on chronic lung disease in the Netherlands, the relationship of both age and sex with indices describing the maximum expiratory flow-volume (MEFV) curve is investigated. To determine this relationship, non-linear canonical correlation analysis was used, as implemented in the computer program CANALS, which combines ordinary canonical correlation analysis (CCA) with non-linear transformations of the variables. This method increases the generality of the relationship that can be found and has the advantage of showing the relative importance of categories or ranges within a variable with respect to that relationship. This is illustrated by describing the relationship of age and sex with variables concerning respiratory symptoms and smoking habits. The analysis of age and sex with MEFV curve indices shows that non-linear canonical correlation analysis is an efficient tool for analysing the size and shape of the MEFV curve and can be used to derive parameters describing the whole curve.
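
The ordinary CCA step underlying this method can be sketched as follows. This is plain linear CCA, not CANALS: CANALS additionally estimates non-linear transformations of the variables, which this sketch does not reproduce. As a crude, hypothetical stand-in for letting a variable enter non-linearly, `band_indicators` codes a numeric variable such as age by ranges so that each range gets its own weight. Both function names and any band edges are illustrative assumptions, and the inputs are assumed to have full column rank after centering (hence the dropped reference band).

```python
import numpy as np


def canonical_correlations(x, y):
    """Canonical correlations between the columns of x (n, p) and y (n, q).

    Assumes both centered matrices have full column rank. The canonical
    correlations are the singular values of Qx^T Qy, where Qx and Qy are
    orthonormal bases of the centered column spaces.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    qx, _ = np.linalg.qr(x)  # reduced QR: qx has shape (n, p)
    qy, _ = np.linalg.qr(y)
    s = np.linalg.svd(qx.T @ qy, compute_uv=False)  # sorted in descending order
    return np.clip(s, 0.0, 1.0)  # guard against rounding slightly above 1


def band_indicators(values, edges):
    """Hypothetical helper: dummy-code a numeric variable by ranges.

    Band 0 (values below edges[0]) is the reference and is dropped, so the
    centered columns are not collinear.
    """
    bands = np.digitize(values, edges)  # band index in 0..len(edges)
    onehot = np.eye(len(edges) + 1)[bands]
    return onehot[:, 1:]
```

If an outcome depends on age in a U-shaped way, the canonical correlation with raw age is typically close to zero, whereas the range-coded version can recover a strong relationship; this mirrors, in a much simpler form, the abstract's point about the relative importance of ranges within a variable.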