Bi-Directional Co-Attention Network for Image Captioning

Author(s):  
Weitao Jiang ◽  
Weixuan Wang ◽  
Haifeng Hu

Image captioning, which automatically describes an image in natural language, is regarded as a fundamental challenge in computer vision. In recent years, significant advances have been made in image captioning through improved attention mechanisms. However, most existing methods construct attention mechanisms on a single type of visual feature, such as patch features or object features, which limits the accuracy of the generated captions. In this article, we propose a Bidirectional Co-Attention Network (BCAN) that combines multiple visual features to provide information from different aspects. Different features are associated with predicting different words, and there are a priori relations between these multiple visual features. Based on this, we further propose a bottom-up and top-down bi-directional co-attention mechanism to extract discriminative attention information. Furthermore, most existing methods do not exploit an effective multimodal integration strategy, generally using addition or concatenation to combine features. To solve this problem, we adopt the Multivariate Residual Module (MRM) to integrate multimodal attention features. Meanwhile, we further propose a Vertical MRM to integrate features of the same category and a Horizontal MRM to combine features of different categories, which can balance the contributions of the bottom-up co-attention and the top-down co-attention. In contrast to existing methods, BCAN is able to obtain complementary information from multiple visual features via the bi-directional co-attention strategy and to integrate multimodal information via the improved multivariate residual strategy. We conduct a series of experiments on two benchmark datasets (MSCOCO and Flickr30k), and the results indicate that the proposed BCAN achieves superior performance.
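The abstract does not give BCAN's equations, so the following is only a hypothetical sketch of the general idea of attending over two visual feature sets (patch features and object features) where the summary of one set guides attention over the other, in both directions. Everything here is an assumption: the names `attend`, `guided_attend`, `co_attend` and `residual_fuse`, the plain dot-product scoring, and the fusion rule (a residual term plus a multiplicative interaction, used instead of plain addition or concatenation). It is not the published BCAN or MRM.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, features: List[Vector]) -> Tuple[Vector, List[float]]:
    """Dot-product soft attention: returns the weighted sum of features and the weights."""
    weights = softmax([sum(q * f for q, f in zip(query, feat)) for feat in features])
    dim = len(features[0])
    context = [sum(w * feat[d] for w, feat in zip(weights, features)) for d in range(dim)]
    return context, weights


def guided_attend(query: Vector, guide_feats: List[Vector], target_feats: List[Vector]) -> Vector:
    """Attend over guide_feats first, then use that summary to steer attention over target_feats."""
    guide_ctx, _ = attend(query, guide_feats)
    guided_query = [q + g for q, g in zip(query, guide_ctx)]
    return attend(guided_query, target_feats)[0]


def co_attend(query: Vector, patch_feats: List[Vector], object_feats: List[Vector]) -> Tuple[Vector, Vector]:
    """Hypothetical two-way co-attention: objects guide patches, and patches guide objects."""
    patch_ctx = guided_attend(query, object_feats, patch_feats)
    object_ctx = guided_attend(query, patch_feats, object_feats)
    return patch_ctx, object_ctx


def residual_fuse(query: Vector, ctx_a: Vector, ctx_b: Vector) -> Vector:
    """Residual fusion with a multiplicative interaction (an assumption, not the paper's MRM)."""
    return [q + math.tanh(a) * math.tanh(b) for q, a, b in zip(query, ctx_a, ctx_b)]
```

In a real captioner the query would be the decoder's hidden state and each step would use learned projections; they are omitted here so the data flow between the two feature sets stays visible.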

2009 ◽  
Vol 39 (12) ◽  
pp. 1935-1941 ◽  
Author(s):  
K. S. Kendler

This essay, which seeks to provide an historical framework for our efforts to develop a scientific psychiatric nosology, begins by reviewing the classificatory approaches that arose in the early history of biological taxonomy. Initial attempts at species definition used top-down approaches advocated by experts and based on a few essential features of the organism chosen a priori. This approach was subsequently rejected on both conceptual and practical grounds and replaced by bottom-up approaches making use of a much wider array of features. Multiple parallels exist between the beginnings of biological taxonomy and psychiatric nosology. Like biological taxonomy, psychiatric nosology largely began with ‘expert’ classifications, typically influenced by a few essential features, articulated by one or more great 19th-century diagnosticians. Like biology, psychiatry is struggling toward more soundly based bottom-up approaches using diverse illness characteristics. The underemphasized historically contingent nature of our current psychiatric classification is illustrated by recounting the history of how ‘Schneiderian’ symptoms of schizophrenia entered into DSM-III. Given these historical contingencies, it is vital that our psychiatric nosologic enterprise be cumulative. This can be best achieved through a process of epistemic iteration. If we can develop a stable consensus in our theoretical orientation toward psychiatric illness, we can apply this approach, which has one crucial virtue. Regardless of the starting point, if each iteration (or revision) improves the performance of the nosology, the eventual success of the nosologic process, to optimally reflect the complex reality of psychiatric illness, is assured.


2006 ◽  
Vol 63 (7) ◽  
pp. 1536-1548 ◽  
Author(s):  
Paul D Eastwood ◽  
Sami Souissi ◽  
Stuart I Rogers ◽  
Roger A Coggan ◽  
Craig J Brown

Acoustic technologies yield many benefits for mapping the physical structure of seabed environments but are not ideally suited to classifying associated biological assemblages. We tested this assumption using benthic infauna data collected off the south coast of England by applying top-down (supervised) and bottom-up (unsupervised) classification approaches. The top-down approach was based on an a priori acoustic classification of the seabed followed by characterization of the acoustic regions using ground-truth biological samples. By contrast, measures of similarity between the ground-truth infaunal community data formed the basis of the bottom-up approach to assemblage classification. For both approaches, individual assemblages were mapped by first computing Bayesian conditional probabilities for ground-truth stations to estimate the probability of each station belonging to an assemblage. Assemblage distributions were then interpolated over a regular grid and characterized using an indicator value index. While the two methods of classification yielded assemblages and output maps that were broadly comparable, the bottom-up approach arrived at a slightly better defined set of biological assemblages. This suggests that acoustically derived seabed data are not ideally suited to classifying biological assemblages over unconsolidated sediments, despite offering considerable advantages in providing rapid and low-cost assessments of seabed physical structure.
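The abstract does not say which indicator value index was used. As an illustration only, the sketch below assumes the widely used IndVal of Dufrêne and Legendre, where a species' indicator value for a group is the product of its specificity (mean abundance in that group relative to the sum of its group means) and its fidelity (fraction of the group's sites where it occurs), scaled to 0 to 100. The function name `indval` and the toy data are hypothetical.

```python
from typing import Dict, List


def indval(abundance: List[List[float]], groups: List[int], species: int) -> Dict[int, float]:
    """Dufrêne-Legendre indicator value of one species for each group (0-100).

    abundance[s][i] is the abundance of species i at site s; groups[s] is the
    assemblage label assigned to site s (assumed here, e.g. from a classification).
    """
    labels = sorted(set(groups))
    mean_abund: Dict[int, float] = {}
    fidelity: Dict[int, float] = {}
    for g in labels:
        values = [row[species] for row, lab in zip(abundance, groups) if lab == g]
        mean_abund[g] = sum(values) / len(values)                   # mean abundance in group g
        fidelity[g] = sum(1 for v in values if v > 0) / len(values)  # share of sites occupied
    total = sum(mean_abund.values())
    return {
        g: 100.0 * (mean_abund[g] / total if total > 0 else 0.0) * fidelity[g]
        for g in labels
    }


# Four hypothetical stations, two species, two assemblages.
sites = [[5.0, 0.0], [3.0, 1.0], [0.0, 2.0], [0.0, 4.0]]
labels = [0, 0, 1, 1]
print(indval(sites, labels, species=0))  # species 0 indicates assemblage 0
print(indval(sites, labels, species=1))  # species 1 indicates assemblage 1
```

A species scores near 100 for a group only when it is both concentrated in that group and present at most of its sites, which is why the index is useful for characterizing mapped assemblages.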


2008 ◽  
Vol 46 (7) ◽  
pp. 2033-2042 ◽  
Author(s):  
Annerose Engel ◽  
Michael Burke ◽  
Katja Fiehler ◽  
Siegfried Bien ◽  
Frank Rösler

2017 ◽  
Author(s):  
Peter Bergamaschi ◽  
Ute Karstens ◽  
Alistair J. Manning ◽  
Marielle Saunois ◽  
Aki Tsuruta ◽  
...  

Abstract. We present inverse modelling (top-down) estimates of European methane (CH4) emissions for 2006–2012 based on a new quality-controlled and harmonized in-situ data set from 18 European atmospheric monitoring stations. We applied an ensemble of seven inverse models and performed four inversion experiments, investigating the impact of different sets of stations and the use of a priori information on emissions. The inverse models infer total CH4 emissions of 26.7 (20.2–29.7) Tg CH4 yr−1 (mean, 10th and 90th percentiles from all inversions) for the EU-28 for 2006–2012 from the four inversion experiments. For comparison, total anthropogenic CH4 emissions reported to the UNFCCC (bottom-up, based on statistical data and emission factors) amount to only 21.3 Tg CH4 yr−1 (2006) to 18.8 Tg CH4 yr−1 (2012). A potential explanation for the higher range of top-down estimates compared to bottom-up inventories could be the contribution from natural sources, such as peatlands, wetlands, and wet soils. Based on seven different wetland inventories from the Wetland and Wetland CH4 Inter-comparison of Models Project (WETCHIMP), total wetland emissions of 4.3 (2.3–8.2) Tg CH4 yr−1 from the EU-28 are estimated. The hypothesis of significant natural emissions is supported by the finding that several inverse models yield significant seasonal cycles of derived CH4 emissions with maxima in summer, while anthropogenic CH4 emissions are assumed to have much lower seasonal variability. Furthermore, we investigate potential biases in the inverse models by comparison with regular aircraft profiles at four European sites and with vertical profiles obtained during the Infrastructure for Measurement of the European Carbon Cycle (IMECC) aircraft campaign. We present a novel approach to estimate the biases in the derived emissions, based on the comparison of simulated and measured enhancements of CH4 compared to the background, integrated over the entire boundary layer and over the lower troposphere. This analysis identifies regional biases for several models at the aircraft profile sites in France, Hungary and Poland.
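The abstract describes the bias diagnostic only in words. The sketch below is a hypothetical illustration of that idea: integrate the CH4 enhancement above a background value over altitude (trapezoidal rule, levels up to an assumed boundary-layer top) for both the simulated and the measured profile, then compare the two. The names, the use of altitude in metres rather than pressure, the fixed background, and the relative-difference metric are all assumptions, not the authors' procedure.

```python
from typing import Sequence


def integrated_enhancement(altitudes_m: Sequence[float], conc_ppb: Sequence[float],
                           background_ppb: float, top_m: float) -> float:
    """Trapezoidal integral of (concentration - background) from the lowest level to top_m.

    Altitudes must be sorted ascending; only levels at or below top_m are used.
    The result is in ppb·m.
    """
    levels = [(z, c - background_ppb) for z, c in zip(altitudes_m, conc_ppb) if z <= top_m]
    total = 0.0
    for (z0, e0), (z1, e1) in zip(levels, levels[1:]):
        total += 0.5 * (e0 + e1) * (z1 - z0)
    return total


def relative_bias(model_ppb: Sequence[float], obs_ppb: Sequence[float],
                  altitudes_m: Sequence[float], background_ppb: float, top_m: float) -> float:
    """(simulated - measured) / measured integrated enhancement."""
    sim = integrated_enhancement(altitudes_m, model_ppb, background_ppb, top_m)
    meas = integrated_enhancement(altitudes_m, obs_ppb, background_ppb, top_m)
    return (sim - meas) / meas


# Made-up aircraft profile: the top level lies above the assumed 1000 m boundary layer.
alts = [0.0, 500.0, 1000.0, 2000.0]
obs = [1900.0, 1880.0, 1860.0, 1855.0]
model = [1910.0, 1890.0, 1865.0, 1856.0]
print(relative_bias(model, obs, alts, background_ppb=1850.0, top_m=1000.0))
```

A positive value means the model overestimates the boundary-layer enhancement at that site. How the authors translate such a mismatch into a bias in the derived emissions is not stated in the abstract.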


2010 ◽  
Vol 10 (8) ◽  
pp. 19697-19736 ◽  
Author(s):  
G. Curci ◽  
P. I. Palmer ◽  
T. P. Kurosu ◽  
K. Chance ◽  
G. Visconti

Abstract. Emission of non-methane Volatile Organic Compounds (VOCs) to the atmosphere stems from biogenic and human activities, and their estimation is difficult because of the many and not fully understood processes involved. In order to narrow down the uncertainty related to VOC emissions, which negatively affects our ability to simulate atmospheric composition, we exploit satellite observations of formaldehyde (HCHO), a ubiquitous oxidation product of most VOCs, focusing on Europe. HCHO column observations from the Ozone Monitoring Instrument (OMI) reveal a marked seasonal cycle with a summer maximum and a winter minimum. In summer, the oxidation of methane and other long-lived VOCs supplies a slowly varying background HCHO column, while HCHO variability is dominated by the most reactive VOCs, primarily biogenic isoprene, followed in importance by biogenic terpenes and anthropogenic VOCs. The chemistry-transport model CHIMERE qualitatively reproduces the temporal and spatial features of the observed HCHO column but displays regional biases, which are attributed mainly to incorrect biogenic VOC emissions calculated with the Model of Emissions of Gases and Aerosols from Nature (MEGAN) algorithm. These "bottom-up" or a-priori emissions are corrected through a Bayesian inversion of the OMI HCHO observations. The resulting "top-down" or a-posteriori isoprene emissions are lower than the "bottom-up" estimates by 40% over the Balkans and by 20% over southern Germany, and higher by 20% over the Iberian Peninsula, Greece and Italy. The inversion is shown to be robust against assumptions on the a-priori emissions and the inversion parameters. We conclude that OMI satellite observations of HCHO can provide a quantitative "top-down" constraint on European "bottom-up" VOC inventories.
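To make the "a-priori to a-posteriori" step concrete, here is a minimal sketch of the simplest Bayesian inversion: one emission value E with a Gaussian prior, and observed columns assumed to depend linearly on it, y_i = k·E + noise. The sensitivity `k`, the independent Gaussian errors, and all numbers are illustrative assumptions; the actual inversion is multivariate and uses CHIMERE to relate emissions to HCHO columns.

```python
from typing import Sequence, Tuple


def scalar_bayes_inversion(prior: float, prior_sd: float, k: float,
                           obs: Sequence[float], obs_sd: float) -> Tuple[float, float]:
    """Posterior mean and variance of E for the linear-Gaussian model y_i = k*E + e_i.

    Prior: E ~ N(prior, prior_sd**2). Errors e_i are independent N(0, obs_sd**2).
    """
    prior_prec = 1.0 / prior_sd ** 2          # precision = inverse variance
    obs_prec = 1.0 / obs_sd ** 2
    post_prec = prior_prec + len(obs) * k * k * obs_prec
    post_mean = (prior * prior_prec + k * obs_prec * sum(obs)) / post_prec
    return post_mean, 1.0 / post_prec


# Hypothetical cell: the prior predicts a column of k * prior = 2.0, but 1.6 is observed twice.
mean, var = scalar_bayes_inversion(prior=1.0, prior_sd=0.5, k=2.0, obs=[1.6, 1.6], obs_sd=0.2)
print(mean, var)
```

Because the observation errors are small relative to the prior uncertainty here, the posterior moves most of the way toward the value implied by the observations (about 0.80 versus a prior of 1.0), and its variance shrinks well below the prior variance of 0.25. The toy numbers are not taken from the paper.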


2010 ◽  
Vol 3 (2) ◽  
Author(s):  
Thomas Couronné ◽  
Anne Guérin-Dugué ◽  
Michel Dubois ◽  
Pauline Faye ◽  
Christian Marendaz

When people gaze at real scenes, their visual attention is driven both by bottom-up processes arising from the signal properties of the scene and by top-down effects such as the task, affective state, prior knowledge, or semantic context. The context of this study is the assessment of manufactured objects (here, a car cab interior). Within this dedicated context, this work describes a set of methods for analyzing eye movements during visual scene evaluation, although these methods can be adapted to more general contexts. We define a statistical model that explains the eye fixations measured experimentally by eye-tracking, even when the signal-to-noise ratio is poor or raw data are lacking. One of the novelties of the approach is the use of complementary experimental data obtained with the “Bubbles” paradigm. The proposed model is an additive mixture of several a priori spatial density distributions of factors guiding visual attention. The “Bubbles” paradigm is adapted here to reveal the semantic density distribution, which represents the cumulative effects of the top-down factors. The contribution of each factor is then compared across products and tasks in order to highlight the properties of visual attention and the cognitive activity in each situation.
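The model above is an additive mixture of fixed, a priori spatial densities, so the quantity to estimate is each factor's weight. The abstract does not say how the weights are fitted; the sketch below assumes the standard EM update for mixture weights when the component densities are held fixed. The function name and the toy data are hypothetical: `densities[n][k]` stands for the value of the k-th a priori map (for example, a semantic map from "Bubbles") at fixation n.

```python
from typing import List


def fit_mixture_weights(densities: List[List[float]], n_iter: int = 200) -> List[float]:
    """EM estimate of weights w_k in p(x) = sum_k w_k * p_k(x), with p_k fixed.

    densities[n][k] = p_k(x_n), the k-th a priori density evaluated at fixation n.
    """
    n_comp = len(densities[0])
    w = [1.0 / n_comp] * n_comp  # start from equal contributions
    for _ in range(n_iter):
        resp_sums = [0.0] * n_comp
        for row in densities:
            mix = sum(wk * pk for wk, pk in zip(w, row))
            for k in range(n_comp):
                # responsibility of factor k for this fixation
                resp_sums[k] += w[k] * row[k] / mix
        # new weight = average responsibility; the weights still sum to 1
        w = [r / len(densities) for r in resp_sums]
    return w


# Four hypothetical fixations: three fall where factor 0's map is high, one where factor 1's is.
fixations = [[2.0, 0.5], [2.0, 0.5], [2.0, 0.5], [0.5, 2.0]]
print(fit_mixture_weights(fixations))
```

Comparing the fitted weights between conditions (for example, different products or tasks) is one plausible way to quantify how much each factor contributes, in the spirit of the comparison described in the abstract.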

