Diverse Decoding for Abstractive Document Summarization

Xu-Wang Han; Hai-Tao Zheng; Jin-Yuan Chen; Cong-Zhi Zhao

doi:10.3390/app9030386

Applied Sciences ◽

10.3390/app9030386 ◽

2019 ◽

Vol 9 (3) ◽

pp. 386 ◽

Cited By ~ 2

Author(s):

Xu-Wang Han ◽

Hai-Tao Zheng ◽

Jin-Yuan Chen ◽

Cong-Zhi Zhao

Keyword(s):

Experimental Evaluation ◽

State Of The Art ◽

Attention Mechanism ◽

Beam Search ◽

Daily Mail ◽

Document Summarization ◽

Novel Method ◽

Search Approach ◽

Abstractive Summarization ◽

Information Coverage

Recently, neural sequence-to-sequence models have made impressive progress in abstractive document summarization. Unfortunately, because neural abstractive summarization research is still at an early stage, the performance of these models remains far from ideal. In this paper, we propose a novel method called Neural Abstractive Summarization with Diverse Decoding (NASDD), which augments the standard attentional sequence-to-sequence model in two ways. First, we introduce a diversity-promoting beam search approach into the decoding process, which alleviates the serious lack of diversity caused by standard beam search and hence increases the likelihood of generating more informative summary sequences. Second, we use the attention mechanism, combined with the key information of the input document, to estimate the coverage of salient information, which helps identify the optimal summary sequence. We evaluate our method against state-of-the-art methods on the CNN/Daily Mail summarization dataset, and the results demonstrate the superiority of the proposed method.
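The abstract names a diversity-promoting beam search but does not specify it. As an illustration only, the sketch below uses one well-known variant, an intra-sibling rank penalty: each child of a hypothesis is penalized by `gamma * rank`, so the beam is less likely to be filled by siblings of a single parent. This is an assumption, not necessarily NASDD's exact scheme; `next_log_probs` and `toy_model` are hypothetical stand-ins for a trained decoder.

```python
import math
from typing import Callable, Dict, List, Tuple

Token = str
Hypothesis = Tuple[List[Token], float]  # (tokens, cumulative log-probability)


def diverse_beam_search(
    next_log_probs: Callable[[List[Token]], Dict[Token, float]],
    beam_size: int,
    max_len: int,
    gamma: float = 1.0,
    eos: Token = "</s>",
) -> List[Hypothesis]:
    """Beam search with an intra-sibling rank penalty (gamma=0 gives plain beam search)."""
    beams: List[Hypothesis] = [([], 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        candidates = []  # (tokens, true score, penalized score used only for ranking)
        for tokens, score in beams:
            children = sorted(next_log_probs(tokens).items(), key=lambda kv: kv[1], reverse=True)
            for rank, (tok, log_p) in enumerate(children[:beam_size], start=1):
                new_score = score + log_p
                # The k-th best child of this parent is penalized by gamma * k.
                candidates.append((tokens + [tok], new_score, new_score - gamma * rank))
        candidates.sort(key=lambda c: c[2], reverse=True)
        beams = []
        for tokens, score, _ in candidates[:beam_size]:
            (finished if tokens[-1] == eos else beams).append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # keep unfinished hypotheses if max_len was reached
    return sorted(finished, key=lambda h: h[1], reverse=True)[:beam_size]


def toy_model(prefix: List[Token]) -> Dict[Token, float]:
    """Hypothetical decoder: fixed next-token distributions, then end-of-sequence."""
    table = {
        (): {"A": math.log(0.6), "B": math.log(0.4)},
        ("A",): {"x": math.log(0.5), "y": math.log(0.49), "z": math.log(0.01)},
        ("B",): {"w": math.log(0.6), "v": math.log(0.4)},
    }
    return table.get(tuple(prefix), {"</s>": 0.0})


print(diverse_beam_search(toy_model, beam_size=2, max_len=3, gamma=0.0))
print(diverse_beam_search(toy_model, beam_size=2, max_len=3, gamma=1.0))
```

With `gamma=0.0` both surviving hypotheses share the prefix `A` (`A x`, `A y`); with `gamma=1.0` the second slot goes to `B w`, trading a little likelihood for a different continuation. The final ranking still uses the unpenalized log-probability.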

Multi-Document Summarization with Determinantal Point Process Attention

Journal of Artificial Intelligence Research ◽

10.1613/jair.1.12522 ◽

2021 ◽

Vol 71 ◽

pp. 371-399

Author(s):

Laura Perez-Beltrachini ◽

Mirella Lapata

Keyword(s):

Point Process ◽

Experimental Evaluation ◽

Point Processes ◽

State Of The Art ◽

The State ◽

Attention Mechanism ◽

Determinantal Point Processes ◽

Determinantal Point Process ◽

Document Summarization

The ability to convey relevant and diverse information is critical in multi-document summarization and yet remains elusive for neural seq-to-seq models, whose outputs are often redundant and fail to correctly cover important details. In this work, we propose an attention mechanism that encourages greater focus on relevance and diversity. Attention weights are computed based on (proportional) probabilities given by Determinantal Point Processes (DPPs) defined on the set of content units to be summarized. DPPs have been used successfully in extractive summarization; here we use them to select relevant and diverse content for neural abstractive summarization. We integrate DPP-based attention with various seq-to-seq architectures, including CNNs, LSTMs, and Transformers. Experimental evaluation shows that our attention mechanism consistently improves summarization and delivers performance comparable with the state of the art on the MultiNews dataset.
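To make "attention proportional to DPP probabilities" concrete, here is a minimal sketch that uses the marginal inclusion probability of each content unit under an L-ensemble DPP. It assumes a quality/diversity kernel `L_ij = q_i * cos(phi_i, phi_j) * q_j` (relevance scores `q`, feature vectors `phi`) and the standard identity `K = L(L + I)^{-1} = I - (L + I)^{-1}` for the marginal kernel; the abstract does not give the paper's exact kernel, so treat these choices as assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def _invert(m: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting (adequate for small matrices)."""
    n = len(m)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def dpp_attention(features: List[List[float]], quality: List[float]) -> List[float]:
    """Attention weights proportional to DPP marginals K_ii (features must be non-zero)."""
    n = len(features)
    unit = []
    for f in features:
        norm = math.sqrt(sum(x * x for x in f))
        unit.append([x / norm for x in f])
    # Quality/diversity decomposition: L_ij = q_i * <phi_i, phi_j> * q_j.
    L = [[quality[i] * sum(a * b for a, b in zip(unit[i], unit[j])) * quality[j]
          for j in range(n)] for i in range(n)]
    # Marginal kernel K = I - (L + I)^{-1}; K_ii is the probability that unit i is selected.
    inv = _invert([[L[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)])
    marginals = [1.0 - inv[i][i] for i in range(n)]
    total = sum(marginals)
    return [m / total for m in marginals]


# Two duplicate units and one distinct unit, all equally relevant.
print(dpp_attention([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]], [1.0, 1.0, 1.0]))
```

In this example the duplicates share probability mass (2/7 each), while the distinct unit receives 3/7: redundancy lowers each unit's attention even when relevance is equal.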

Guiding Attention in Sequence-to-Sequence Models for Dialogue Act Prediction

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6259 ◽

2020 ◽

Vol 34 (05) ◽

pp. 7594-7601

Author(s):

Pierre Colombo ◽

Emile Chapuis ◽

Matteo Manica ◽

Emmanuel Vignon ◽

Giovanna Varni ◽

...

Keyword(s):

Machine Translation ◽

Random Fields ◽

Conditional Random Fields ◽

State Of The Art ◽

The State ◽

Attention Mechanism ◽

Accuracy Score ◽

Beam Search ◽

Conversational Agents ◽

Neural Machine Translation

Predicting dialog acts (DAs) from conversational dialog is a key component in the development of conversational agents. Accurately predicting DAs requires precise modeling of both the conversation and the global tag dependencies. We leverage seq2seq approaches widely adopted in Neural Machine Translation (NMT) to better model tag sequentiality. Seq2seq models are known to learn complex global dependencies, whereas currently proposed approaches using linear conditional random fields (CRFs) model only local tag dependencies. In this work, we introduce a seq2seq model tailored for DA classification that combines a hierarchical encoder, a novel guided attention mechanism, and beam search applied during both training and inference. Unlike the state of the art, our model requires no handcrafted features and is trained end-to-end. Furthermore, the proposed approach achieves an accuracy of 85% on SwDA, unmatched by prior work, and a state-of-the-art accuracy of 91.6% on MRDA.

DeepChannel: Salience Estimation by Contrastive Learning for Extractive Document Summarization

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33016999 ◽

2019 ◽

Vol 33 ◽

pp. 6999-7006 ◽

Cited By ~ 1

Author(s):

Jiaxin Shi ◽

Chen Liang ◽

Lei Hou ◽

Juanzi Li ◽

Zhiyuan Liu ◽

...

Keyword(s):

Deep Neural Network ◽

State Of The Art ◽

Neural Model ◽

Training Set ◽

Daily Mail ◽

Test Set ◽

Document Summarization ◽

Training Strategy ◽

Strong Robustness ◽

Data Efficiency

We propose DeepChannel, a robust, data-efficient, and interpretable neural model for extractive document summarization. Given any document-summary pair, we estimate a salience score, modeled with an attention-based deep neural network, that represents how salient the summary is for yielding the document. We devise a contrastive training strategy to learn the salience estimation network, and then use the learned salience score as a guide to iteratively extract the most salient sentences from the document as the generated summary. In experiments, our model not only achieves state-of-the-art ROUGE scores on the CNN/Daily Mail dataset but also shows strong robustness in the out-of-domain test on the DUC2007 test set. Moreover, our model reaches a ROUGE-1 F1 score of 39.41 on the CNN/Daily Mail test set when trained on only 1/100 of the training set, demonstrating strong data efficiency.
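The two mechanisms in this abstract, contrastive training of a salience scorer and greedy sentence extraction guided by it, can be sketched as follows. The hinge form of the contrastive loss is an assumption (the abstract does not state it), and `word_coverage` is a hypothetical stand-in for the learned attention-based salience network.

```python
from typing import Callable, List, Sequence

Salience = Callable[[Sequence[str], Sequence[str]], float]


def contrastive_hinge(pos_score: float, neg_score: float, margin: float = 1.0) -> float:
    """Zero once the gold summary outscores a corrupted summary by at least `margin`."""
    return max(0.0, margin - pos_score + neg_score)


def greedy_extract(doc: Sequence[str], salience: Salience, k: int) -> List[int]:
    """Repeatedly add the sentence whose inclusion gives the highest salience score."""
    chosen: List[int] = []
    for _ in range(min(k, len(doc))):
        remaining = [i for i in range(len(doc)) if i not in chosen]
        best = max(remaining, key=lambda i: salience(doc, [doc[j] for j in chosen + [i]]))
        chosen.append(best)
    return sorted(chosen)  # indices in document order


def word_coverage(doc: Sequence[str], summary: Sequence[str]) -> float:
    """Hypothetical scorer: share of the document's word types covered by the summary."""
    doc_words = {w for s in doc for w in s.lower().split()}
    summary_words = {w for s in summary for w in s.lower().split()}
    return len(doc_words & summary_words) / len(doc_words)


doc = ["The cat sat on the mat", "The cat sat", "A dog ran"]
print(greedy_extract(doc, word_coverage, k=2))
```

Because each step rescores the whole candidate summary, the second pick skips the redundant sentence "The cat sat" in favor of "A dog ran", which adds new content.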

Skeleton to Abstraction: An Attentive Information Extraction Schema for Enhancing the Saliency of Text Summarization

Information ◽

10.3390/info9090217 ◽

2018 ◽

Vol 9 (9) ◽

pp. 217 ◽

Cited By ~ 1

Author(s):

Xiujuan Xiang ◽

Guangluan Xu ◽

Xingyu Fu ◽

Yang Wei ◽

Li Jin ◽

...

Keyword(s):

Information Extraction ◽

Full Text ◽

State Of The Art ◽

Irrelevant Information ◽

Source Text ◽

Daily Mail ◽

Human Evaluation ◽

Proposed Model ◽

Abstractive Summarization ◽

Extraction Model

Popular abstractive summarization models are currently based on the attentional encoder-decoder framework. In this architecture, the decoder generates a summary from the full source text, so irrelevant information often interferes with the decoder and the generated summaries suffer from low saliency. Moreover, by observing how people write summaries, we find that they write from the necessary information rather than from the full text. Therefore, to enhance the saliency of abstractive summarization, we propose an attentive information extraction model. It consists of a multi-layer perceptron (MLP) gated unit that focuses on the important information in the source text, and a similarity module that encourages high similarity between the reference summary and that important information. Before decoding, the MLP and the similarity module work together to extract the important information for the decoder, thus obtaining a skeleton of the source text. This reduces the interference of irrelevant information with the decoder and therefore improves the saliency of the summary. Our proposed model was tested on the CNN/Daily Mail and DUC-2004 datasets, achieving a ROUGE-1 F-score of 42.01 and a ROUGE-1 recall of 33.94, respectively. These results outperform the state-of-the-art abstractive model on these datasets. In addition, subjective human evaluation confirmed that the saliency of the generated summaries was enhanced.
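A minimal sketch of the two modules named above, assuming a single-layer sigmoid gate `g = sigmoid(W h + b)` applied elementwise to an encoder state `h`, and a similarity objective of `1 - cosine(skeleton, reference)`. The layer sizes, the number of MLP layers, and the exact similarity measure are not given in the abstract, so these are illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def mlp_gate(hidden: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Gated encoder state g * h with g = sigmoid(W h + b), applied elementwise."""
    gate = [sigmoid(sum(w * h for w, h in zip(row, hidden)) + b) for row, b in zip(weights, bias)]
    return [g * h for g, h in zip(gate, hidden)]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def similarity_loss(skeleton: Vector, reference: Vector) -> float:
    """Zero when the extracted skeleton points in the same direction as the reference."""
    return 1.0 - cosine(skeleton, reference)


# Hypothetical gate weights that keep dimension 0 and suppress dimension 1.
skeleton = mlp_gate([2.0, -4.0], [[10.0, 0.0], [0.0, 10.0]], [0.0, 0.0])
print(skeleton, similarity_loss(skeleton, [1.0, 0.0]))
```

A gate near 0 effectively removes a dimension from what the decoder sees, which is how the "skeleton" filters out irrelevant content in this sketch.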

A Reinforced Topic-Aware Convolutional Sequence-to-Sequence Model for Abstractive Text Summarization

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/619 ◽

2018 ◽

Cited By ~ 17

Author(s):

Li Wang ◽

Junlin Yao ◽

Yunzhe Tao ◽

Li Zhong ◽

Wei Liu ◽

...

Keyword(s):

Deep Learning ◽

Experimental Evaluation ◽

State Of The Art ◽

Text Summarization ◽

The Other ◽

Learning Approach ◽

Automatic Summarization ◽

Word Level ◽

Proposed Model ◽

Abstractive Summarization

In this paper, we propose a deep learning approach to the automatic summarization task that incorporates topic information into the convolutional sequence-to-sequence (ConvS2S) model and uses self-critical sequence training (SCST) for optimization. By jointly attending to topics and word-level alignments, our approach improves the coherence, diversity, and informativeness of generated summaries through a biased probability generation mechanism. In addition, reinforcement training such as SCST directly optimizes the proposed model for the non-differentiable ROUGE metric, which also avoids exposure bias during inference. We compare our method with state-of-the-art methods on the Gigaword, DUC-2004, and LCSTS datasets. The empirical results demonstrate the superiority of our proposed method in abstractive summarization.
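The self-critical idea can be shown in a few lines: the reward of the model's own greedy decode serves as the baseline for a sampled summary, so only samples that beat greedy decoding are reinforced. In this sketch `rouge1_f` is a simplified unigram-overlap F1, not the official ROUGE toolkit, and the log-probabilities are made-up illustrative numbers.

```python
from collections import Counter
from typing import List, Sequence


def rouge1_f(candidate: Sequence[str], reference: Sequence[str]) -> float:
    """Unigram-overlap F1, a simplified stand-in for the ROUGE-1 F-score."""
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def scst_loss(sample_log_probs: List[float], sample_reward: float, greedy_reward: float) -> float:
    """Self-critical policy-gradient loss for one sampled summary.

    Minimizing it raises the sample's log-probability when the sample beats the
    greedy baseline, and lowers it when the sample is worse.
    """
    advantage = sample_reward - greedy_reward
    return -advantage * sum(sample_log_probs)


reference = "police arrest suspect after chase".split()
sampled = "police arrest suspect".split()  # drawn from the model
greedy = "police chase".split()            # argmax decode, used as the baseline
loss = scst_loss([-0.2, -0.4, -0.3], rouge1_f(sampled, reference), rouge1_f(greedy, reference))
print(loss)
```

Because the reward enters only as a scalar weight on the log-likelihood, ROUGE never needs to be differentiable, which is why this style of training can optimize it directly.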

Using Contextual Topic Model for a Query-Focused Multi-Document Summarizer

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213016600022 ◽

2016 ◽

Vol 25 (01) ◽

pp. 1660002 ◽

Cited By ~ 5

Author(s):

Guangbing Yang

Keyword(s):

Topic Modeling ◽

Information Overload ◽

Topic Model ◽

State Of The Art ◽

Practical Application ◽

Document Summarization ◽

Bayesian Hierarchical ◽

Novel Approach ◽

Novel Method ◽

Latent Topics

Information overload, an oft-decried problem of the digital age, seriously impairs the comprehension of information. Text summarization can help alleviate this problem. To improve the performance of multi-document summarization, this study proposes a novel approach based on a mixture model consisting of a contextual topic model, from the Bayesian hierarchical topic modeling family, for selecting candidate summary sentences, and a machine-learning regression model for generating the summary. By investigating hierarchical topics and their correlations with respect to the lexical co-occurrences of words, the proposed contextual topic model can determine the relevance of sentences more effectively, recognize latent topics, and arrange them hierarchically. Quantitative evaluation results from a practical application demonstrate that a system implementing this model can significantly improve summarization performance, making it comparable to state-of-the-art summarization systems.

Content-Based Attention Network for Person Image Generation

Journal of Circuits System and Computers ◽

10.1142/s0218126620502503 ◽

2020 ◽

Vol 29 (15) ◽

pp. 2050250

Author(s):

Xiongfei Liu ◽

Bengao Li ◽

Xin Chen ◽

Haiyan Zhang ◽

Shu Zhan

Keyword(s):

Major Part ◽

State Of The Art ◽

Attention Mechanism ◽

Experimental Results ◽

Generative Adversarial Networks ◽

Image Generation ◽

Attention Network ◽

Adversarial Networks ◽

Proposed Model ◽

Novel Method

This paper proposes a novel method for person image generation with an arbitrary target pose. Given a person image and an arbitrary target pose, our proposed model can synthesize images of the same person in different poses. Generative Adversarial Networks (GANs) form the core of the proposed model. Unlike traditional GANs, we add an attention mechanism to the generator in order to generate realistic-looking images, and we also use content reconstruction with a pretrained VGG16 network to keep the content of the generated images consistent with the target images. We test our model on the DeepFashion and Market-1501 datasets. The experimental results show that the proposed network performs favorably against state-of-the-art methods.

Conversational Chatbot with Attention Model

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.b6316.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 3537-3540

Keyword(s):

Artificial Intelligence ◽

Natural Language ◽

Machine Translation ◽

State Of The Art ◽

Attention Mechanism ◽

Fine Tuning ◽

Beam Search ◽

Attention Model ◽

Transformer Model ◽

Efficient Sequence

A chatbot is artificial intelligence (AI) software that simulates a conversation between two humans. Our chatbot is based on the state-of-the-art Transformer architecture, which relies on the attention mechanism. The Transformer is a highly efficient sequence-to-sequence model. At its core, machine translation is simply a task of mapping one sentence to another; since sentences consist of words, this amounts to mapping one sequence to a different sequence. Our model uses byte-pair encoding (BPE) for tokenization and beam search for heuristic search in the decoder. Several unsupervised prediction tasks are combined by fine-tuning with a multi-task objective each time the user starts a conversation. For every new session, the chatbot adopts a new, randomly chosen persona and converses in that persona. Evaluated on perplexity and on its ability to understand and generate natural language, the model achieves a Hits@1 score as high as 80.9%.
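Byte-pair encoding is mentioned here without detail. The sketch below shows the standard BPE merge-learning loop: count adjacent symbol pairs across a word-frequency table, merge the most frequent pair everywhere, and repeat. It is a generic illustration, not this chatbot's tokenizer; the `</w>` end-of-word marker and the toy word counts are assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple


def learn_bpe(word_freqs: Dict[str, int], num_merges: int) -> List[Tuple[str, str]]:
    """Return the ordered list of symbol-pair merges learned from word frequencies."""
    # Start from characters, with an end-of-word marker so suffixes stay distinguishable.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges: List[Tuple[str, str]] = []
    for _ in range(num_merges):
        pairs: Counter = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the pair seen first
        merges.append(best)
        new_vocab: Dict[Tuple[str, ...], int] = {}
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] = new_vocab.get(tuple(merged), 0) + freq
        vocab = new_vocab
    return merges


print(learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=3))
```

Frequent fragments such as `est</w>` become single tokens after a few merges, which keeps the vocabulary small while still covering rare words as sequences of subwords.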

An Adaptive Hierarchical Compositional Model for Phrase Embedding

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/576 ◽

2018 ◽

Author(s):

Bing Li ◽

Xiaochun Yang ◽

Bin Wang ◽

Wei Wang ◽

Wei Cui ◽

...

Keyword(s):

Vector Space ◽

Experimental Evaluation ◽

State Of The Art ◽

Model Complexity ◽

Semantic Structure ◽

Time Cost ◽

Compositional Model ◽

Novel Method ◽

Adaptively Adjusting ◽

Insight Into

Phrase embedding aims to represent phrases in a vector space and is important for the performance of many NLP tasks. Existing models regard a phrase as either fully compositional or non-compositional, ignoring the hybrid compositionality that is widespread, especially in long phrases. This drawback prevents them from gaining deeper insight into the semantic structure of long phrases and, as a consequence, weakens the accuracy of the embeddings. In this paper, we present a novel method for jointly learning compositionality and phrase embedding by adaptively weighting different compositions using an implicit hierarchical structure. Our model can adaptively adjust among different compositions without adding much model complexity or time cost. To the best of our knowledge, our work is the first to consider hybrid compositionality in phrase embedding. The experimental evaluation demonstrates that our model outperforms state-of-the-art methods in both similarity tasks and analogy tasks.
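To illustrate what "adaptively weighting different compositions" can mean, the sketch below mixes a compositional embedding (the average of word vectors) with a non-compositional one (a vector for the phrase as a whole) through a sigmoid gate. This two-way mixture is an assumption for illustration only; it does not model the paper's implicit hierarchical structure, and `gate_logit` stands in for a learned per-phrase parameter.

```python
import math
from typing import Dict, List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def hybrid_phrase_embedding(words: List[str], word_vecs: Dict[str, Vector],
                            phrase_vec: Vector, gate_logit: float) -> Vector:
    """Gated mix of compositional and non-compositional phrase embeddings."""
    dim = len(phrase_vec)
    # Compositional reading: average of the constituent word vectors.
    composed = [sum(word_vecs[w][d] for w in words) / len(words) for d in range(dim)]
    # alpha near 1 treats the phrase as compositional; near 0, as an idiom with its own vector.
    alpha = sigmoid(gate_logit)
    return [alpha * c + (1.0 - alpha) * p for c, p in zip(composed, phrase_vec)]


vecs = {"hot": [1.0, 0.0], "dog": [0.0, 1.0]}
print(hybrid_phrase_embedding(["hot", "dog"], vecs, phrase_vec=[2.0, 2.0], gate_logit=0.0))
```

With `gate_logit=0.0` the two readings are weighted equally; a strongly negative logit would let an idiomatic phrase such as "hot dog" rely almost entirely on its own vector.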

Document Summarization Based on Coverage with Noise Injection and Word Association

Information ◽

10.3390/info11110536 ◽

2020 ◽

Vol 11 (11) ◽

pp. 536

Author(s):

Heechan Kim ◽

Soowon Lee

Keyword(s):

Language Processing ◽

Word Association ◽

State Of The Art ◽

The State ◽

Daily Mail ◽

Document Summarization ◽

The Third ◽

Word Sequence ◽

Noise Injection ◽

Automatic Document Summarization

Automatic document summarization is a field of natural language processing that is improving rapidly with the development of end-to-end deep learning models. In this paper, we propose a novel summarization model that consists of three methods. The first is a coverage method based on noise injection that makes the attention mechanism select only important words by treating previous context information as noise. This alleviates the problem of the summarization model repeatedly generating the same word sequence. The second is a word association method that updates the information of each word by comparing the information at the current step with the information from all previous decoding steps. This captures changes in the meaning of already-decoded words based on the words that follow them. The third is a suppression loss function that explicitly minimizes the probabilities of non-answer words. On the CNN/Daily Mail summarization task, the proposed model performed well on several Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics compared with state-of-the-art models, and it achieved these results with very few training steps compared with those models.
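The abstract does not give the form of the suppression loss. One plausible form, assumed here and borrowed from unlikelihood-style training, adds `-log(1 - p_w)` for every non-answer word `w` to the usual cross-entropy on the answer word, so any wrong word with high probability is penalized directly. `weight` is a hypothetical balancing hyperparameter.

```python
import math
from typing import Sequence


def nll_with_suppression(probs: Sequence[float], target: int, weight: float = 1.0) -> float:
    """Cross-entropy on the answer word plus a penalty on every non-answer word.

    Assumed penalty: -log(1 - p_w), which grows without bound as a wrong word's
    probability approaches 1 (probabilities of non-answer words must be < 1).
    """
    nll = -math.log(probs[target])
    suppression = sum(-math.log(1.0 - p) for i, p in enumerate(probs) if i != target)
    return nll + weight * suppression


# Softmax output over a toy three-word vocabulary; index 0 is the answer word.
print(nll_with_suppression([0.7, 0.2, 0.1], target=0))
```

Plain cross-entropy only looks at the answer word's probability; the extra term distinguishes between spreading the remaining mass thinly and concentrating it on a single wrong word, penalizing the latter more.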
