A Topical Category-Aware Neural Text Summarizer

2020, Vol 10 (16), pp. 5422
Author(s): So-Eon Kim, Nazira Kaibalina, Seong-Bae Park

The advent of the sequence-to-sequence model and the attention mechanism has improved the comprehensibility and readability of automatically generated summaries. However, most previous studies on text summarization have focused on generating or extracting sentences from the original text alone, even though every text has a latent topic category. That is, although a topic category can help improve summarization quality, there have been no efforts to use such information in text summarization. Therefore, this paper proposes a novel topical category-aware neural text summarizer, which differs from legacy neural summarizers in that it reflects the topic category of the original text when generating a summary. The proposed summarizer adopts the class activation map (CAM) as a measure of the topical influence of the words in the original text. Since the CAM extracts the words relevant to a specific category from the text, it allows the attention mechanism to be influenced by the topic category. As a result, by combining the attention mechanism and the CAM, the proposed neural summarizer reflects both the topical information and the content information of a text in its summary. Experiments on The New York Times Annotated Corpus show that the proposed model outperforms the legacy attention-based sequence-to-sequence model, which indicates that reflecting a topic category is effective in automatic summarization.
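The idea of letting a topic signal influence attention can be illustrated with a small sketch. The abstract does not say how the attention weights and the CAM are combined, so the weighted average below, the mixing weight `lam`, and the function names are all assumptions made for illustration, not the authors' method.

```python
import math


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def topic_aware_attention(attention_scores, cam_scores, lam=0.5):
    """Mix content attention with topical relevance over source words.

    attention_scores: raw decoder-to-source attention scores, one per word.
    cam_scores: raw CAM-style topical relevance scores, one per word.
    lam: hypothetical mixing weight; 0 ignores the topic signal entirely.
    The convex combination is an assumed rule, chosen only for clarity.
    """
    attn = softmax(attention_scores)
    topic = softmax(cam_scores)
    return [(1 - lam) * a + lam * t for a, t in zip(attn, topic)]
```

With uniform attention scores, a word that scores highly in the CAM receives a larger share of the mixed distribution, which is the qualitative effect the abstract describes.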

2015, Vol 5 (1), pp. 36-47
Author(s): Rasmita Rautray, Rakesh Chandra Balabantaray, Anisha Bhardwaj

With the exponential growth of information available electronically, there is an increasing demand for text summarization. Text summarization is the process of extracting the contents of the original text in a shorter form that provides useful information to the user. This paper presents a summarizer that produces summaries while reducing redundant information and maximizing summary relevancy. The proposed model takes several features into account, including title feature, sentence weight, term weight, sentence position, inter-sentence similarity, proper noun, thematic word, and numerical data. The score of each feature for the model can be obtained from the document sets. The model's performance is evaluated using the F-score of the extracted sentences at a 20% compression rate on the C-50 data corpus. Experimental studies on the C-50 data corpus show that the PSO summarizer performs significantly better than other summarizers.


There is a growing need for text summarization because of the difficulty of managing the exponential increase in information accessible on the World Wide Web. Text summarization is the process of condensing the contents of the original text into a shorter form that provides important information to the user. The summarizer presented in this paper produces extractive summaries of Kannada text documents. The proposed summarizer considers five features to determine the important sentences in a document: Term Frequency, Term Frequency-Inverse Sentence Frequency, Keywords, Sentence Length, and Sentence Position. The value of each feature is computed, and the score of each sentence in the document is the average of all its feature scores. The sentences with the top scores are selected for inclusion in the extractive summary. The results of the proposed model are evaluated with the ROUGE toolkit, measuring performance by the F-score of the generated summaries. Experimental studies on a custom-built dataset of 50 Kannada text documents show significantly better performance in producing extractive summaries as compared to human summaries.
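The scoring scheme described above, in which a sentence's score is the average of its feature scores and the top-scoring sentences are kept, can be sketched as follows. This is a minimal illustration, not the paper's system: it uses only three simplified features, a naive whitespace tokenizer, and hypothetical function names, and it is not adapted to Kannada text.

```python
from collections import Counter

PUNCTUATION = ".,;:!?\"'()"


def tokenize(sentence):
    """Lowercase whitespace tokens with surrounding punctuation removed."""
    words = (w.strip(PUNCTUATION).lower() for w in sentence.split())
    return [w for w in words if w]


def normalize(values):
    """Scale non-negative values into [0, 1] by dividing by the maximum."""
    top = max(values, default=0)
    return [v / top if top else 0.0 for v in values]


def term_frequency_scores(sentences):
    """Average document-level frequency of the words in each sentence."""
    counts = Counter(w for s in sentences for w in tokenize(s))
    raw = []
    for s in sentences:
        tokens = tokenize(s)
        raw.append(sum(counts[w] for w in tokens) / max(len(tokens), 1))
    return normalize(raw)


def length_scores(sentences):
    """Longer sentences score higher (a simplifying assumption)."""
    return normalize([len(tokenize(s)) for s in sentences])


def position_scores(sentences):
    """Earlier sentences score higher (a simplifying assumption)."""
    n = len(sentences)
    return [(n - i) / n for i in range(n)]


def summarize(sentences, k=2):
    """Return the k top-scoring sentences in document order, plus all scores."""
    features = [
        term_frequency_scores(sentences),
        length_scores(sentences),
        position_scores(sentences),
    ]
    # A sentence's score is the plain average of its feature scores.
    scores = [sum(column) / len(features) for column in zip(*features)]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[:k])
    return [sentences[i] for i in chosen], scores
```

Returning the selected sentences in their original order keeps the extract readable; the ranking is used only to decide which sentences survive.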


2020, Vol 8 (6), pp. 5622-5627

The past decade has witnessed a great rise in Artificial Intelligence. Text summarization, which falls under AI, has been an important research area that identifies the relevant sentences in a piece of text. With text summarization, we can obtain short and precise information while preserving the contents of the text. This paper presents an approach for generating a short and precise extractive summary of a given text document. It discusses a statistical method for extractive summarization of sports articles based on the extraction of various features. The features used are TF-ISF, Sentence Length, Sentence Position, Sentence-to-Sentence Cohesion, Proper Noun, and Pronoun. A score, known as the predictive score, is calculated for each sentence, and the summary of the document is produced from the predictive score, also known as the rank of the sentence. Accuracy is checked using the BBC Sports Article dataset and sports articles from various news sources such as the New York Times and CNN. On average, a precision of 73% is obtained when the System Generated Summary (SGS) is compared with the manual summary.
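As an illustration of the TF-ISF feature listed above, the sketch below uses one common formulation: the term frequency within a sentence multiplied by the logarithm of the inverse sentence frequency, averaged over the sentence's tokens. The abstract does not give the exact formula, so this formulation and the function name are assumptions.

```python
import math
from collections import Counter


def tf_isf_scores(tokenized_sentences):
    """Score each sentence by the mean TF-ISF of its tokens.

    tokenized_sentences: list of token lists, one per sentence.
    ISF(w) = log(N / number of sentences containing w), with N sentences.
    """
    n = len(tokenized_sentences)
    sentence_freq = Counter()
    for tokens in tokenized_sentences:
        sentence_freq.update(set(tokens))  # count each word once per sentence

    scores = []
    for tokens in tokenized_sentences:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        total = sum(
            count * math.log(n / sentence_freq[word])
            for word, count in tf.items()
        )
        scores.append(total / len(tokens))
    return scores
```

A word that occurs in every sentence contributes nothing under this formulation, so sentences made up only of ubiquitous words score zero.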


Author(s): Li Wang, Junlin Yao, Yunzhe Tao, Li Zhong, Wei Liu, ...

In this paper, we propose a deep learning approach to the automatic summarization task that incorporates topic information into the convolutional sequence-to-sequence (ConvS2S) model and uses self-critical sequence training (SCST) for optimization. By jointly attending to topics and word-level alignment, our approach can improve the coherence, diversity, and informativeness of generated summaries via a biased probability generation mechanism. In addition, reinforcement training such as SCST directly optimizes the proposed model with respect to the non-differentiable metric ROUGE, which also avoids exposure bias during inference. We carry out an experimental evaluation against state-of-the-art methods on the Gigaword, DUC-2004, and LCSTS datasets. The empirical results demonstrate the superiority of our proposed method in abstractive summarization.
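The SCST objective mentioned above can be sketched for a single training example. The sketch assumes the commonly used formulation in which the reward of a sampled summary is compared with the reward of the greedily decoded summary, which serves as the baseline. The function and parameter names are hypothetical, and the rewards (for instance ROUGE scores) are taken as given numbers rather than computed here.

```python
def scst_loss(sample_log_probs, sample_reward, greedy_reward):
    """Self-critical loss for one sampled sequence.

    sample_log_probs: log-probabilities of the sampled tokens under the model.
    sample_reward: reward of the sampled summary (e.g. a ROUGE score).
    greedy_reward: reward of the greedy summary, used as the baseline.
    Minimizing the loss raises the likelihood of samples that beat the
    baseline and lowers the likelihood of samples that fall below it.
    """
    advantage = sample_reward - greedy_reward
    return -advantage * sum(sample_log_probs)
```

Because the reward enters only as a scalar weight, the metric itself never needs to be differentiated, which is what allows optimization against ROUGE.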


2020, Vol 10 (11), pp. 3851
Author(s): Seongsik Park, Harksoo Kim

Relation extraction is a type of information extraction task that recognizes semantic relationships between entities in a sentence. Many previous studies have focused on extracting only one semantic relation between two entities in a single sentence. However, multiple entities in a sentence are associated through various relations. To address this issue, we proposed a relation extraction model based on a dual pointer network with a multi-head attention mechanism. The proposed model finds n-to-1 subject–object relations using a forward object decoder. Then, it finds 1-to-n subject–object relations using a backward subject decoder. Our experiments confirmed that the proposed model outperformed previous models, with an F1-score of 80.8% for the ACE (automatic content extraction) 2005 corpus and an F1-score of 78.3% for the NYT (New York Times) corpus.


Author(s): Dong Qiu, Bing Yang

Existing text summarization methods mainly rely on the mapping between manually labeled standard summaries and the original text for feature extraction, often ignoring the internal structure and semantic feature information of the original document. As a result, summaries produced by existing models suffer from grammatical structure errors and semantic deviation from the original text. This paper attempts to strengthen the model's attention to the inherent feature information of the source text so that the model can more accurately identify the grammatical structure and semantic information of the document. To this end, it proposes a model based on the multi-head self-attention mechanism and the soft attention mechanism. By introducing an improved multi-head self-attention mechanism in the encoding stage, the model assigns higher weight to the correct syntactic and semantic information, thereby making the generated summary more coherent and accurate. At the same time, the pointer network model is adopted and the coverage mechanism is improved to address out-of-vocabulary and repetition problems when generating summaries. The proposed model is verified on the CNN/DailyMail dataset and evaluated with the ROUGE metric. The experimental results show that the model improves the quality of the generated summaries compared with other models.
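To illustrate how a coverage mechanism discourages repetition, the sketch below computes the standard coverage penalty used with pointer-generator models: at each decoding step, attention that overlaps with the attention already accumulated over the source is penalized. This is the baseline formulation only. The abstract does not describe the authors' improved variant, and the function name is hypothetical.

```python
def coverage_losses(attention_steps):
    """Per-step coverage penalty for a sequence of attention distributions.

    attention_steps: list of attention distributions over the source words,
    one per decoding step, all of the same length.
    The coverage vector is the running sum of earlier attention; the penalty
    at a step is the sum of the element-wise minimum of the current
    attention and the coverage.
    """
    if not attention_steps:
        return []
    coverage = [0.0] * len(attention_steps[0])
    losses = []
    for attention in attention_steps:
        losses.append(sum(min(a, c) for a, c in zip(attention, coverage)))
        coverage = [c + a for c, a in zip(coverage, attention)]
    return losses
```

Attending to the same source word twice in a row incurs a penalty on the second step, while moving to a word that has not been attended to incurs none.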


2003, Vol 15 (3), pp. 98-105
Author(s): Mark Galliker, Jan Herman
Keyword(s): New York

Abstract. Using the representation of men and women in The Times and The New York Times as an example, a content-analytic method is presented that is particularly suited to the study of electronically stored print media. Co-occurrence analysis is understood as the systematic examination of verbal combinations per counting unit. The problem of selecting the semantic units to be considered in the evaluation and presentation of the results is discussed.
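Counting verbal combinations per counting unit can be sketched as follows. The sketch assumes that a counting unit is a short text span such as a sentence, that the semantic units of interest are supplied as a fixed word list, and that a combination is an unordered pair of listed words appearing in the same unit. The function name and the simple tokenization are illustrative choices, not the procedure used in the study.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(units, vocabulary):
    """Count how many units contain each pair of vocabulary words.

    units: list of text spans, each treated as one counting unit.
    vocabulary: the words whose combinations are of interest.
    Each pair is counted at most once per unit.
    """
    vocab = {word.lower() for word in vocabulary}
    counts = Counter()
    for unit in units:
        tokens = (t.strip(".,;:!?\"'()").lower() for t in unit.split())
        present = sorted({t for t in tokens if t in vocab})
        counts.update(combinations(present, 2))
    return counts
```

Sorting the words found in a unit before pairing them ensures that each unordered pair is always stored under the same key.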

