A Text Abstraction Summary Model Based on BERT Word Embedding and Reinforcement Learning

Qicai Wang; Peiyu Liu; Zhenfang Zhu; Hongxia Yin; Qiuyue Zhang; Lindong Zhang

doi:10.3390/app9214701

A Text Abstraction Summary Model Based on BERT Word Embedding and Reinforcement Learning

Applied Sciences ◽

10.3390/app9214701 ◽

2019 ◽

Vol 9 (21) ◽

pp. 4701 ◽

Cited By ~ 8

Author(s):

Qicai Wang ◽

Peiyu Liu ◽

Zhenfang Zhu ◽

Hongxia Yin ◽

Qiuyue Zhang ◽

...

Keyword(s):

Reinforcement Learning ◽

Language Processing ◽

Evaluation Method ◽

Ground Truth ◽

Text Summarization ◽

Word Embedding ◽

Text Representation ◽

Daily Mail ◽

Automatic Text Summarization ◽

Automatic Text

As a core task of natural language processing and information retrieval, automatic text summarization is widely applied in many fields. There are currently two approaches to the text summarization task: abstractive and extractive. Building on both, we propose a novel hybrid extractive-abstractive model that combines BERT (Bidirectional Encoder Representations from Transformers) word embedding with reinforcement learning. First, we convert the human-written abstractive summaries into ground-truth labels. Second, we use BERT word embeddings as the text representation and pre-train the two sub-models separately. Finally, the extraction network and the abstraction network are bridged by reinforcement learning. To verify the performance of the model, we compare it with currently popular automatic text summarization models on the CNN/Daily Mail dataset, using the ROUGE (Recall-Oriented Understudy for Gisting Evaluation) metrics for evaluation. Extensive experimental results show that the accuracy of the model is clearly improved.
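
The abstract does not say how the human-written summaries are turned into extraction labels. A minimal sketch of one common heuristic, assumed here purely for illustration, marks for each summary sentence the article sentence with the highest unigram overlap. The function names `unigram_recall` and `label_sentences` are hypothetical and do not come from the paper.

```python
from collections import Counter


def unigram_recall(candidate_tokens, reference_tokens):
    """Fraction of reference unigrams covered by the candidate (clipped counts)."""
    ref = Counter(reference_tokens)
    cand = Counter(candidate_tokens)
    if not ref:
        return 0.0
    overlap = sum(min(count, cand[tok]) for tok, count in ref.items())
    return overlap / sum(ref.values())


def label_sentences(article_sentences, summary_sentences):
    """Assumed heuristic: for each summary sentence, label the article
    sentence that covers it best with 1; all other sentences stay 0."""
    labels = [0] * len(article_sentences)
    if not article_sentences:
        return labels
    tokenized = [s.lower().split() for s in article_sentences]
    for summary in summary_sentences:
        ref = summary.lower().split()
        scores = [unigram_recall(toks, ref) for toks in tokenized]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] > 0:  # skip summary sentences with no overlap at all
            labels[best] = 1
    return labels


article = [
    "the cat sat on the mat",
    "stocks fell sharply on monday",
    "rain is expected tomorrow",
]
print(label_sentences(article, ["stocks fell on monday"]))
```

Labels of this kind could serve as supervision when pre-training an extraction sub-model, before any reinforcement learning step.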

A Pointer Generator Network Model to Automatic Text Summarization and Headline Generation

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.e1094.0785s319 ◽

2019 ◽

Vol 8 (5S3) ◽

pp. 447-451

Keyword(s):

Neural Network ◽

Network Model ◽

Recurrent Neural Network ◽

Text Summarization ◽

Daily Mail ◽

Automatic Text Summarization ◽

Generator Model ◽

Abstractive Summarization ◽

Automatic Text

In a world where information is growing rapidly every single day, we need tools that generate summaries and headlines from text that are accurate as well as short and precise. In this paper, we describe a method for generating headlines from an article. A hybrid pointer-generator network with attention distribution and a coverage mechanism is applied to the article to produce an abstractive summary, and an encoder-decoder recurrent neural network with LSTM units then generates a headline from that summary. The hybrid pointer-generator model helps reduce inaccuracy and repetition. We use CNN/Daily Mail as our dataset.
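
The abstract gives no equations, so the following is only a sketch of how a pointer-generator step is commonly formulated, not the authors' implementation. A generation probability `p_gen` mixes the vocabulary distribution with the attention distribution over source tokens, and a coverage penalty discourages attending to the same source positions again. All names and values below are illustrative assumptions.

```python
def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix generating from the vocabulary with copying from the source.

    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (attention mass on w in the source)
    """
    final = {word: p_gen * prob for word, prob in vocab_dist.items()}
    for weight, token in zip(attention, source_tokens):
        # Out-of-vocabulary source words receive probability only via copying.
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * weight
    return final


def coverage_loss(attention, coverage):
    """Penalty for re-attending to positions already covered in earlier steps."""
    return sum(min(a, c) for a, c in zip(attention, coverage))


dist = final_distribution(
    p_gen=0.5,
    vocab_dist={"the": 0.5, "cat": 0.5},
    attention=[0.25, 0.75],
    source_tokens=["cat", "zebra"],
)
print(dist)
print(coverage_loss([0.25, 0.75], [0.5, 0.0]))
```

In this toy step, "zebra" is not in the vocabulary yet still receives probability through the copy path, which is how such a model can reproduce rare words from the article.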

Automatic Text Summarization and Keyword Extraction using Natural Language Processing

2020 International Conference on Electronics and Sustainable Communication Systems (ICESC) ◽

10.1109/icesc48915.2020.9155852 ◽

2020 ◽

Author(s):

Avinash Payak ◽

Saurabh Rai ◽

Kanishka Shrivastava ◽

Reshma Gulwani

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Text Summarization ◽

Keyword Extraction ◽

Automatic Text Summarization ◽

Automatic Text

Document Summarization with VHTM: Variational Hierarchical Topic-Aware Mechanism

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6277 ◽

2020 ◽

Vol 34 (05) ◽

pp. 7740-7747 ◽

Cited By ~ 1

Author(s):

Xiyan Fu ◽

Jun Wang ◽

Jinghan Zhang ◽

Jinmao Wei ◽

Zhenglu Yang

Keyword(s):

Language Processing ◽

Topic Model ◽

Research Field ◽

Text Summarization ◽

Superior Performance ◽

The Past ◽

Automatic Text Summarization ◽

Latent Topics ◽

Further Development ◽

Automatic Text

Automatic text summarization focuses on distilling summary information from texts. This research field has been considerably explored over the past decades because of its significant role in many natural language processing tasks; however, two challenging issues block its further development: (1) how to yield a summarization model that embeds topic inference rather than being extended with a pre-trained one and (2) how to merge the latent topics into diverse granularity levels. In this study, we propose a variational hierarchical model, dubbed VHTM, to holistically address both issues. Unlike previous work assisted by a pre-trained single-grained topic model, VHTM is the first attempt to jointly accomplish summarization with topic inference via a variational encoder-decoder and to merge topics into multi-grained levels through topic embedding and attention. Comprehensive experiments validate the superior performance of VHTM compared with the baselines, along with semantically consistent topics.

Automatic Text Summarization from Unstructured Text using Natural Language Processing

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2020/206922020 ◽

2020 ◽

Vol 9 (2) ◽

pp. 2265-2269

Author(s):

Mamta Aswani

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Text Summarization ◽

Automatic Text Summarization ◽

Unstructured Text ◽

Automatic Text

A Framework for Word Embedding Based Automatic Text Summarization and Evaluation

Information ◽

10.3390/info11020078 ◽

2020 ◽

Vol 11 (2) ◽

pp. 78 ◽

Cited By ~ 2

Author(s):

Tulu Tilahun Hailu ◽

Junqing Yu ◽

Tessfu Geteye Fantaye

Keyword(s):

Text Summarization ◽

Evaluation Framework ◽

Word Embedding ◽

Evaluation Metrics ◽

Original Text ◽

Automatic Evaluation ◽

Source Text ◽

Automatic Text Summarization ◽

Automatic Text

Text summarization is the process of producing a concise version of a text (a summary) from one or more information sources. If the generated summary preserves the meaning of the original text, it helps users make fast and effective decisions. However, how much of the source text's meaning is preserved is becoming harder to evaluate. The most commonly used automatic evaluation metrics, such as Recall-Oriented Understudy for Gisting Evaluation (ROUGE), rely strictly on the overlapping n-gram units between reference and candidate summaries, which makes them unsuitable for measuring the quality of abstractive summaries. Another major challenge in evaluating text summarization systems is the lack of consistent ideal reference summaries. Studies show that human summarizers can produce variable reference summaries of the same source, which can significantly affect the automatic evaluation scores of summarization systems. Humans are biased by their situation when producing a summary; even the same person may produce substantially different summaries of the same source at different times. This paper proposes a word embedding based automatic text summarization and evaluation framework, which can successfully determine the salient top-n sentences of a source text as a reference summary and evaluate the quality of system summaries against it. Extensive experimental results demonstrate that the proposed framework is effective and able to outperform several baseline methods with regard to both text summarization systems and automatic evaluation metrics when tested on a publicly available dataset.
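
To illustrate why n-gram overlap can undervalue abstractive summaries, here is a minimal sketch of ROUGE-N recall using whitespace tokenization. It is a simplified illustration rather than the official ROUGE toolkit, and the example sentences are invented.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Share of reference n-grams that also occur in the candidate (clipped)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


reference = "the cat sat on the mat"
paraphrase = "a feline rested on the rug"
print(rouge_n_recall(reference, reference, n=1))
print(rouge_n_recall(paraphrase, reference, n=1))
print(rouge_n_recall(paraphrase, reference, n=2))
```

The paraphrase conveys roughly the same meaning as the reference but shares only two of its six unigrams and one of its five bigrams, so its scores are low. This is the limitation that motivates an embedding-based evaluation.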

A SYNTACTIC-BASED SENTENCE VALIDATION TECHNIQUE FOR MALAY TEXT SUMMARIZER

Journal of Information and Communication Technology ◽

10.32890/jict2021.20.3.3 ◽

2021 ◽

Vol 20 (Number 3) ◽

pp. 329-352

Author(s):

Suraya Alias ◽

Mohd Shamrie Sainin ◽

Siti Khaotijah Mohammad

Keyword(s):

Language Processing ◽

Text Summarization ◽

Compression Rate ◽

Automatic Evaluation ◽

Readability Score ◽

Automatic Text Summarization ◽

Validation Technique ◽

Automatic Text ◽

F Measure

In the Automatic Text Summarization domain, a Sentence Compression (SC) technique is applied to the summary sentence to remove unnecessary words or phrases. The purpose of SC is to preserve the important information in the sentence and to remove the unnecessary ones without sacrificing the sentence's grammar. The existing development of Malay Natural Language Processing (NLP) tools is still under study with limited open access. The issue is the lack of a benchmark dataset in the Malay language to evaluate the quality of the summaries and to validate the compressed sentence produced by the summarizer model. Hence, our paper outlines a Syntactic-based Sentence Validation technique for Malay sentences by referring to the Malay Grammar Pattern. In this work, we propose a new derivation set of Syntactic Rules based on the Malay main Word Class to validate a Malay sentence that undergoes the SC procedure. We experimented using the Malay dataset of 100 new articles covering the Natural Disaster and Events domain to find the optimal compression rate and its effect on the summary content. An automatic evaluation using ROUGE (Recall-Oriented Understudy for Gisting Evaluation) produced a result with an average F-measure of 0.5826 and an average Recall value of 0.5925 with an optimum compression rate of 0.5 Confidence Conf value. Furthermore, a manual summary evaluation by a group of Malay experts on the grammaticality of the compressed summary sentence produced a good result of 4.11 and a readability score of 4.12 out of 5. This depicts the reliability of the proposed technique to validate the Malay sentence with promising summary content and readability results.

Developing a new approach to summarize Arabic text automatically using syntactic and semantic analysis

International Journal of Engineering & Technology ◽

10.14419/ijet.v9i2.30324 ◽

2020 ◽

Vol 9 (2) ◽

pp. 342

Author(s):

Amal Alkhudari

Keyword(s):

Language Processing ◽

Automatic System ◽

Semantic Analysis ◽

Text Summarization ◽

Original Text ◽

Arabic Text ◽

Wide Spread ◽

New Approach ◽

Automatic Text Summarization ◽

Automatic Text

Owing to the widespread availability of information and the diversity of its sources, there is a need to produce accurate text summaries with the least time and effort. Such a summary must preserve the key information content and overall meaning of the original text. Text summarization is one of the most important applications of Natural Language Processing (NLP). The goal of automatic text summarization is to create summaries that are similar to human-created ones. However, in many cases the readability of the generated summaries is not satisfactory, because they do not consider the meaning of the words and do not cover all the semantically relevant aspects of the data. In this paper, we use syntactic and semantic analysis to propose an automatic system for Arabic text summarization. This system is capable of understanding the meaning of information and retrieves only the relevant part. The effectiveness of the proposed work is evaluated on the EASC corpus using the ROUGE measure. The generated summaries are compared against those produced by humans and by previous research.

Techniques and Issues in Text Mining

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2020.9079 ◽

2020 ◽

Vol 17 (9) ◽

pp. 4368-4374

Author(s):

Perpetua F. Noronha ◽

Madhu Bhan

Keyword(s):

Language Processing ◽

Text Summarization ◽

Digital Data ◽

Text Documents ◽

Significant Information ◽

Digital Era ◽

Automatic Text Summarization ◽

Text Content ◽

Available Information ◽

Automatic Text

Digital data is being generated in huge amounts at an unprecedented, exponential rate. In this digital era, where the internet is the prime source of information, it is vital to develop better means of mining the available information quickly and capably. Manually extracting the salient information from large input text documents is time-consuming and inefficient. In this fast-moving world, it is difficult to read all the text content and derive insights from it, so automatic methods are required. Finding relevant documents among the large number of available sources and extracting apt information from them is challenging and is the need of the hour. Automatic text summarization can be used to generate relevant, high-quality information in less time. Text summarization condenses the source text into a brief summary while maintaining its salient information and readability. Automatic summary generation is in great demand as a way to cope with the growing amount of text data available online, to mark out the significant information, and to consume it faster. Text summarization is becoming extremely popular with advances in Natural Language Processing (NLP) and deep learning methods. The most important gain of automatic text summarization is that it reduces analysis time. In this paper, we focus on key approaches to automatic text summarization and on their efficiency and limitations.

Single Document Automatic Text Summarization using Term Frequency-Inverse Document Frequency (TF-IDF)

ComTech Computer Mathematics and Engineering Applications ◽

10.21512/comtech.v7i4.3746 ◽

2016 ◽

Vol 7 (4) ◽

pp. 285 ◽

Cited By ~ 14

Author(s):

Hans Christian ◽

Mikhael Pramodana Agus ◽

Derwin Suhartono

Keyword(s):

Language Processing ◽

Text Summarization ◽

The Other ◽

Online Information ◽

Inverse Document Frequency ◽

Automatic Text Summarization ◽

Document Frequency ◽

Online Source ◽

Automatic Text ◽

F Measure

The increasing availability of online information has triggered intensive research in automatic text summarization within Natural Language Processing (NLP). Text summarization shortens a text by removing less useful information, which helps the reader find the required information quickly. Many algorithms can be used to summarize text; one of them is TF-IDF (Term Frequency-Inverse Document Frequency). This research aimed to produce an automatic text summarizer implemented with the TF-IDF algorithm and to compare it with various other online automatic text summarizers. The F-measure was used as the standard comparison value to evaluate the summary produced by each summarizer. The summarizer achieved 67% accuracy on three data samples, which is higher than that of the other online summarizers.
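
A minimal sketch of TF-IDF sentence scoring for a single document follows. It assumes each sentence is treated as a "document" when computing inverse document frequency, and that the highest-scoring sentences form the summary. The abstract does not describe the authors' preprocessing or scoring details, so the function `summarize` and its choices are illustrative only.

```python
import math
import re
from collections import Counter


def summarize(text, k=1):
    """Return the k highest-scoring sentences, kept in their original order."""
    # Naive sentence splitting on ., ! or ? followed by whitespace (assumption).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokenized = [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]
    n = len(sentences)
    # Document frequency: number of sentences in which each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        # Sentence score: sum of tf * idf, with tf normalized by sentence length.
        score = sum(
            (count / len(toks)) * math.log(n / df[term])
            for term, count in tf.items()
        )
        scores.append(score)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


text = "The cat sat. The cat slept. Quantum computers factor large integers quickly."
print(summarize(text, k=1))
```

Terms that appear in many sentences receive a low IDF weight, so the sentence made up entirely of distinctive terms ranks first in this toy input.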

A Survey of Distinctive Prominence of Automatic Text Summarization Techniques Using Natural Language Processing

International Conference on Mobile Computing and Sustainable Informatics - EAI/Springer Innovations in Communication and Computing ◽

10.1007/978-3-030-49795-8_52 ◽

2020 ◽

pp. 543-549

Author(s):

Apurva D. Dhawale ◽

Sonali B. Kulkarni ◽

Vaishali M. Kumbhakarna

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Text Summarization ◽

Automatic Text Summarization ◽

Automatic Text
