Text Summarization of Czech News Articles Using Named Entities

Text summarization is a natural language processing (NLP) application that is being researched extensively and increasingly applied to reduce the time spent on tasks involving long texts. However, because NLP is still a developing field, most work is limited to English, leaving regional languages largely untouched despite their very large numbers of speakers. Hindi is one such language. In this paper, we propose an effective method for summarizing news articles in Hindi. As in the English variant of this application, we aim to emphasize the important sections of a Hindi news report and summarize it in 60 to 80 words. The summarization technique is intended to identify the theme of the news, named entities and numbers, title terms, and similar features in order to construct a keyword table. This table is then compared against a knowledge base of weighted keywords to rank the important sentences in order of relevance and, finally, to pick out the sentences most needed for the summary. Our goal of summarizing Hindi news articles specifically stems from a dilemma: although these articles are a rich source of opinionated information on various topics, readers often ignore them because of their long-winded nature, which buries the useful information in a sea of words decorated with lengthy introductions and linguistic ornaments such as idioms. Hence, this system should provide an effective means of summarization that surfaces useful information while pruning such irrelevant details.
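The keyword-table ranking described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' system: the weights TITLE_WEIGHT and NUMBER_WEIGHT, the small STOPWORDS set, the knowledge_base dictionary, and the greedy word-budget selection are all hypothetical, and named-entity detection is omitted (an NER tagger would add entity mentions to the table).

```python
import re
import string

# Hypothetical weights and stopwords for illustration; the abstract gives no values.
TITLE_WEIGHT = 2.0
NUMBER_WEIGHT = 1.0
STOPWORDS = {"the", "a", "an", "in", "on", "of", "and", "to"}
PUNCT = string.punctuation + "।"  # also strip the Devanagari danda


def split_sentences(text):
    """Split after ., !, ? or the danda (।), which ends sentences in Hindi."""
    return [s.strip() for s in re.split(r"(?<=[.!?।])\s+", text) if s.strip()]


def tokenize(sentence):
    """Whitespace tokenization, surrounding punctuation stripped, lowercased."""
    tokens = (raw.strip(PUNCT).lower() for raw in sentence.split())
    return [t for t in tokens if t]


def build_keyword_table(title, text, knowledge_base):
    """Merge knowledge-base weights with title terms and numbers from the text."""
    table = dict(knowledge_base)
    for tok in tokenize(title):
        if tok not in STOPWORDS:
            table[tok] = table.get(tok, 0.0) + TITLE_WEIGHT
    for tok in tokenize(text):
        # str.isdigit() is also True for Devanagari digits such as '२'.
        if any(ch.isdigit() for ch in tok):
            table[tok] = max(table.get(tok, 0.0), NUMBER_WEIGHT)
    return table


def summarize(title, text, knowledge_base, max_words=80):
    """Greedily keep the best-scoring sentences within a word budget, in original order."""
    sentences = split_sentences(text)
    table = build_keyword_table(title, text, knowledge_base)
    scored = [(sum(table.get(t, 0.0) for t in tokenize(s)), i) for i, s in enumerate(sentences)]
    chosen, used = [], 0
    # Highest score first; ties go to the earlier sentence.
    for score, i in sorted(scored, key=lambda p: (-p[0], p[1])):
        n_words = len(sentences[i].split())
        if score > 0 and used + n_words <= max_words:
            chosen.append(i)
            used += n_words
    return " ".join(sentences[i] for i in sorted(chosen))
```

Setting max_words=80 matches the upper end of the 60 to 80 word target; selected sentences are re-sorted into document order so the summary reads in the article's original sequence.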


Extractive Technique for Text Summarization based on Ranking Scheme

International Journal of Computer Sciences and Engineering, Vol 6 (4), pp. 369-373, 2018. DOI: 10.26438/ijcse/v6i4.369373

Author(s): A.A. Shrivastava, A.S. Bagora, ...

Keyword(s): Text Summarization, Ranking Scheme


Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration - NEWS '09

2009. DOI: 10.3115/1699705

Keyword(s): Shared Task, Named Entities


Multi-Objective Optimization for the Joint Disambiguation of Nouns and Named Entities

2015. DOI: 10.3115/v1/p15-1058. Cited by ~4

Author(s): Dirk Weissenborn, Leonhard Hennig, Feiyu Xu, Hans Uszkoreit

Keyword(s): Multi Objective Optimization, Named Entities, Multi Objective


Using Lexical Chains for Efficient Text Summarization

SSRN Electronic Journal, 2009. DOI: 10.2139/ssrn.3378072

Author(s): Anthony-Claret Onwutalobi

Keyword(s): Text Summarization, Lexical Chains


Contextual Recovery of Out-of-Lattice Named Entities in Automatic Speech Recognition

2019. DOI: 10.21437/interspeech.2019-2962

Author(s): Jack Serrino, Leonid Velikovich, Petar Aleksic, Cyril Allauzen

Keyword(s): Speech Recognition, Automatic Speech Recognition, Named Entities


An Automatic Text Summarization Method with the Concern of Covering Complete Formation

Recent Advances in Computer Science and Communications, Vol 13 (5), pp. 977-986, 2020. DOI: 10.2174/2213275912666190716105347

Author(s): Srinivasa Rao Kongara, Dasika Sree Rama Chandra Murthy, Gangadhara Rao Kancherla

Keyword(s): Research Method, Research Work, Fuzzy Rule, Text Summarization, Document Summarization, Summarization Method, Overall Evaluation, Multiple Documents, Rule System, Value Decomposition

Background: Text summarization is the process of generating a short description of an entire document that would otherwise be difficult to read in full. It provides a convenient way to extract the most useful information from documents as a short summary. Earlier research addressed this with the Fuzzy Rule-based Automated Summarization Method (FRASM). That method has several limitations that may restrict its use in real-world applications; in particular, it is suitable only for single-document summarization, whereas applications such as those in research industries often need to summarize information from multiple documents.

Methods: This paper proposes the Multi-document Automated Summarization Method (MDASM), a summarization framework intended to produce accurate summaries from multiple documents, whereas the existing system performed only single-document summarization. First, documents are clustered with a modified k-means algorithm to group documents with similar meaning, where similarity is measured from frequent terms. After clustering, pre-processing with a hybrid TF-IDF and singular value decomposition (SVD) technique eliminates irrelevant content and retains the required content. Sentence scoring then adds a title measurement metric to the metrics of the existing work, so that sentences with greater similarity are retrieved more accurately. Finally, a fuzzy rule system is applied to produce the summary.

Results: The evaluation was conducted in a MATLAB simulation environment and shows that the proposed method summarizes more accurately than the existing method. Compared with FRASM, MDASM reports increases of 89.28% in accuracy, 89.28% in precision, 89.36% in recall, and 70% in F-measure.

Conclusion: The summarization process developed in this work produces accurate summaries.
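A rough sketch of how a TF-IDF content feature, a title-similarity feature, and a fuzzy rule system can be combined to score sentences. The feature definitions, the linear membership functions, and the four rules below are assumptions chosen for illustration (a zero-order Sugeno-style inference); they are not the FRASM or MDASM rule base, and the k-means clustering and SVD steps are omitted.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """One {term: tf-idf} dict per token list, with smoothed idf = log((1+n)/(1+df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def fuzzy_importance(x, y):
    """Combine content feature x and title feature y (both in [0, 1]).

    Memberships: low(z) = 1 - z, high(z) = z; AND is min. Each rule has a crisp
    output and the result is the firing-strength-weighted average.
    """
    rules = [
        (min(x, y), 1.0),          # content high AND title high -> important
        (min(x, 1 - y), 0.5),      # content high AND title low  -> medium
        (min(1 - x, y), 0.5),      # content low  AND title high -> medium
        (min(1 - x, 1 - y), 0.0),  # content low  AND title low  -> unimportant
    ]
    total = sum(w for w, _ in rules)  # always >= 0.5 for x, y in [0, 1]
    return sum(w * out for w, out in rules) / total


def rank_sentences(title_tokens, sentence_tokens):
    """Return sentence indices ordered from most to least important."""
    if not sentence_tokens:
        return []
    vectors = tfidf_vectors(sentence_tokens + [title_tokens])
    sent_vecs, title_vec = vectors[:-1], vectors[-1]
    # Content feature: mean tf-idf weight, normalized by the best sentence.
    content = [sum(v.values()) / len(v) if v else 0.0 for v in sent_vecs]
    top = max(content) or 1.0
    scores = [
        fuzzy_importance(c / top, min(1.0, cosine(v, title_vec)))
        for c, v in zip(content, sent_vecs)
    ]
    return sorted(range(len(scores)), key=lambda i: -scores[i])
```

The title is appended as an extra document so that title terms share the same idf statistics as the sentences; a sentence with no title overlap can still reach a medium score if its content weight is high.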


Designing a Chat-Bot for College Information using Information Retrieval and Automatic Text Summarization Techniques

Current Chinese Computer Science, Vol 01, 2020. DOI: 10.2174/2665997201999201022191540

Author(s): Radha Guha

Keyword(s): Information Retrieval, Language Processing, Latent Dirichlet Allocation, Semantic Analysis, Text Summarization, The Internet, Specific Domain, User Query, College Information, Chat Bot

Background: In an era of information overload, it is very difficult for a human reader to quickly make sense of the vast amount of information available on the internet. Even within a specific domain such as a college or university website, a user may find it difficult to browse through all the links to get relevant answers quickly.

Objective: In this scenario, a chat-bot that can answer questions about college information and compare colleges would be useful and novel.

Methods: In this paper, a novel conversational chat-bot application with information retrieval and text summarization skills is designed and implemented. First, the chat-bot has a simple dialog skill: when it understands the intent of a user query, it responds from a stored collection of answers. Second, for unknown queries, it searches the internet and then summarizes the results using advanced natural language processing (NLP) and text mining (TM) techniques.

Results: Machine learning techniques for information retrieval and text summarization, namely Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), Word2Vec, Global Vectors (GloVe), and TextRank, are first reviewed and compared in this paper and then implemented for the chat-bot design. The chat-bot improves the user experience by answering specific queries concisely, which takes less time than reading the entire document. Students, parents, and faculty can obtain a variety of information more efficiently, such as admission criteria, fees, course offerings, notice board, attendance, grades, placements, faculty profiles, research papers, and patents.

Conclusion: The purpose of this paper was to follow advances in NLP technologies and implement them in a novel application.
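TextRank, one of the techniques listed above, can be sketched as below. This follows the sentence-extraction variant described by Mihalcea and Tarau (2004) under simplifying assumptions (whitespace tokenization, no stopword removal, damping factor d = 0.85); it is an illustration, not the chat-bot's actual implementation.

```python
import math


def similarity(a, b):
    """TextRank sentence similarity: shared word types / (log|a| + log|b|)."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    return len(set(a) & set(b)) / denom if denom > 0 else 0.0


def textrank_scores(sentences, d=0.85, max_iter=100, tol=1e-6):
    """Weighted PageRank over the sentence-similarity graph, by power iteration."""
    tokens = [s.lower().split() for s in sentences]
    n = len(tokens)
    if n == 0:
        return []
    w = [[similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(max_iter):
        # Each node receives score from neighbours in proportion to edge weight.
        new = [
            (1 - d) + d * sum(w[j][i] / out_weight[j] * scores[j] for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
        converged = max(abs(a - b) for a, b in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores


def textrank_summary(sentences, k=2):
    """Top-k sentences by TextRank score, returned in their original order."""
    scores = textrank_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    return [sentences[i] for i in sorted(top)]
```

A sentence that shares no words with any other sentence keeps only the baseline score 1 - d, so it is the first to be dropped from the summary.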


A bootstrapping approach for robust topic analysis

Natural Language Engineering, Vol 8 (2-3), pp. 209-233, 2002. DOI: 10.1017/s1351324902002929. Cited by ~1

Author(s): Olivier Ferret, Brigitte Grau

Keyword(s): Information Extraction, Large Scale, Text Summarization, Great Precision, Topic Analysis, Structured Knowledge

Topic analysis is important for many applications dealing with texts, such as text summarization or information extraction. However, it can be done with great precision only if it relies on structured knowledge, which is difficult to produce on a large scale. In this paper, we propose using bootstrapping to solve this problem: a first topic analysis based on a weakly structured source of knowledge, a collocation network, is used for learning explicit topic representations that then support a more precise and reliable topic analysis.
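To make the idea of a collocation network concrete, the sketch below builds one from word co-occurrences and measures the lexical cohesion of a text window against it; a drop in cohesion between adjacent windows can hint at a topic shift. The window size, the min_count threshold, and the cohesion measure are hypothetical choices for illustration, not Ferret and Grau's exact definitions, and the bootstrapping step that learns explicit topic representations is not shown.

```python
from collections import Counter
from itertools import combinations


def build_collocation_network(corpus, window=5, min_count=2):
    """Link distinct words that co-occur at most `window - 1` tokens apart, at least `min_count` times."""
    counts = Counter()
    for doc in corpus:
        tokens = doc.lower().split()
        for i, word in enumerate(tokens):
            for other in tokens[i + 1:i + window]:
                if other != word:
                    counts[frozenset((word, other))] += 1
    return {pair for pair, c in counts.items() if c >= min_count}


def cohesion(words, network):
    """Fraction of distinct word pairs in `words` that are linked in the network."""
    pairs = list(combinations(sorted(set(words)), 2))
    if not pairs:
        return 0.0
    return sum(frozenset(p) in network for p in pairs) / len(pairs)


def sliding_cohesion(tokens, network, size=4):
    """Cohesion of each window of `size` consecutive tokens."""
    return [cohesion(tokens[i:i + size], network) for i in range(len(tokens) - size + 1)]
```

For example, with a network built from texts about weather ("rain flood river") and finance ("stock market price"), the sequence "rain flood river stock market price" scored with size=3 gives cohesion 1.0 at both ends and about 0.333 in the two middle windows, where the text moves from one topic to the other.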


An Annotated Corpus of Crime-Related Portuguese Documents for NLP and Machine Learning Processing

Data, Vol 6 (7), pp. 71, 2021. DOI: 10.3390/data6070071

Author(s): Gonçalo Carnaz, Mário Antunes, Vitor Beires Nogueira

Keyword(s): Machine Learning, Language Processing, Named Entity Recognition, Entity Recognition, Automatic Identification, Named Entities, Related Data, Named Entity, Chain Of Custody, Evidence Collection

Criminal investigations collect and analyze the facts related to a crime, from which investigators can deduce evidence to be used in court. Criminal investigation is a multidisciplinary, applied science that includes interviews, interrogations, evidence collection, preservation of the chain of custody, and other investigative methods and techniques. These techniques produce both digital and paper documents that must be carefully analyzed to identify correlations and interactions among suspects, places, license plates, and other entities mentioned in the investigation. Computerized processing of these documents supports criminal investigation because it allows entities and their relations to be identified automatically, including some that are difficult to identify manually. A wide set of dedicated tools exists, but they share a major limitation: they cannot process criminal reports written in Portuguese, because no annotated corpus for that purpose exists. This paper presents an annotated corpus composed of anonymized crime-related documents extracted from official and open sources. The dataset was produced by an exploratory initiative to collect crime-related data from websites and conditioned-access police reports. Evaluating the classification of the annotated named entities in the crime-related documents yielded a mean precision of 0.808, recall of 0.722, and F1-score of 0.733. This corpus can be used to benchmark machine learning (ML) and natural language processing (NLP) methods and tools that detect and correlate entities in documents, for tasks such as sentence detection, named-entity recognition, and identification of terms related to the criminal domain.
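The abstract reports mean precision, recall, and F1 separately. If these are macro-averages over entity classes (the abstract does not say), the mean F1 need not equal the harmonic mean of the mean precision and mean recall: 2 × 0.808 × 0.722 / (0.808 + 0.722) ≈ 0.763, whereas 0.733 is reported. The sketch below shows how macro-averaging produces this kind of gap; the per-class counts in the example are hypothetical toy numbers, not data from the corpus.

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from true-positive, false-positive, false-negative counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def macro_average(counts):
    """Unweighted mean of per-class P, R, F1. `counts` maps label -> (tp, fp, fn)."""
    per_class = [prf(*c) for c in counts.values()]
    if not per_class:
        return (0.0, 0.0, 0.0)
    n = len(per_class)
    return tuple(sum(s[k] for s in per_class) / n for k in range(3))


if __name__ == "__main__":
    toy = {"PER": (9, 1, 1), "LOC": (1, 0, 9)}  # hypothetical counts
    p, r, f = macro_average(toy)
    print(round(p, 3), round(r, 3), round(f, 3))  # 0.95 0.5 0.541
    print(round(2 * p * r / (p + r), 3))          # 0.655, higher than the mean F1
```

A class with very unbalanced precision and recall (LOC here) pulls the mean F1 down more than it pulls down the harmonic mean of the averaged precision and recall.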
