TDJEE: A Document-Level Joint Model for Financial Event Extraction

Electronics ◽

10.3390/electronics10070824 ◽

2021 ◽

Vol 10 (7) ◽

pp. 824

Author(s):

Peng Wang ◽

Zhenkai Deng ◽

Ruilong Cui

Keyword(s):

Event Extraction ◽

Distant Supervision ◽

Sentence Level ◽

Financial Domain ◽

Level Information ◽

Financial Events ◽

Model Training ◽

Joint Event ◽

Extraction Model ◽

Document Level

Extracting financial events from the large volume of financial announcements helps investors make the right decisions. However, event arguments are often scattered across multiple sentences in a financial announcement, while most existing event extraction models only work in sentence-level scenarios. To address this problem, this paper proposes a relation-aware Transformer-based Document-level Joint Event Extraction model (TDJEE), which encodes relations between words into the context and leverages a modified Transformer to capture document-level information to fill event arguments. Meanwhile, the absence of labeled data in the financial domain can make models unstable in their extraction results, which is known as the cold start problem. To mitigate this, a Fonduer-based knowledge base combined with the distant supervision method is proposed to simplify event labeling and provide a high-quality labeled training corpus for model training and evaluation. Experimental results on real-world Chinese financial announcements show that, compared with other models, TDJEE achieves competitive results and can effectively extract event arguments across multiple sentences.
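The distant supervision idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes a knowledge-base record that maps argument roles to known values and labels every sentence containing one of those values verbatim. The function name `distant_label`, the role names, and the sample announcement are all hypothetical; this is not the Fonduer-based pipeline the paper describes.

```python
from typing import Dict, List, Tuple


def distant_label(
    sentences: List[str], kb_record: Dict[str, str]
) -> List[Tuple[int, str, str]]:
    """Return (sentence_index, role, value) for each KB value found verbatim in a sentence."""
    labels = []
    for idx, sentence in enumerate(sentences):
        for role, value in kb_record.items():
            # Plain substring match: the simplest form of distant supervision.
            if value in sentence:
                labels.append((idx, role, value))
    return labels


# Hypothetical announcement and knowledge-base record (illustration only).
sentences = [
    "Acme Holdings announced a share pledge on 2020-03-01.",
    "The pledgee is Northern Bank.",
    "A total of 5,000,000 shares were pledged.",
]
kb_record = {
    "Pledger": "Acme Holdings",
    "Pledgee": "Northern Bank",
    "PledgedShares": "5,000,000",
}

labels = distant_label(sentences, kb_record)
for sentence_index, role, value in labels:
    print(sentence_index, role, value)
```

Note that the three arguments land in three different sentences, which is the document-level scattering the abstract refers to.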

Joint Event Extraction Model based on Multi-feature Fusion

Procedia Computer Science ◽

10.1016/j.procs.2020.06.066 ◽

2020 ◽

Vol 174 ◽

pp. 115-122

Author(s):

Wang Shuo ◽

Rao Yuan ◽

Fan Xiaobing ◽

Qi Jiangnan

Keyword(s):

Feature Fusion ◽

Event Extraction ◽

Model Based ◽

Joint Event ◽

Extraction Model

A Prior Information Enhanced Extraction Framework for Document-level Financial Event Extraction

Data Intelligence ◽

10.1162/dint_a_00103 ◽

2021 ◽

pp. 1-12

Author(s):

Haitao Wang ◽

Tong Zhu ◽

Mingtao Wang ◽

Guoliang Zhang ◽

Wenliang Chen

Keyword(s):

Event Detection ◽

Prior Information ◽

Event Extraction ◽

Language Models ◽

Event Type ◽

Event Argument ◽

Implicit Information ◽

Share Task ◽

Financial Domain ◽

Document Level

Document-level financial event extraction (DFEE) is the task of detecting events and extracting the corresponding event arguments in financial documents, which plays an important role in information extraction in the financial domain. This task is challenging because financial documents are generally long and the arguments of one event may be scattered across different sentences. To address this issue, we propose a novel Prior Information Enhanced Extraction framework (PIEE) for DFEE, leveraging prior information from both event types and pre-trained language models. Specifically, PIEE consists of three components: event detection, event argument extraction, and event table filling. In event detection, we identify the event type. Then, the event type is explicitly used for event argument extraction. Meanwhile, the implicit information within language models also provides considerable cues for event argument localization. Finally, all the event arguments are filled into an event table by a set of predefined heuristic rules. To demonstrate the effectiveness of our proposed framework, we participate in the shared task of CCKS2020 Task5-2: Document-level Event Arguments Extraction. On both Leaderboard A and Leaderboard B, PIEE takes first place and significantly outperforms the other systems.
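The final table-filling step can be sketched as follows. The abstract does not state PIEE's actual heuristic rules, so this example assumes one simple rule: for each role, keep the most frequently extracted candidate. The function `fill_event_table`, the event type, and the role names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def fill_event_table(
    event_type: str, roles: List[str], candidates: Dict[str, List[str]]
) -> Dict[str, Optional[str]]:
    """Fill one event table row by picking the most frequent candidate per role."""
    table: Dict[str, Optional[str]] = {"event_type": event_type}
    for role in roles:
        mentions = candidates.get(role, [])
        # Assumed heuristic: majority vote over extracted mentions; None if nothing was found.
        table[role] = Counter(mentions).most_common(1)[0][0] if mentions else None
    return table


# Hypothetical candidates collected from several sentences of one document.
candidates = {
    "Company": ["Acme Holdings", "Acme Holdings", "Acme"],
    "Amount": ["5,000,000"],
}
table = fill_event_table("SharePledge", ["Company", "Amount", "Date"], candidates)
print(table)
```

Roles with no candidate stay empty (`None`) rather than being guessed, which keeps the table honest about what the document actually supports.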

Event Geoparser with Pseudo-Location Entity Identification and Numerical Argument Extraction Implementation and Evaluation in Indonesian News Domain

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi9120712 ◽

2020 ◽

Vol 9 (12) ◽

pp. 712

Author(s):

Agung Dewandaru ◽

Dwi Hendratmo Widyantoro ◽

Saiful Akbar

Keyword(s):

Topic Model ◽

Event Extraction ◽

Geographic Information Retrieval ◽

Unstructured Text ◽

Three Stages ◽

Entity Identification ◽

Choropleth Map ◽

Extraction Model ◽

Document Level ◽

Large Corpus

A geoparser is a fundamental component of a Geographic Information Retrieval (GIR) system; it performs toponym recognition, disambiguation, and geographic coordinate resolution on unstructured text. However, news articles that report several events across many place mentions in one document are not yet adequately handled by regular geoparsers, whose scope of resolution is either toponym-level or document-level. The capacity to detect multiple events and geolocate their true coordinates along with their numerical arguments is still missing from modern geoparsers, much less for the Indonesian news domain. We propose an event geoparser model with three stages of processing, which tightly integrates an event extraction model into geoparsing and provides a precise event-level resolution scope. The model casts geotagging and event extraction as sequence labeling and uses an LSTM-CRF inferencer equipped with features derived using an Aggregated Topic Model from a large corpus to increase generalizability. Through the proposed workflow and features, the geoparser significantly improves the identification of pseudo-location entities, resulting in a 23.43% increase in weighted F1 score compared to baseline gazetteer and POS tag features. As a side effect of event extraction, various numerical arguments are also extracted, and the output is easily projected onto a rich choropleth map from a single news document.
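Casting geotagging and event extraction as sequence labeling means a tagger assigns one tag per token, and entity spans are then read off the tag sequence. The sketch below shows only that decoding step for the common BIO scheme; the tag names (`EVENT`, `LOC`, `NUM`) and the example sentence are assumptions for illustration, not the label set used in the paper.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Convert parallel token/BIO-tag lists into (label, text) spans."""
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new span starts; close any span that is still open.
            if current_tokens:
                spans.append((current_label, " ".join(current_tokens)))
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # Continuation of the open span with the same label.
            current_tokens.append(token)
        else:
            # "O" tag or an inconsistent "I-" tag: close the open span, skip this token.
            if current_tokens:
                spans.append((current_label, " ".join(current_tokens)))
            current_tokens, current_label = [], None
    if current_tokens:
        spans.append((current_label, " ".join(current_tokens)))
    return spans


# Hypothetical tagger output for one news sentence.
tokens = ["Flood", "hit", "Bandung", "on", "Monday", ",", "12", "people", "injured"]
tags = ["B-EVENT", "O", "B-LOC", "O", "O", "O", "B-NUM", "I-NUM", "O"]
spans = bio_to_spans(tokens, tags)
print(spans)
```

In a full system the tags would come from the trained LSTM-CRF model; here they are written by hand so the decoding logic can be read in isolation.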

Event Geoparser with Pseudo-Location Entity Identification and Numerical Extraction in Indonesian News Corpus

10.20944/preprints202008.0263.v1 ◽

2020 ◽

Author(s):

Agung Dewandaru ◽

Dwi Hendratmo Widyantoro ◽

Saiful Akbar

Keyword(s):

Topic Model ◽

Conditional Random Field ◽

Event Extraction ◽

News Story ◽

Geographic Information Retrieval ◽

Entity Identification ◽

News Corpus ◽

Choropleth Map ◽

Extraction Model ◽

Document Level

One of the most important components of a Geographic Information Retrieval (GIR) system is the geoparser, which performs toponym recognition, disambiguation, and geographic coordinate resolution on unstructured text. However, news articles that report several events across many place references in one document are not yet adequately modeled by regular geoparser types, whose scope of resolution is either toponym-level or document-level. The capacity to detect multiple events and geolocate their true locations and coordinates along with their numerical arguments is still missing from modern geoparsers, much less for the Indonesian news domain. We propose a novel type of event geoparser that integrates an ACE-based event extraction model and provides precise event-level scope resolution. The geoparser casts geotagging and event extraction as sequence labeling and uses a Conditional Random Field with keyword features obtained using an Aggregated Topic Model as a semantic exploration of a large corpus, which increases the generalizability of the model. The geoparser also uses a Smallest Administrative Level feature along with a Spatial Minimality-derived algorithm to improve the identification of pseudo-location entities, resulting in a 19.4% increase in weighted F1 score. As a side effect of event extraction, the geoparser also extracts various numerical arguments and is able to generate a thematic choropleth map from a single news story.

A Maximum Entropy-Based Bio-Molecular Event Extraction Model that Considers Event Generation

Journal of Information Processing Systems ◽

10.3745/jips.04.0008 ◽

2014 ◽

Keyword(s):

Maximum Entropy ◽

Event Extraction ◽

Molecular Event ◽

Event Generation ◽

Extraction Model

Applying Topic Segmentation to Document-Level Information Retrieval

Proceedings of the 14th Central and Eastern European Software Engineering Conference Russia - CEE-SECR '18 ◽

10.1145/3290621.3290630 ◽

2018 ◽

Author(s):

Gennady Shtekh ◽

Polina Kazakova ◽

Nikita Nikitinsky ◽

Nikolay Skachkov

Keyword(s):

Information Retrieval ◽

Topic Segmentation ◽

Level Information ◽

Document Level

Joint Event Extraction with Hierarchical Policy Network

10.18653/v1/2020.coling-main.239 ◽

2020 ◽

Author(s):

Peixin Huang ◽

Xiang Zhao ◽

Ryuichi Takanobu ◽

Zhen Tan ◽

Weidong Xiao

Keyword(s):

Event Extraction ◽

Policy Network ◽

Joint Event

Using Word Embeddings to Enforce Document-Level Lexical Consistency in Machine Translation

Prague Bulletin of Mathematical Linguistics ◽

10.1515/pralin-2017-0011 ◽

2017 ◽

Vol 108 (1) ◽

pp. 85-96 ◽

Cited By ~ 2

Author(s):

Eva Martínez Garcia ◽

Carles Creus ◽

Cristina España-Bonet ◽

Lluís Màrquez

Keyword(s):

Machine Translation ◽

Evaluation Metrics ◽

Automatic Evaluation ◽

Word Embeddings ◽

Standard Document ◽

Sentence Level ◽

Word Translation ◽

Stochastic Mechanism ◽

Document Level

We integrate new mechanisms into a document-level machine translation decoder to improve the lexical consistency of document translations. First, we develop a document-level feature designed to score the lexical consistency of a translation. This feature, which applies to words that have been translated into different forms within the document, uses word embeddings to measure the adequacy of each word translation given its context. Second, we extend the decoder with a new stochastic mechanism that, at translation time, introduces changes in the translation aimed at improving its lexical consistency. We evaluate our system on English–Spanish document translation, and we conduct automatic and manual assessments of its quality. The automatic evaluation metrics, applied mainly at sentence level, do not reflect significant variations. In contrast, the manual evaluation shows that the system dealing with lexical consistency is preferred over both a standard sentence-level and a standard document-level phrase-based MT system.
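Measuring the adequacy of a word translation from its context with word embeddings can be sketched with cosine similarity. The example below assumes one particular scoring choice: compare each candidate translation's vector with the mean vector of the surrounding target-side words and keep the closest candidate. The function names and the two-dimensional toy embeddings are hypothetical, and the paper's actual feature may be defined differently.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def best_translation(
    candidates: List[str],
    context_words: List[str],
    embeddings: Dict[str, Sequence[float]],
) -> str:
    """Pick the candidate whose embedding is closest to the mean context embedding."""
    context = mean_vector([embeddings[word] for word in context_words])
    return max(candidates, key=lambda word: cosine(embeddings[word], context))


# Toy embeddings (assumed values): "bank" rendered as "banco" or "orilla".
embeddings = {
    "banco": [1.0, 0.0],
    "orilla": [0.0, 1.0],
    "dinero": [0.9, 0.1],
    "prestamo": [0.8, 0.2],
}
choice = best_translation(["banco", "orilla"], ["dinero", "prestamo"], embeddings)
print(choice)
```

With a financial context the financial sense wins; swapping the context for river-related words would favor the other candidate under the same rule.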

Joint Event Extraction Based on Skip-Window Convolutional Neural Networks

Natural Language Understanding and Intelligent Applications - Lecture Notes in Computer Science ◽

10.1007/978-3-319-50496-4_27 ◽

2016 ◽

pp. 324-334 ◽

Cited By ~ 2

Author(s):

Zhengkuan Zhang ◽

Weiran Xu ◽

Qianqian Chen

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Event Extraction ◽

Joint Event

Multi-language person social relation extraction model based on distant supervision

2018 IEEE 3rd International Conference on Cloud Computing and Big Data Analysis (ICCCBDA) ◽

10.1109/icccbda.2018.8386544 ◽

2018 ◽

Author(s):

Yangchen Huang ◽

Yan Jia ◽

Jiuming Huang ◽

Zhonghe He

Keyword(s):

Social Relation ◽

Relation Extraction ◽

Model Based ◽

Distant Supervision ◽

Extraction Model
