Building and Evaluating an Annotated Corpus for Automated Recognition of Chat-Based Social Engineering Attacks

Nikolaos Tsinganos; Ioannis Mavridis

doi:10.3390/app112210871

Building and Evaluating an Annotated Corpus for Automated Recognition of Chat-Based Social Engineering Attacks

Applied Sciences ◽

10.3390/app112210871 ◽

2021 ◽

Vol 11 (22) ◽

pp. 10871

Author(s):

Nikolaos Tsinganos ◽

Ioannis Mavridis

Keyword(s):

Short Term Memory ◽

Early Stage ◽

Named Entity Recognition ◽

Social Engineering ◽

Cyber Attacks ◽

Entity Recognition ◽

Automated Recognition ◽

Named Entity ◽

Key Factor ◽

Context Features

Chat-based Social Engineering (CSE) is widely recognized as a key factor to successful cyber-attacks, especially in small and medium-sized enterprise (SME) environments. Despite the interest in preventing CSE attacks, few studies have considered the specific features of the language used by the attackers. This work contributes to the area of early-stage automated CSE attack recognition by proposing an approach for building and annotating a specific-purpose corpus and presenting its application in the CSE domain. The resulting CSE corpus is then evaluated by training a bi-directional long short-term memory (bi-LSTM) neural network for the purpose of named entity recognition (NER). The results of this study emphasize the importance of adding a plethora of metadata to a dataset to provide critical in-context features and produce a corpus that broadens our understanding of the tactics used by social engineers. The outcomes can be applied to dedicated cyber-defence mechanisms utilized to protect SME employees using Electronic Medium Communication (EMC) software.

Download Full-text

GRN: Gated Relation Network to Enhance Convolutional Neural Network for Named Entity Recognition

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33016236 ◽

2019 ◽

Vol 33 ◽

pp. 6236-6243 ◽

Cited By ~ 9

Author(s):

Hui Chen ◽

Zijia Lin ◽

Guiguang Ding ◽

Jianguang Lou ◽

Yusen Zhang ◽

...

Keyword(s):

Neural Networks ◽

Short Term Memory ◽

Named Entity Recognition ◽

Entity Recognition ◽

Local Context ◽

Named Entity ◽

Entire Sentence ◽

Context Features ◽

Gpu Parallelism

The dominant approaches for named entity recognitionm (NER) mostly adopt complex recurrent neural networks (RNN), e.g., long-short-term-memory (LSTM). However, RNNs are limited by their recurrent nature in terms of computational efficiency. In contrast, convolutional neural networks (CNN) can fully exploit the GPU parallelism with their feedforward architectures. However, little attention has been paid to performing NER with CNNs, mainly owing to their difficulties in capturing the long-term context information in a sequence. In this paper, we propose a simple but effective CNN-based network for NER, i.e., gated relation network (GRN), which is more capable than common CNNs in capturing long-term context. Specifically, in GRN we firstly employ CNNs to explore the local context features of each word. Then we model the relations between words and use them as gates to fuse local context features into global ones for predicting labels. Without using recurrent layers that process a sentence in a sequential manner, our GRN allows computations to be performed in parallel across the entire sentence. Experiments on two benchmark NER datasets (i.e., CoNLL2003 and Ontonotes 5.0) show that, our proposed GRN can achieve state-of-the-art performance with or without external knowledge. It also enjoys lower time costs to train and test.

Download Full-text

Bidirectional Long Short-Term Memory (BILSTM) with Conditional Random Fields (CRF) for Knowledge Named Entity Recognition in Online Judges (OJS)

International Journal on Natural Language Computing ◽

10.5121/ijnlc.2018.7401 ◽

2018 ◽

Vol 7 (4) ◽

pp. 01-08

Author(s):

Muhammad Asif Khan ◽

Tayyab Naveed ◽

Elmaam Yagoub ◽

Guojin Zhu

Keyword(s):

Random Fields ◽

Conditional Random Fields ◽

Short Term Memory ◽

Named Entity Recognition ◽

Entity Recognition ◽

Short Term ◽

Term Memory ◽

Named Entity ◽

Long Short Term Memory

Download Full-text

Cybersecurity named entity recognition using bidirectional long short-term memory with conditional random fields

Tsinghua Science & Technology ◽

10.26599/tst.2019.9010033 ◽

2021 ◽

Vol 26 (3) ◽

pp. 259-265

Author(s):

Pingchuan Ma ◽

Bo Jiang ◽

Zhigang Lu ◽

Ning Li ◽

Zhengwei Jiang

Keyword(s):

Random Fields ◽

Conditional Random Fields ◽

Short Term Memory ◽

Named Entity Recognition ◽

Entity Recognition ◽

Short Term ◽

Term Memory ◽

Named Entity ◽

Long Short Term Memory

Download Full-text

Attention-Based Bidirectional Long Short-Term Memory Networks for Chinese Named Entity Recognition

Proceedings of the 2019 4th International Conference on Machine Learning Technologies - ICMLT 2019 ◽

10.1145/3340997.3341002 ◽

2019 ◽

Cited By ~ 1

Author(s):

Chaoyi Huang ◽

Youguang Chen ◽

Qiancheng Liang

Keyword(s):

Short Term Memory ◽

Named Entity Recognition ◽

Entity Recognition ◽

Short Term ◽

Term Memory ◽

Named Entity ◽

Long Short Term Memory

Download Full-text

Combined Self-Attention Mechanism for Chinese Named Entity Recognition in Military

Future Internet ◽

10.3390/fi11080180 ◽

2019 ◽

Vol 11 (8) ◽

pp. 180

Author(s):

Fei Liao ◽

Liangli Ma ◽

Jingjing Pei ◽

Linshan Tan

Keyword(s):

Domain Knowledge ◽

Short Term Memory ◽

Contextual Information ◽

Named Entity Recognition ◽

Attention Mechanism ◽

The Self ◽

Entity Recognition ◽

Named Entity ◽

Vector Representations ◽

The Military

Military named entity recognition (MNER) is one of the key technologies in military information extraction. Traditional methods for the MNER task rely on cumbersome feature engineering and specialized domain knowledge. In order to solve this problem, we propose a method employing a bidirectional long short-term memory (BiLSTM) neural network with a self-attention mechanism to identify the military entities automatically. We obtain distributed vector representations of the military corpus by unsupervised learning and the BiLSTM model combined with the self-attention mechanism is adopted to capture contextual information fully carried by the character vector sequence. The experimental results show that the self-attention mechanism can improve effectively the performance of MNER task. The F-score of the military documents and network military texts identification was 90.15% and 89.34%, respectively, which was better than other models.

Download Full-text

Character convolutions for Arabic Named Entity Recognition with Long Short-Term Memory Networks

Computer Speech & Language ◽

10.1016/j.csl.2019.05.003 ◽

2019 ◽

Vol 58 ◽

pp. 335-346 ◽

Cited By ~ 4

Author(s):

Muhammad Khalifa ◽

Khaled Shaalan

Keyword(s):

Short Term Memory ◽

Named Entity Recognition ◽

Entity Recognition ◽

Short Term ◽

Term Memory ◽

Named Entity ◽

Long Short Term Memory

Download Full-text

Innovative Deep Neural Network Modeling for Fine-Grained Chinese Entity Recognition

Electronics ◽

10.3390/electronics9061001 ◽

2020 ◽

Vol 9 (6) ◽

pp. 1001 ◽

Cited By ~ 1

Author(s):

Jingang Liu ◽

Chunhe Xia ◽

Haihua Yan ◽

Wenjing Xu

Keyword(s):

Neural Network ◽

Language Processing ◽

Short Term Memory ◽

Named Entity Recognition ◽

Training Model ◽

Entity Recognition ◽

Coarse Grained ◽

Neural Network Modeling ◽

Fine Grained ◽

Named Entity

Named entity recognition (NER) is a basic but crucial task in the field of natural language processing (NLP) and big data analysis. The recognition of named entities based on Chinese is more complicated and difficult than English, which makes the task of NER in Chinese more challenging. In particular, fine-grained named entity recognition is more challenging than traditional named entity recognition tasks, mainly because fine-grained tasks have higher requirements for the ability of automatic feature extraction and information representation of deep neural models. In this paper, we propose an innovative neural network model named En2BiLSTM-CRF to improve the effect of fine-grained Chinese entity recognition tasks. This proposed model including the initial encoding layer, the enhanced encoding layer, and the decoding layer combines the advantages of pre-training model encoding, dual bidirectional long short-term memory (BiLSTM) networks, and a residual connection mechanism. Hence, it can encode information multiple times and extract contextual features hierarchically. We conducted sufficient experiments on two representative datasets using multiple important metrics and compared them with other advanced baselines. We present promising results showing that our proposed En2BiLSTM-CRF has better performance as well as better generalization ability in both fine-grained and coarse-grained Chinese entity recognition tasks.

Download Full-text

CWPC_BiAtt: Character–Word–Position Combined BiLSTM-Attention for Chinese Named Entity Recognition

Information ◽

10.3390/info11010045 ◽

2020 ◽

Vol 11 (1) ◽

pp. 45 ◽

Cited By ~ 1

Author(s):

Shardrom Johnson ◽

Sherlock Shen ◽

Yuanchen Liu

Keyword(s):

Language Processing ◽

Short Term Memory ◽

Conditional Random Field ◽

Named Entity Recognition ◽

Attention Mechanism ◽

Entity Recognition ◽

Position Information ◽

Named Entity ◽

Pos Tagging ◽

Word Position

Usually taken as linguistic features by Part-Of-Speech (POS) tagging, Named Entity Recognition (NER) is a major task in Natural Language Processing (NLP). In this paper, we put forward a new comprehensive-embedding, considering three aspects, namely character-embedding, word-embedding, and pos-embedding stitched in the order we give, and thus get their dependencies, based on which we propose a new Character–Word–Position Combined BiLSTM-Attention (CWPC_BiAtt) for the Chinese NER task. Comprehensive-embedding via the Bidirectional Llong Short-Term Memory (BiLSTM) layer can get the connection between the historical and future information, and then employ the attention mechanism to capture the connection between the content of the sentence at the current position and that at any location. Finally, we utilize Conditional Random Field (CRF) to decode the entire tagging sequence. Experiments show that CWPC_BiAtt model we proposed is well qualified for the NER task on Microsoft Research Asia (MSRA) dataset and Weibo NER corpus. A high precision and recall were obtained, which verified the stability of the model. Position-embedding in comprehensive-embedding can compensate for attention-mechanism to provide position information for the disordered sequence, which shows that comprehensive-embedding has completeness. Looking at the entire model, our proposed CWPC_BiAtt has three distinct characteristics: completeness, simplicity, and stability. Our proposed CWPC_BiAtt model achieved the highest F-score, achieving the state-of-the-art performance in the MSRA dataset and Weibo NER corpus.

Download Full-text

Information Extraction from Electronic Medical Records Using Multitask Recurrent Neural Network with Contextual Word Embedding

Applied Sciences ◽

10.3390/app9183658 ◽

2019 ◽

Vol 9 (18) ◽

pp. 3658 ◽

Cited By ~ 6

Author(s):

Jianliang Yang ◽

Yuenan Liu ◽

Minghui Qian ◽

Chenghua Guan ◽

Xiangfei Yuan

Keyword(s):

Electronic Medical Records ◽

Medical Records ◽

Large Scale ◽

Short Term Memory ◽

Conditional Random Field ◽

Named Entity Recognition ◽

Recognition Task ◽

Entity Recognition ◽

Language Models ◽

Named Entity

Clinical named entity recognition is an essential task for humans to analyze large-scale electronic medical records efficiently. Traditional rule-based solutions need considerable human effort to build rules and dictionaries; machine learning-based solutions need laborious feature engineering. For the moment, deep learning solutions like Long Short-term Memory with Conditional Random Field (LSTM–CRF) achieved considerable performance in many datasets. In this paper, we developed a multitask attention-based bidirectional LSTM–CRF (Att-biLSTM–CRF) model with pretrained Embeddings from Language Models (ELMo) in order to achieve better performance. In the multitask system, an additional task named entity discovery was designed to enhance the model’s perception of unknown entities. Experiments were conducted on the 2010 Informatics for Integrating Biology & the Bedside/Veterans Affairs (I2B2/VA) dataset. Experimental results show that our model outperforms the state-of-the-art solution both on the single model and ensemble model. Our work proposes an approach to improve the recall in the clinical named entity recognition task based on the multitask mechanism.

Download Full-text

Named entity recognition with long short-term memory

10.3115/1119176.1119202 ◽

2003 ◽

Cited By ~ 47

Author(s):

James Hammerton

Keyword(s):

Short Term Memory ◽

Named Entity Recognition ◽

Entity Recognition ◽

Short Term ◽

Term Memory ◽

Named Entity ◽

Long Short Term Memory

Download Full-text