Universal Word Segmentation: Implementation and Interpretation

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00033 ◽

2018 ◽

Vol 6 ◽

pp. 421-435 ◽

Cited By ~ 3

Author(s):

Yan Shao ◽

Christian Hardmeier ◽

Joakim Nivre

Keyword(s):

State Of The Art ◽

Word Segmentation ◽

Experimental Results ◽

Word Boundary ◽

Writing Systems ◽

Low Level ◽

Wide Range ◽

Segmentation Accuracy ◽

Small Set

Word segmentation is a low-level NLP task that is non-trivial for a considerable number of languages. In this paper, we present a sequence tagging framework and apply it to word segmentation for a wide range of languages with different writing systems and typological characteristics. Additionally, we investigate the correlations between various typological factors and word segmentation accuracy. The experimental results indicate that segmentation accuracy is positively related to word boundary markers and negatively to the number of unique non-segmental terms. Based on the analysis, we design a small set of language-specific settings and extensively evaluate the segmentation system on the Universal Dependencies datasets. Our model obtains state-of-the-art accuracies on all the UD languages. It performs substantially better on languages that are non-trivial to segment, such as Chinese, Japanese, Arabic and Hebrew, when compared to previous work.

Download Full-text

Simple primitive recognition via hierarchical face clustering

Computational Visual Media ◽

10.1007/s41095-020-0192-6 ◽

2020 ◽

Vol 6 (4) ◽

pp. 431-443

Author(s):

Xiaolong Yang ◽

Xiaohong Jia

Keyword(s):

Hierarchical Clustering ◽

Test Data ◽

Efficient Algorithm ◽

Clustering Algorithm ◽

State Of The Art ◽

Experimental Results ◽

Face Clustering ◽

Wide Range ◽

Hierarchical Clustering Algorithm ◽

Bottom To Top

AbstractWe present a simple yet efficient algorithm for recognizing simple quadric primitives (plane, sphere, cylinder, cone) from triangular meshes. Our approach is an improved version of a previous hierarchical clustering algorithm, which performs pairwise clustering of triangle patches from bottom to top. The key contributions of our approach include a strategy for priority and fidelity consideration of the detected primitives, and a scheme for boundary smoothness between adjacent clusters. Experimental results demonstrate that the proposed method produces qualitatively and quantitatively better results than representative state-of-the-art methods on a wide range of test data.

Download Full-text

TCIC: Theme Concepts Learning Cross Language and Vision for Image Captioning

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/91 ◽

2021 ◽

Author(s):

Zhihao Fan ◽

Zhongyu Wei ◽

Siyuan Wang ◽

Ruize Wang ◽

Zejun Li ◽

...

Keyword(s):

State Of The Art ◽

Representation Learning ◽

Experimental Results ◽

Text Representation ◽

Image Captioning ◽

Scene Graph ◽

Low Level ◽

Language And Vision ◽

High Level ◽

Cross Language

Existing research for image captioning usually represents an image using a scene graph with low-level facts (objects and relations) and fails to capture the high-level semantics. In this paper, we propose a Theme Concepts extended Image Captioning (TCIC) framework that incorporates theme concepts to represent high-level cross-modality semantics. In practice, we model theme concepts as memory vectors and propose Transformer with Theme Nodes (TTN) to incorporate those vectors for image captioning. Considering that theme concepts can be learned from both images and captions, we propose two settings for their representations learning based on TTN. On the vision side, TTN is configured to take both scene graph based features and theme concepts as input for visual representation learning. On the language side, TTN is configured to take both captions and theme concepts as input for text representation re-construction. Both settings aim to generate target captions with the same transformer-based decoder. During the training, we further align representations of theme concepts learned from images and corresponding captions to enforce the cross-modality learning. Experimental results on MS COCO show the effectiveness of our approach compared to some state-of-the-art models.

Download Full-text

Efficient Content-Based Sparse Attention with Routing Transformers

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00353 ◽

2021 ◽

Vol 9 ◽

pp. 53-68

Author(s):

Aurko Roy ◽

Mohammad Saffar ◽

Ashish Vaswani ◽

David Grangier

Keyword(s):

State Of The Art ◽

Language Modeling ◽

Sequence Length ◽

Image Generation ◽

Data Set ◽

Sliding Windows ◽

Sequence Modeling ◽

Wide Range ◽

Small Set ◽

Transformer Model

Self-attention has recently been adopted for a wide range of sequence modeling problems. Despite its effectiveness, self-attention suffers from quadratic computation and memory requirements with respect to sequence length. Successful approaches to reduce this complexity focused on attending to local sliding windows or a small set of locations independent of content. Our work proposes to learn dynamic sparse attention patterns that avoid allocating computation and memory to attend to content unrelated to the query of interest. This work builds upon two lines of research: It combines the modeling flexibility of prior work on content-based sparse attention with the efficiency gains from approaches based on local, temporal sparse attention. Our model, the Routing Transformer, endows self-attention with a sparse routing module based on online k-means while reducing the overall complexity of attention to O( n1.5d) from O( n2d) for sequence length n and hidden dimension d. We show that our model outperforms comparable sparse attention models on language modeling on Wikitext-103 (15.8 vs 18.3 perplexity), as well as on image generation on ImageNet-64 (3.43 vs 3.44 bits/dim) while using fewer self-attention layers. Additionally, we set a new state-of-the-art on the newly released PG-19 data-set, obtaining a test perplexity of 33.2 with a 22 layer Routing Transformer model trained on sequences of length 8192. We open-source the code for Routing Transformer in Tensorflow.1

Download Full-text

Sanitizing hidden activations for improving adversarial robustness of convolutional neural networks

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-210371 ◽

2021 ◽

pp. 1-11

Author(s):

Tianshi Mu ◽

Kequan Lin ◽

Huabing Zhang ◽

Jian Wang

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Convolutional Neural Networks ◽

State Of The Art ◽

Black Box ◽

Experimental Results ◽

Amplification Effect ◽

Wide Range ◽

Adversarial Examples

Deep learning is gaining significant traction in a wide range of areas. Whereas, recent studies have demonstrated that deep learning exhibits the fatal weakness on adversarial examples. Due to the black-box nature and un-transparency problem of deep learning, it is difficult to explain the reason for the existence of adversarial examples and also hard to defend against them. This study focuses on improving the adversarial robustness of convolutional neural networks. We first explore how adversarial examples behave inside the network through visualization. We find that adversarial examples produce perturbations in hidden activations, which forms an amplification effect to fool the network. Motivated by this observation, we propose an approach, termed as sanitizing hidden activations, to help the network correctly recognize adversarial examples by eliminating or reducing the perturbations in hidden activations. To demonstrate the effectiveness of our approach, we conduct experiments on three widely used datasets: MNIST, CIFAR-10 and ImageNet, and also compare with state-of-the-art defense techniques. The experimental results show that our sanitizing approach is more generalized to defend against different kinds of attacks and can effectively improve the adversarial robustness of convolutional neural networks.

Download Full-text

Noisy Iris Segmentation with Reflections Removal Using Probable Boundary Edge Detector

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.236-237.1116 ◽

2012 ◽

Vol 236-237 ◽

pp. 1116-1121 ◽

Cited By ~ 1

Author(s):

Min Wang ◽

Ning Wang ◽

Xiao Gui Yao

Keyword(s):

Iris Recognition ◽

State Of The Art ◽

Image Database ◽

Recognition System ◽

Experimental Results ◽

Boundary Edge ◽

Iris Segmentation ◽

Edge Detector ◽

Segmentation Methods ◽

Segmentation Accuracy

Iris segmentation plays an important role in iris recognition system. Most of segmentation methods are affected by reflection spots, eyelash and eyelid etc. The goal of this work is to accurately segment the iris using Probable boundary (Pb) edge detector after horizontal-vertical weighted reflections removal. Experimental results on the challenging iris image database CASIA-Iris-Thousand with reflection spots sample demonstrate that the iris segmentation accuracy of the proposed methods outperforms state-of-the-art methods.

Download Full-text

Deep ChaosNet for Action Recognition in Videos

Complexity ◽

10.1155/2021/6634156 ◽

2021 ◽

Vol 2021 ◽

pp. 1-5

Author(s):

Huafeng Chen ◽

Maosheng Zhang ◽

Zhengming Gao ◽

Yunhong Zhao

Keyword(s):

Neural Network ◽

Action Recognition ◽

Deep Neural Network ◽

Recognition Accuracy ◽

State Of The Art ◽

Experimental Results ◽

Low Level ◽

Hidden Layer ◽

High Level ◽

Standard Action

Current methods of chaos-based action recognition in videos are limited to the artificial feature causing the low recognition accuracy. In this paper, we improve ChaosNet to the deep neural network and apply it to action recognition. First, we extend ChaosNet to deep ChaosNet for extracting action features. Then, we send the features to the low-level LSTM encoder and high-level LSTM encoder for obtaining low-level coding output and high-level coding results, respectively. The agent is a behavior recognizer for producing recognition results. The manager is a hidden layer, responsible for giving behavioral segmentation targets at the high level. Our experiments are executed on two standard action datasets: UCF101 and HMDB51. The experimental results show that the proposed algorithm outperforms the state of the art.

Download Full-text

Image Cationing with Visual-Semantic LSTM

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/110 ◽

2018 ◽

Cited By ~ 6

Author(s):

Nannan Li ◽

Zhenzhong Chen

Keyword(s):

Visual Processing ◽

State Of The Art ◽

Sampling Strategy ◽

Experimental Results ◽

Visual Cell ◽

Semantic Features ◽

Training Process ◽

Image Captioning ◽

Low Level ◽

High Level

In this paper, a novel image captioning approach is proposed to describe the content of images. Inspired by the visual processing of our cognitive system, we propose a visual-semantic LSTM model to locate the attention objects with their low-level features in the visual cell, and then successively extract high-level semantic features in the semantic cell. In addition, a state perturbation term is introduced to the word sampling strategy in the REINFORCE based method to explore proper vocabularies in the training process. Experimental results on MS COCO and Flickr30K validate the effectiveness of our approach when compared to the state-of-the-art methods.

Download Full-text

Surveying word boundary factor in Chinese - Vietnamese statistical machine translation

Science and Technology Development Journal ◽

10.32508/stdj.v18i2.1133 ◽

2015 ◽

Vol 18 (2) ◽

pp. 70-78

Author(s):

Phuoc Thanh Tran ◽

Dien Dinh

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Word Segmentation ◽

Experimental Results ◽

Word Boundary ◽

Experimental Result ◽

Future Research

In isolating languages such as Chinese and Vietnamese, words are not separated by spaces, a word can include one or more spelling words. Segmenting word or not before training and translating process is a problem that need to be considered. In this paper, we will survey the effect of word boundary factor in the translation result of Chinese-Vietnamese statistical machine translation (SMT). The experimental result of this paper will be the basis for word segmentation improvement in future research which increase machine translation performance. We surveyed on two experiments: word segmentation (WS) and word un-segmentation (WUS) on the corpus of 8,000 and 12,000 sentence pairs. Based on the experimental results, we found that both of WS corpus and WUS corpus have their own advantages and defects. We propose integrating the advantages of these two methods in SMT

Download Full-text

Mask-Refined R-CNN: A Network for Refining Object Details in Instance Segmentation

Sensors ◽

10.3390/s20041010 ◽

2020 ◽

Vol 20 (4) ◽

pp. 1010 ◽

Cited By ~ 13

Author(s):

Yiqing Zhang ◽

Jun Chu ◽

Lu Leng ◽

Jun Miao

Keyword(s):

Receptive Field ◽

Spatial Information ◽

Feature Fusion ◽

State Of The Art ◽

Rapid Development ◽

Experimental Results ◽

Small Scale ◽

Feature Maps ◽

Segmentation Accuracy ◽

Instance Segmentation

With the rapid development of flexible vision sensors and visual sensor networks, computer vision tasks, such as object detection and tracking, are entering a new phase. Accordingly, the more challenging comprehensive task, including instance segmentation, can develop rapidly. Most state-of-the-art network frameworks, for instance, segmentation, are based on Mask R-CNN (mask region-convolutional neural network). However, the experimental results confirm that Mask R-CNN does not always successfully predict instance details. The scale-invariant fully convolutional network structure of Mask R-CNN ignores the difference in spatial information between receptive fields of different sizes. A large-scale receptive field focuses more on detailed information, whereas a small-scale receptive field focuses more on semantic information. So the network cannot consider the relationship between the pixels at the object edge, and these pixels will be misclassified. To overcome this problem, Mask-Refined R-CNN (MR R-CNN) is proposed, in which the stride of ROIAlign (region of interest align) is adjusted. In addition, the original fully convolutional layer is replaced with a new semantic segmentation layer that realizes feature fusion by constructing a feature pyramid network and summing the forward and backward transmissions of feature maps of the same resolution. The segmentation accuracy is substantially improved by combining the feature layers that focus on the global and detailed information. The experimental results on the COCO (Common Objects in Context) and Cityscapes datasets demonstrate that the segmentation accuracy of MR R-CNN is about 2% higher than that of Mask R-CNN using the same backbone. The average precision of large instances reaches 56.6%, which is higher than those of all state-of-the-art methods. In addition, the proposed method requires low time cost and is easily implemented. The experiments on the Cityscapes dataset also prove that the proposed method has great generalization ability.

Download Full-text

Method for Character Segmentation of Shopping Receipts of POS Machine

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.543-547.2274 ◽

2014 ◽

Vol 543-547 ◽

pp. 2274-2277

Author(s):

Wu Zheng ◽

Xiang Nian Huang ◽

Yan Ying Ma

Keyword(s):

Chinese Character ◽

Secondary Analysis ◽

Word Segmentation ◽

Experimental Results ◽

Character Segmentation ◽

Chinese Characters ◽

Analysis Method ◽

Segmentation Accuracy ◽

Left And Right ◽

Line Segmentation

This paper is focused on multiclass character segmentation for image of shopping receipt from POS machine, especially the Chinese characters. A secondary analysis method based on projection histogram is proposed, where the histogram is regarded as waveform signal, through filtering processing, it is to find trough point for the line segmentation. As for the word segmentation, this paper proposes rules for mergence of Chinese character components to solve issue about segmentation of Chinese characters with left and right components. The experimental results demonstrated that the segmentation accuracy is above 96%.

Download Full-text