The linguistic annotation system of the Stockholm

The Linguistic Annotation of Corpora

International Journal of Corpus Linguistics ◽

10.1075/ijcl.3.2.02aar ◽

1998 ◽

Vol 3 (2) ◽

pp. 189-210 ◽

Cited By ~ 1

Author(s):

Jan Aarts ◽

Hans van Halteren ◽

Nelleke Oostdijk

Keyword(s):

Language Processing ◽

Corpus Linguistics ◽

System Performance ◽

Annotation System ◽

Corpus Linguistic ◽

Linguistic Annotation ◽

Corpus Data ◽

Analysis System ◽

Performance Results

The article discusses the role of linguistic annotation in corpus linguistics as opposed to annotation in natural language processing. In corpus linguistics, annotation is an integral part of the process of linguistic interpretation and description of the data. Tagging and parsing are discussed as the automatic counterparts of, respectively, the paradigmatic and the syntagmatic description of corpus data. The requirements for a corpus linguistic annotation system are considered. An account is given of the TOSCA analysis system as representative of such an annotation system. Performance results of the system are given, and an evaluation is made.


71. Linguistic Annotation System for Gestures

Handbücher zur Sprach- und Kommunikationswissenschaft / Handbooks of Linguistics and Communication Science (HSK) 38/1 ◽

10.1515/9783110261318.1098 ◽

2013 ◽

Cited By ~ 1

Author(s):

Jana Bressem ◽

Silva H. Ladewig ◽

Cornelia Müller

Keyword(s):

Annotation System ◽

Linguistic Annotation


Design and development of Iberia: a corpus of scientific Spanish

Corpora ◽

10.3366/cor.2011.0010 ◽

2011 ◽

Vol 6 (2) ◽

pp. 145-158 ◽

Cited By ~ 2

Author(s):

Jordi Porta Zamorano ◽

Emilio del Rosal García ◽

Ignacio Ahumada Lara

Keyword(s):

User Interface ◽

Design And Development ◽

Linguistic Annotation

Iberia is a synchronic corpus of scientific Spanish designed mainly for terminological studies. In this paper, we describe its design and the infrastructure for its acquisition, processing and exploitation, including mark-up, linguistic annotation, indexing and the user interface. Two pre-processing tasks affecting a large number of words are described in detail: de-hyphenation and identification of text fragments in other languages. We also show how some of the reported statistics, namely, dispersion and association, are used for research on lexis.


Updating the ICE annotation system: tagging, parsing and validation

Corpora ◽

10.3366/corp.2011.0009 ◽

2011 ◽

Vol 6 (2) ◽

Keyword(s):

Annotation System


Proceedings of The 9th Linguistic Annotation Workshop

10.3115/v1/w15-16 ◽

2015 ◽

Keyword(s):

Linguistic Annotation


Initial Step of Specialized Corpora Building: Cleaning Procedures

NORDSCI Conference proceedings, Book 1 Volume 3 ◽

10.32008/nordsci2020/b1/v3/16 ◽

2020 ◽

Author(s):

Vera Yakubson ◽

Victor Zakharov

Keyword(s):

Text Processing ◽

Research Question ◽

Initial Step ◽

Main Research ◽

Academic Texts ◽

Linguistic Annotation ◽

Significant Difference ◽

Cleaning Procedures ◽

Unique Source ◽

Future Work

This paper deals with building specialized corpora, specifically an academic language corpus in the biotechnology field. As part of a larger research project devoted to the creation and use of a specialized parallel corpus, it analyzes the initial step of corpus building. Our main research question was which procedures need to be applied to the texts before using them to develop the corpus. An analysis of previous research showed a significant number of papers devoted to corpus creation, including academic specialized corpora. These studies analyzed different aspects of the process, including the types of texts used, the principles of crawling, the recommended length of texts, etc. As for text processing for the needs of corpus creation, only linguistic annotation issues had been examined earlier. At the same time, preliminary cleaning of texts before their use in corpora may have a significant influence on corpus quality and its utility for linguistic research. In this paper, we considered three small corpora derived from the same set of academic texts in the biotechnology field: a “raw” corpus without any preliminary cleaning and two corpora with different levels of cleaning. Using different Sketch Engine tools, we analyzed these corpora from the position of their future users, predominantly as sources for academic wordlists and specialized multi-word units. The research showed very little difference between the two cleaned corpora, meaning that only basic cleaning procedures, such as removal of reference lists, can be useful in corpus design. At the same time, we found a significant difference between the raw and cleaned corpora and argue that this difference can affect the quality of wordlist and multi-word term extraction; therefore, these cleaning procedures are meaningful.
The main limitation of the study is that all texts were taken from a single source, so the conclusions could be affected by this specific journal’s peculiarities. Therefore, future work should verify the results on different text collections.


SALKG: A Semantic Annotation System for Building a High-quality Legal Knowledge Graph

2020 IEEE International Conference on Big Data (Big Data) ◽

10.1109/bigdata50022.2020.9378107 ◽

2020 ◽

Author(s):

Mingwei Tang ◽

Cui Su ◽

Haihua Chen ◽

Jingye Qu ◽

Junhua Ding

Keyword(s):

Semantic Annotation ◽

Knowledge Graph ◽

Legal Knowledge ◽

High Quality ◽

Annotation System


Corpus Annotation System Based on HanLP Chinese Word Segmentation

The 2nd International Conference on Computing and Data Science ◽

10.1145/3448734.3450845 ◽

2021 ◽

Author(s):

Xuanjun Liu ◽

Zheyu Zhu ◽

Tengyan Fu ◽

Jiaxuan Chen ◽

Ying Jiang

Keyword(s):

Word Segmentation ◽

Chinese Word ◽

Corpus Annotation ◽

Chinese Word Segmentation ◽

Annotation System


A Collaborative Video Annotation System Based on Semantic Web Technologies

Cognitive Computation ◽

10.1007/s12559-012-9172-1 ◽

2012 ◽

Cited By ~ 9

Author(s):

Marco Grassi ◽

Christian Morbidoni ◽

Michele Nucci

Keyword(s):

Semantic Web ◽

Video Annotation ◽

Semantic Web Technologies ◽

Web Technologies ◽

Annotation System ◽

Collaborative Video


Enhancement of digital reading performance by using a novel web-based collaborative reading annotation system with two quality annotation filtering mechanisms

International Journal of Human-Computer Studies ◽

10.1016/j.ijhcs.2015.09.006 ◽

2016 ◽

Vol 86 ◽

pp. 81-93 ◽

Cited By ~ 12

Author(s):

Jiun-Chi Jan ◽

Chih-Ming Chen ◽

Po-Han Huang

Keyword(s):

Reading Performance ◽

Digital Reading ◽

Web Based ◽

Annotation System ◽

Collaborative Reading
