Hoshi: A Japanese morphological adorner for TEI XML

Digital Scholarship in the Humanities ◽

10.1093/llc/fqaa003 ◽

2020 ◽

Author(s):

Jerry Bonnell ◽

Mitsunori Ogihara

Keyword(s):

Morphological Analysis ◽

Textual Analysis ◽

Target Language ◽

Text Encoding ◽

Text Encoding Initiative ◽

Alignment Process ◽

Principal Tool

Morphological adornment of text in Text Encoding Initiative (TEI) XML can be useful for studies in textual analysis. MorphAdorner is a principal tool for providing such functionality in English texts. However, its practical use is limited when the input XML contains branching text, e.g. when <choice> appears, as it modifies the input document. In such cases, preprocessing is required to obtain the desired results. This article introduces a new tool, Hoshi, with the purpose of determining how this issue can best be handled with minimal input modification and preprocessing. It also investigates whether parsing software available online can be used to supply morphological information that can be encoded in an output format like that of MorphAdorner, and whether such a tool can be developed to adorn text in other languages. Challenges include those posed by the target language, the software currently available for providing morphological analysis in it, and the schema needed for encoding the results. Moreover, technical hurdles presented by segmented and branching text can complicate the alignment process, especially when the intent is to guarantee input document integrity. Our approach for handling these is presented, and the article ends by outlining future applications of Hoshi that can help to enhance TEI scholarship that prioritizes the use of morphological word metadata.
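To make the notion of branching text concrete, the following minimal Python sketch reads one linear version of a TEI fragment by following a single branch of each <choice> element while leaving the parsed document untouched. It is an illustration only and not Hoshi's actual method: the sample fragment, the function names `linear_text` and `pick_branch`, and the branch preference order are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical sample fragment containing two <choice> branches.
SAMPLE = f"""<p xmlns="{TEI_NS}">The <choice><orig>olde</orig><reg>old</reg></choice> tool <choice><abbr>wd.</abbr><expan>would</expan></choice> fail.</p>"""


def pick_branch(choice, prefer):
    """Return the first child of <choice> whose local name is in `prefer`,
    falling back to the first child (or None if <choice> is empty)."""
    by_name = {child.tag.split("}")[-1]: child for child in choice}
    for name in prefer:
        if name in by_name:
            return by_name[name]
    return next(iter(choice), None)


def linear_text(elem, prefer=("reg", "expan", "corr")):
    """Build one linear reading of `elem` without modifying the tree.

    For each <choice>, only the preferred branch contributes text;
    all other elements contribute their full text content.
    """
    parts = []
    if elem.text:
        parts.append(elem.text)
    for child in elem:
        if child.tag == f"{{{TEI_NS}}}choice":
            branch = pick_branch(child, prefer)
            if branch is not None:
                parts.append(linear_text(branch, prefer))
        else:
            parts.append(linear_text(child, prefer))
        # Text following the child belongs to the parent's reading.
        if child.tail:
            parts.append(child.tail)
    return "".join(parts)


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(linear_text(root))                            # regularized reading
    print(linear_text(root, prefer=("orig", "abbr")))   # original reading
```

Because the traversal only reads from the tree, the same input can yield several readings (regularized or original) that a morphological analyzer could process, and any results would still need to be aligned back to the unmodified document.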

Proposing a TEI-Encoding Project for the Wesley Works

ATLA Summary of Proceedings ◽

10.31046/proceedings.2020.1850 ◽

2020 ◽

pp. 232-238

Author(s):

Michelle Taylor ◽

Andrew Keck

Keyword(s):

Use Value ◽

John Wesley ◽

Text Encoding ◽

Text Encoding Initiative ◽

The Creation

The Text Encoding Initiative (TEI), a branch of XML, is a mature standard for encoding texts that was developed three decades ago and continues to be improved and expanded upon today. Learn about how TEI was centrally imagined for a project devoted to a corpus of John Wesley material. We will begin by explaining why we chose to use TEI for the project and reviewing the considerations inherent in transitioning from a longstanding print-based project to a digital project, including the challenges of converting thousands of pages of text across different file types into rudimentary TEI. Next, we will move into topics specific to TEI encoding practices, including the creation of XML tagsets designed to maximize the use value of the Wesley Works for its various audiences: scholars, librarians, and clergy. Finally, we will show the TEI in action by sharing an example of an XML file from our first round of encoding.

Morphological Analysis of the Neologisms during the COVID-19 Pandemic

International Journal of English Language Studies ◽

10.32996/ijels.2020.2.3.11 ◽

2020 ◽

Vol 2 (3) ◽

pp. 01-07

Author(s):

Katherine B. Akut

Keyword(s):

Morphological Analysis ◽

Textual Analysis ◽

Compound Structure ◽

Interesting Phenomenon ◽

Qualitative Design ◽

Morphological Process ◽

New Words ◽

Corona Virus ◽

The Common ◽

Morphological Processes

The emergence of neologisms has always been an interesting phenomenon as it demonstrates the dynamism of language. This study intends to identify the neologisms that arose during the COVID-19 pandemic through a morphological analysis. This study argues that the neologisms that emerged during the COVID-19 pandemic reveal the morphological processes that formed the new words. It further claims that the morphemic structures of the neologisms follow the general structures of English vocabulary. This study utilizes the descriptive-qualitative design in analyzing the morphological structures of the neologisms during the COVID-19 pandemic. Specifically, this study involves textual analysis to determine the morphological processes involved in the formation of new words. The data used in analyzing the morphological structures of COVID-19-related neologisms are five (5) Internet articles that introduce the new terms created because of the coronavirus outbreak. These articles were published in the months of March, April and May 2020. Findings reveal that most of the neologisms are nouns. The common morphological processes involved in the formation of new words are compounding, blending and affixation. Moreover, the majority of the neologisms follow the compound structure of free and bound morphemes. Based on the results of the study, it can be concluded that the neologisms formed during the COVID-19 pandemic reveal the morphological processes, and that the morphemic structures of the neologisms follow the general structures of English vocabulary, specifically in the combination of free and bound morphemes.

Manus OnLine and the Text Encoding Initiative Schema

Journal of the Text Encoding Initiative ◽

10.4000/jtei.1054 ◽

2014 ◽

Cited By ~ 1

Author(s):

Giliola Barbero ◽

Francesca Trasselli

Keyword(s):

Text Encoding ◽

Text Encoding Initiative

XML Annotation of Hebrew Elements in Judeo-Arabic Texts

Journal of Jewish Languages ◽

10.1163/22134638-06021122 ◽

2018 ◽

Vol 6 (2) ◽

pp. 221-242 ◽

Cited By ~ 1

Author(s):

Mohamed A. H. Ahmed

Keyword(s):

Code Switching ◽

Extensible Markup Language ◽

Markup Language ◽

Text Encoding ◽

Text Encoding Initiative ◽

Extensible Markup

The main aim of this study is to introduce a model of TEI (Text Encoding Initiative) annotation of Hebrew elements in Judeo-Arabic texts, i.e., code switching (CS), borrowing, and Hebrew quotations. This article will provide an introduction to using XML (Extensible Markup Language) to investigate sociolinguistic aspects in medieval Judeo-Arabic texts. Accordingly, it will suggest to what extent using XML is useful for investigating linguistic and sociolinguistic features in the Judeo-Arabic paradigm. To provide an example of how XML annotation could be applied to Judeo-Arabic texts, a corpus of 300 pages selected from three Judeo-Arabic books has been manually annotated using TEI P5. The annotation covers all instances of CS, borrowing, and Hebrew quotations in that corpus.

Kaleidoscopic Pedagogy in the Classroom Laboratory

Teaching with Digital Humanities ◽

10.5622/illinois/9780252042232.003.0001 ◽

2018 ◽

pp. 3-23

Author(s):

Ryan Cordell ◽

Benjamin J. Doyle ◽

Elizabeth Hopwood

Keyword(s):

Nineteenth Century ◽

Data Analysis ◽

Pedagogical Practices ◽

Close Reading ◽

Learning Spaces ◽

Text Encoding ◽

Text Encoding Initiative ◽

Classroom Laboratory ◽

Nuanced Understanding ◽

Interpretive Process

Ryan Cordell, Benjamin Doyle, and Elizabeth Hopwood’s essay seizes a nineteenth-century invention, the kaleidoscope, as a model and metaphor for pedagogical practices and learning spaces that encourage play and experimentation. Through examples that involve setting letterpress type, the Text Encoding Initiative (TEI) encoding of nineteenth-century texts as an interpretive process, and the collaborative creation of Wikipedia pages, the authors describe how experiments with contemporary technologies help students claim scholarly agency over the texts and tools central to their study of the nineteenth century. Kaleidoscopic pedagogy encourages students to discover how C19 competencies like close reading and contemporary methods of coding and data analysis have the potential to be mutually constitutive, inspiring a more nuanced understanding of both periods.

Understanding Documentary Practice: Lessons Learnt from the Text Encoding Initiative

Research and Advanced Technology for Digital Libraries - Lecture Notes in Computer Science ◽

10.1007/978-3-642-24469-8_29 ◽

2011 ◽

pp. 272-283

Author(s):

Paul Scifleet ◽

Susan P. Williams

Keyword(s):

Text Encoding ◽

Text Encoding Initiative ◽

Lessons Learnt

Annotating an oral corpus using the Text Encoding Initiative. Methodology, problems, solutions

Journal of French Language Studies ◽

10.1017/s0959269507003183 ◽

2008 ◽

Vol 18 (1) ◽

pp. 103-119

Author(s):

Janice Carruthers

Keyword(s):

Final Section ◽

Text Encoding ◽

Text Encoding Initiative ◽

Discourse Type ◽

And Inversion

The objective of this paper is to describe and evaluate the application of the Text Encoding Initiative (TEI) Guidelines to a corpus of oral French, this being the first corpus of oral French where the TEI has been used. The paper explains the purpose of the corpus, both in creating a specialist corpus of néo-contage that will broaden the range of oral corpora available, and, more importantly, in creating a dataset to explore a variety of oral French that has a particularly interesting status in terms of factors such as conception orale/écrite, réalisation médiale and comportement communicatif (Koch and Oesterreicher 2001). The linguistic phenomena to be encoded are both stylistic (speech and thought presentation) and syntactic (negation, detachment, inversion), and all represent areas where previous research has highlighted the significance of factors such as medium, register and discourse type, as well as a host of linguistic factors (syntactic, phonetic, lexical). After a discussion of how a tagset can be designed and applied within the TEI to encode speech and thought presentation, negation, detachment and inversion, the final section of the paper evaluates the benefits and possible drawbacks of the methodology offered by the TEI when applied to a syntactic and stylistic markup of an oral corpus.

One Document Does-it-all (ODD): a language for documentation, schema generation, and customization from the Text Encoding Initiative

Proceedings of the Symposium on Markup Vocabulary Customization ◽

10.4242/balisagevol24.viglianti01 ◽

2019 ◽

Cited By ~ 2

Author(s):

Raffaele Viglianti

Keyword(s):

Social Sciences ◽

Community Of Practice ◽

Annual Conference ◽

Technical Standard ◽

Online Journal ◽

Text Encoding ◽

Academic Field ◽

Text Encoding Initiative ◽

Mailing Lists ◽

Machine Readable

TEI, the Text Encoding Initiative, was founded in 1987 to develop guidelines for encoding machine-readable texts of interest to the humanities and social sciences. The TEI is a text-centric community of practice in the academic field of digital humanities, operating continuously since the 1980s. The community currently runs several mailing lists, holds an annual conference, and maintains an eponymous technical standard, an online journal, a wiki, a GitHub repository, and a toolchain. The TEI Guidelines, which collectively define an XML format, are the defining output of the community of practice. The format differs from other well-known open formats for text (such as HTML and OpenDocument) in that its main mission is to encode “extant” texts such that they are amenable to scholarly processing. After a brief introduction to the TEI, we will discuss the mechanisms built into the TEI for customization.
