Correction to “Inference of Regular Expressions for Text Extraction from Examples”

2016 ◽  
Vol 28 (7) ◽  
pp. 1944-1944
Author(s):  
Alberto Bartoli ◽  
Andrea De Lorenzo ◽  
Eric Medvet ◽  
Fabiano Tarlao
2016 ◽  
Vol 28 (5) ◽  
pp. 1217-1230 ◽  
Author(s):  
Alberto Bartoli ◽  
Andrea De Lorenzo ◽  
Eric Medvet ◽  
Fabiano Tarlao

Corpora ◽  
2020 ◽  
Vol 15 (2) ◽  
pp. 125-140
Author(s):  
Yukiko Ohashi ◽  
Noriaki Katagiri ◽  
Katsutoshi Oka ◽  
Michiko Hanada

This paper reports on two research results: ( 1) designing an English for Specific Purposes (esp) corpus architecture complete with annotations structured by regular expressions; and ( 2) a case study to test the design to cater for creating a specific vocabulary list using the compiled corpus. The first half of this study involved designing a precisely structured esp corpus from 190 veterinary medical charts with a hierarchy of the data. The data hierarchy in the corpus consists of document types, outline elements and inline elements, such as species and breed. Perl scripts extracted the data attached to veterinary-specific categories, and the extraction led to creating wordlists. The second part of the research tested the corpus mode, creating a list of commonly observed lexical items in veterinary medicine. The coverage rate of the wordlists by General Service List (gsl) and Academic Word List (awl) was tested, with the result that 66.4 percent of all lexical items appeared in gsl and awl, whereas 33.7 percent appeared in none of those lists. The corpus compilation procedures as well as the annotation scheme introduced in this study enable the compilation of specific corpora with explicit annotations, allowing teachers to have access to data required for creating esp classroom materials.


2009 ◽  
Vol 29 (1) ◽  
pp. 57-59
Author(s):  
Wei DAI ◽  
Shen-sheng ZHANG

2009 ◽  
Vol 43 (1) ◽  
pp. 203-205 ◽  
Author(s):  
Chetan Kumar ◽  
K. Sekar

The identification of sequence (amino acids or nucleotides) motifs in a particular order in biological sequences has proved to be of interest. This paper describes a computing server,SSMBS, which can locate and display the occurrences of user-defined biologically important sequence motifs (a maximum of five) present in a specific order in protein and nucleotide sequences. While the server can efficiently locate motifs specified using regular expressions, it can also find occurrences of long and complex motifs. The computation is carried out by an algorithm developed using the concepts of quantifiers in regular expressions. The web server is available to users around the clock at http://dicsoft1.physics.iisc.ernet.in/ssmbs/.


Sign in / Sign up

Export Citation Format

Share Document