Pattern Matching and Discourse Processing in Information Extraction from Japanese Text

1994, Vol 2, pp. 89-110
Author(s): T. Kitani, Y. Eriguchi, M. Hara

Information extraction is the task of automatically picking out information of interest from unconstrained text. Such information is usually extracted in two steps. First, sentence-level processing locates relevant pieces of information scattered throughout the text; second, discourse processing merges coreferential information to generate the output. In the first step, pieces of information are identified locally, without recognizing any relationships among them; a keyword search or a simple pattern search can achieve this. The second step requires deeper knowledge in order to understand the relationships among separately identified pieces of information. Previous information extraction systems focused on the first step, partly because they were not required to link each piece of information to the others. To link the extracted pieces of information and map them onto a structured output format, complex discourse processing is essential. This paper reports on a Japanese information extraction system that merges information using a pattern matcher and a discourse processor. Evaluation results show a high level of system performance that approaches human performance.
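The two-step split described in this abstract can be illustrated with a toy Python sketch. Everything in it is an assumption for illustration, not Kitani et al.'s system: the sentences are English rather than Japanese, and the patterns, the slot names (`founded`, `location`, `employees`), and the alias rule in the hypothetical `resolve` helper are invented. Step 1 matches patterns one sentence at a time and produces unlinked facts; step 2 resolves mentions such as "the company" or a short name to an earlier entity and merges the facts into one record per entity.

```python
import re

# Hypothetical sentence-level patterns (illustrative only, not the paper's rules).
# Each one locates a single local fact: an entity mention, a slot, and a value.
ENTITY = r"(?P<ent>[A-Z]\w+(?: Corp)?|[Tt]he company)"
PATTERNS = [
    (re.compile(ENTITY + r" was founded in (?P<val>\d{4})"), "founded"),
    (re.compile(ENTITY + r" is based in (?P<val>[A-Z]\w+)"), "location"),
    (re.compile(ENTITY + r" has (?P<val>\d+) employees"), "employees"),
]


def sentence_level(text):
    """Step 1: match patterns sentence by sentence; facts are not linked."""
    facts = []
    for sentence in re.split(r"(?<=\.)\s+", text.strip()):
        for pattern, slot in PATTERNS:
            m = pattern.search(sentence)
            if m:
                facts.append((m.group("ent"), slot, m.group("val")))
    return facts


def resolve(mention, entities, last):
    """Hypothetical toy coreference rule:
    'the company' -> most recently mentioned entity;
    a bare first word ('Acme') -> an earlier full name ('Acme Corp')."""
    if mention.lower() == "the company":
        return last
    for name in entities:
        if mention == name or mention == name.split()[0]:
            return name
    return mention  # otherwise treat it as a new entity


def discourse_merge(facts):
    """Step 2: merge coreferential facts into one structured record per entity."""
    entities, last = {}, None
    for mention, slot, value in facts:
        name = resolve(mention, entities, last)
        if name is None:  # anaphor with no antecedent yet: drop the fact
            continue
        entities.setdefault(name, {})[slot] = value
        last = name
    return entities
```

On the input `"Acme Corp was founded in 1990. The company is based in Osaka. Bolt Corp was founded in 2001. Acme has 300 employees."`, step 1 yields four unlinked facts, and step 2 merges three of them into a single `Acme Corp` record. A real discourse processor needs far richer coreference knowledge than this two-rule heuristic, which is why the abstract describes the second step as the one requiring deeper knowledge.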

Author(s): Nina Maksimchuk

Modern linguistics' attention to the verbal representation of native speakers' mentality (both individual and collective) calls for an examination of all subsystems of the national language, in which territorial dialects play a significant part. Analyzing dialect units that carry linguistic and cultural value is considered a necessary way to study a people's worldview, their perception of the world, and their national mentality as a whole. Because stable phrases (phraseological units) preserve and express a native speaker's attitude to the surrounding world, analyzing folk phraseology offers a way into the speaker's spiritual world. A comprehensive description of the external and internal features of stable phrases allows the author to systematize them as a phraseosemantic field made up of distinct phraseosemantic groups. The article deals with synonymous stable phrases recorded in the Dictionary of Smolensk dialects, which form a phraseosemantic group. These phrases are analyzed with regard to the semantic structure of the key word, the characteristics of the dependent word, and the way the phraseological meaning is formed. Using phrases with the key word «bit’» ('to beat') and the synonymic series with the semantic dominant «bezdel’nichat’» ('to idle'), the article discusses the peculiarities of phraseological nomination in Smolensk dialects and confirms that folk phraseology is highly connotative and evaluative.


2021, Vol 11 (15), pp. 6975
Author(s): Tao Zhang, Lun He, Xudong Li, Guoqing Feng

Lipreading aims to recognize the sentences spoken by a talking face. In recent years, lipreading methods have achieved high accuracy on large datasets and made breakthrough progress. However, lipreading is still far from solved: existing methods tend to have high error rates on in-the-wild data and suffer from vanishing gradients during training and slow convergence. To overcome these problems, we propose an efficient end-to-end sentence-level lipreading model that uses an encoder built from a 3D convolutional network, ResNet50, and a Temporal Convolutional Network (TCN), with a CTC objective function as the decoder. More importantly, the proposed architecture incorporates the TCN as a feature learner to decode features. This partly eliminates the vanishing-gradient problem and insufficient performance of RNNs (LSTM, GRU), yielding a notable performance improvement as well as faster convergence. Experiments show that training and convergence are 50% faster than with the state-of-the-art method, and that accuracy improves by 2.4% on the GRID dataset.
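The core idea behind using a TCN instead of an LSTM or GRU can be sketched in a few lines of Python: a stack of dilated causal 1D convolutions, where each output frame depends only on the current and earlier frames and no hidden state is carried from step to step. This is a simplified, single-channel sketch with hand-set weights and no training; the paper's actual TCN configuration (channels, kernel size, dilations) is not given in the abstract, so the function names and the values used here are hypothetical.

```python
def dilated_causal_conv1d(x, weights, dilation):
    """y[t] = sum_k weights[k] * x[t - k*dilation]; frames before t=0 count as 0."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            i = t - k * dilation
            if i >= 0:  # causal: only the current and past frames contribute
                acc += w * x[i]
        y.append(acc)
    return y


def receptive_field(kernel_size, dilations):
    """Number of frames (including t itself) that one output of the stack can see."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def tcn_stack(x, weights, dilations):
    """Simplified TCN: stacked dilated causal convolutions, each followed by
    ReLU and a residual (skip) connection. One channel, shared fixed weights."""
    h = list(x)
    for d in dilations:
        conv = dilated_causal_conv1d(h, weights, d)
        h = [max(0.0, c) + v for c, v in zip(conv, h)]
    return h
```

With `weights=[1, 1]` and `dilations=[1, 2, 4]`, `receptive_field` gives 8, so an impulse at frame 0 (`[1, 0, 0, 0, 0, 0, 0, 0, 0]`) produces `tcn_stack(...) == [8.0, 4.0, 4.0, 2.0, 4.0, 2.0, 2.0, 1.0, 0.0]`: it reaches frames 0 to 7 but not frame 8. Doubling the dilation per layer grows the context exponentially with depth, while gradients flow through a fixed number of layers rather than through one recurrence step per frame.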


1978, Vol 13 (5), pp. 46-55
Author(s): D. R. Ditzel

2018, Vol 25 (2), pp. 287-306
Author(s): Cleiton Fernando Lima Sena, Daniela Barreiro Claro

Nowadays, there is an ever-increasing amount of digital data. On the Web, a vast collection of heterogeneous content is generated daily, and a significant portion of it is available in natural language. Open Information Extraction (Open IE) enables the extraction of facts from large quantities of natural-language text. In this work, we propose an Open IE method to extract facts from texts written in Portuguese. We developed two new rules that generalize inference by transitivity and by symmetry; consequently, this approach increases the number of implicit facts extracted from a sentence. Our novel symmetric inference approach is based on a list of symmetric features. Our results confirm that our method outperforms closely related work both in precision and in the number of valid extractions. In terms of the number of minimal facts, our approach is on par with the most relevant methods in the literature.
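Inference by symmetry and transitivity over extracted facts can be sketched at the triple level in Python. This is an illustration under stated assumptions, not Sena and Claro's rules: facts are assumed to be `(arg1, relation, arg2)` triples, and the relation lists `SYMMETRIC` and `TRANSITIVE` are hypothetical stand-ins for the paper's list of symmetric features. The sketch keeps applying both rules until no new fact appears.

```python
from itertools import product

Triple = tuple[str, str, str]  # (arg1, relation, arg2); needs Python 3.9+

# Hypothetical relation lists for illustration; not the paper's actual lists.
SYMMETRIC = {"casou com"}   # 'married': A casou com B -> B casou com A
TRANSITIVE = {"fica em"}    # 'is located in': A fica em B, B fica em C -> A fica em C


def infer(triples: set[Triple]) -> set[Triple]:
    """Add implicit facts by symmetry and transitivity until a fixed point."""
    facts = set(triples)
    while True:
        # Symmetry: swap the arguments of any symmetric relation.
        new = {(b, r, a) for a, r, b in facts if r in SYMMETRIC}
        # Transitivity: chain two facts with the same transitive relation.
        for (a, r1, b), (b2, r2, c) in product(facts, repeat=2):
            if r1 == r2 and r1 in TRANSITIVE and b == b2 and a != c:
                new.add((a, r1, c))
        if new <= facts:  # nothing new was derived: fixed point reached
            return facts
        facts |= new
```

For the facts `("Ana", "casou com", "João")`, `("Salvador", "fica em", "Bahia")`, and `("Bahia", "fica em", "Brasil")`, the implicit facts added are `("João", "casou com", "Ana")` and `("Salvador", "fica em", "Brasil")`. Comparing every pair of facts is quadratic, which is acceptable for the small fact sets extracted from a single sentence.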


Author(s): Shuang Peng, Mengdi Zhou, Minghui Yang, Haitao Mi, Shaosheng Cao, ...
