manual indexing
Recently Published Documents

TOTAL DOCUMENTS: 7 (five years: 0)
H-INDEX: 2 (five years: 0)

2020 ◽  
Vol 8 (2) ◽  
pp. 101-110
Author(s):  
Apoorva Ganapathy ◽  
Takudzwa Fadziso

An issue that most databases face is the static, manual character of indexing activities. This traditional method of indexing and sorting topics has been shown to degrade dataset performance, causing downtime and potential slowdowns that are normally addressed by manual re-indexing operations. Numerous data mining methods can accelerate this process by using proper indexing structures; choosing the appropriate index generally depends on the kind of operation the algorithm performs against the dataset. Topic indexing is the task of identifying the principal topics covered by a document. Such topics are useful for many purposes: as subject headings in libraries, as keywords in scholarly articles, and as hashtags on social media platforms. Knowing a document's topics helps readers judge its relevance quickly, but assigning topics manually is a tedious and repetitive task. This paper shows how to generate them automatically in a way that competes with manual indexing done by humans. It also discusses the problems of, and techniques for, identifying relevant information in a large variety of documents. The contribution of this paper is to develop better content analysis techniques that describe document content with automatically assigned index terms. Index terms can serve as metadata that characterises documents and supports searching across topics. The main aim of this paper is to present the process of building an automatic indexer that analyses the topics of documents by combining evidence from word frequencies with evidence from the linguistic analysis produced by a syntactic parser. Based on this content analysis, the indexer weights the expressions of a document according to their estimated significance for describing its topic.
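The frequency evidence the abstract mentions is commonly operationalised as TF-IDF weighting: terms frequent in one document but rare across the collection are good index-term candidates. A minimal illustrative sketch, not the paper's implementation (the example documents and the weighting details are assumptions):

```python
import math
from collections import Counter

def tfidf_terms(docs, doc_id, top_k=3):
    """Rank candidate index terms for one document by TF-IDF.

    docs: list of tokenized documents (lists of lowercase words).
    Returns the top_k highest-weighted terms for docs[doc_id].
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    tf = Counter(docs[doc_id])
    scores = {
        term: (count / len(docs[doc_id])) * math.log(n_docs / df[term])
        for term, count in tf.items()
    }
    return [t for t, _ in sorted(scores.items(), key=lambda x: -x[1])[:top_k]]

docs = [
    "automatic indexing weights terms by frequency".split(),
    "the parser produces a syntactic analysis".split(),
    "frequency counts alone miss syntactic evidence".split(),
]
# Terms unique to document 0 outrank "frequency", which also occurs elsewhere.
print(tfidf_terms(docs, 0))
```

In the full system described by the abstract, such frequency scores would be combined with syntactic evidence from a parser rather than used alone.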


2020 ◽  
pp. 016555152097743
Author(s):  
Ahmad Aghaebrahimian ◽  
Andy Stauder ◽  
Michael Ustaszewski

The Wikipedia category system was designed to enable browsing and navigation of Wikipedia. It is also a useful resource for knowledge organisation and document indexing, especially using automatic approaches. However, it has received little attention as a resource for manual indexing. In this article, a hierarchical taxonomy of three-level depth is extracted from the Wikipedia category system. The resulting taxonomy is explored as a lightweight alternative to expert-created knowledge organisation systems (e.g. library classification systems) for the manual labelling of open-domain text corpora. Combining quantitative and qualitative data from a crowd-based text labelling study, the validity of the taxonomy is tested and the results quantified in terms of interrater agreement. While the usefulness of the Wikipedia category system for automatic document indexing is documented in the pertinent literature, our results suggest that at least the taxonomy we derived from it is not a valid instrument for manual subject matter labelling of open-domain text corpora.
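The study quantifies the taxonomy's validity in terms of interrater agreement. As one illustration of such a measure (the article does not specify which coefficient the authors used; this is an assumption), Cohen's kappa corrects the raw agreement between two raters for the agreement expected by chance:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two raters beyond chance.

    labels_a, labels_b: the category each rater assigned to each item.
    """
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both raters labelled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical top-level labels from a crowd labelling task.
a = ["science", "arts", "science", "sports", "arts", "science"]
b = ["science", "arts", "sports", "sports", "arts", "science"]
print(round(cohens_kappa(a, b), 3))  # 0.75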


Author(s):  
Najmus Saher Shah

Indexing is one of the important tasks of Information Retrieval and can be applied to any form of data generated from the web, databases, etc. As the size of corpora increases, indexing becomes too time-consuming and labour-intensive, hence the introduction of computer-aided indexers. This paper reviews indexing techniques, both human and automatic. It outlines the use of automatic indexing by discussing various hashing techniques, including fuzzy fingerprinting and locality-sensitive hashing. Two different matching processes used in automatic subject indexing are also reviewed. Accepting the need for automatic indexing as a possible replacement for manual indexing, studies in the development of automatic indexing tools must continue.
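One family of locality-sensitive hashing mentioned above is MinHash, where the fraction of matching signature slots estimates the Jaccard similarity between two documents' token sets. A minimal sketch, illustrative only and not drawn from the paper (the example documents and signature size are assumptions):

```python
import hashlib

def minhash_signature(tokens, num_hashes=64):
    """MinHash signature: for each seed, the minimum hash over the token set."""
    sig = []
    for seed in range(num_hashes):
        sig.append(min(
            int(hashlib.md5(f"{seed}:{t}".encode()).hexdigest(), 16)
            for t in set(tokens)
        ))
    return sig

def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching slots approximates Jaccard(set_a, set_b)."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

doc1 = "fuzzy hashing groups near duplicate documents".split()
doc2 = "fuzzy hashing finds near duplicate documents".split()
doc3 = "syntactic parsing of noun phrases".split()

s1, s2, s3 = (minhash_signature(d) for d in (doc1, doc2, doc3))
print(estimated_jaccard(s1, s2))  # high: doc1 and doc2 overlap heavily
print(estimated_jaccard(s1, s3))  # near zero: disjoint vocabularies
```

Because near-duplicate documents share most signature slots, signatures can be bucketed so that candidate matches are found without comparing every pair of documents.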


1996 ◽  
Vol 20 (2) ◽  
pp. 381-418
Author(s):  
Jean-Paul Metzger ◽  
Seyed Mohammad Mahmoudi

RÉSUMÉ (translated from French): The aim of this article is the overall design of a morpho-syntactic analyser of Persian for automatic indexing. The analyser is therefore limited to the search for Noun Phrases (NPs), considered the most informative elements for analysing the content of a text in a document retrieval context. Developing such an analyser first requires correct segmentation and categorisation of every lexico-syntactic form. We briefly present a general overview of natural language processing (NLP) and some characteristics of the Persian language, then propose general solutions for constructing the rewrite rules needed for the automatic recognition of NPs in Persian. The rewrite rules thus developed are transcribed into a program in the Prolog language.

SUMMARY: The aim of this paper is the conception and realisation of a morpho-syntactic parser of Persian designed for applications to automatic indexing and computer-assisted instruction of the language (CAT). One of the chief extensions of this research is the automatic processing of natural language by means of artificial intelligence systems. The main interest of this contribution is the study of the automatic recognition of noun phrases in Persian. In automatic indexing, recognising the noun phrases allows the content of the document to be apprehended. Automatic indexing, just as manual indexing, consists of selecting in every document the most informative elements, which are descriptors or noun phrases (NPs). The design of such a parser demands, primarily, correct segmentation and categorisation of all lexico-syntactic forms in the corpus. After establishing all the rewrite rules needed for the recognition of NPs, we transcribe every phase of the analysis into a program in the Prolog language. All the lexical data necessary for the categorisation of morpho-syntactic forms are stored as Prolog clauses in a database.
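Rewrite rules for NP recognition typically take a form such as NP → DET? ADJ* NOUN+. As an illustration only (the authors' rules target Persian and are written in Prolog; the rule, tags, and example sentence here are simplified assumptions), a greedy chunker applying that rule over POS-tagged tokens can be sketched as:

```python
def extract_noun_phrases(tagged):
    """Greedy chunker for the rule NP -> DET? ADJ* NOUN+.

    tagged: list of (word, tag) pairs; returns NP word sequences.
    """
    phrases, i, n = [], 0, len(tagged)
    while i < n:
        j = i
        if j < n and tagged[j][1] == "DET":   # optional determiner
            j += 1
        while j < n and tagged[j][1] == "ADJ":  # any number of adjectives
            j += 1
        k = j
        while k < n and tagged[k][1] == "NOUN":  # one or more nouns
            k += 1
        if k > j:  # at least one noun: the span [i, k) is an NP
            phrases.append(" ".join(w for w, _ in tagged[i:k]))
            i = k
        else:
            i += 1
    return phrases

sentence = [("the", "DET"), ("automatic", "ADJ"), ("indexer", "NOUN"),
            ("weighs", "VERB"), ("each", "DET"), ("phrase", "NOUN")]
print(extract_noun_phrases(sentence))  # ['the automatic indexer', 'each phrase']
```

In a Prolog implementation such as the one the paper describes, each rewrite rule would instead be stated declaratively and the extracted NPs would serve directly as candidate descriptors for indexing.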


1977 ◽  
Vol 13 (1) ◽  
pp. 13-21 ◽  
Author(s):  
W.A. van der Meulen ◽  
P.J.F.C. Janssen
