Text retrieval based on the feature conversion of vector space

Author(s):  
Maoyuan Zhang ◽  
Jianping Zhu ◽  
Lijun Hua ◽  
Fang Yuan
2017 ◽  
Vol 8 (1) ◽  
pp. 355-362
Author(s):  
Putri Elfa Mas'udia ◽  
Martono Dwi Atmadja ◽  
Lis Diana Mustafa

Database searches as students usually perform them can only match titles against the keywords entered. For example, if the keyword entered is “sistem cerdas” (intelligent system), every document containing that phrase is displayed, but the system cannot measure which document is most similar. Searching by the most similar content is possible with a technology called information (text) retrieval. This study develops an information retrieval system for final-project titles that computes document similarity using the vector space model. The system performs indexing automatically offline and retrieval in real time. Retrieval begins by taking the user's query and applying stop word removal to produce keywords that are compact yet still represent the query; the system then computes the similarity between the keywords and the list of documents represented by the terms in the index. Documents are displayed in order of similarity, most similar first. In testing, entering the keyword “android” returned four documents ranked by similarity: docId 3 with a similarity of 0.9512, docId 4 with 0.5020, docId 2 with 0.2671, and docId 8 with 0.1522.
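
The pipeline described in this abstract (offline indexing, stop word removal on the query, similarity ranking) can be illustrated with a minimal sketch. It is not the authors' implementation: the stop-word list, the smoothed TF-IDF weighting, the use of cosine similarity, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical stop-word list; the abstract does not say which list was used.
STOP_WORDS = {"the", "a", "of", "for", "and", "on", "in", "with", "using"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]


def build_index(docs):
    """Offline step: build one TF-IDF vector per document.

    The weighting tf * (ln(N / df) + 1) is an assumed scheme.
    """
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(docs)
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    idf = {term: math.log(n_docs / count) + 1.0 for term, count in df.items()}
    vectors = {
        doc_id: {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for doc_id, tokens in tokenized.items()
    }
    return idf, vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def search(query, idf, vectors):
    """Online step: rank documents by similarity to the query, best first."""
    q_tf = Counter(t for t in tokenize(query) if t in idf)
    q_vec = {term: tf * idf[term] for term, tf in q_tf.items()}
    scored = [(doc_id, cosine(q_vec, vec)) for doc_id, vec in vectors.items()]
    matches = [(doc_id, score) for doc_id, score in scored if score > 0.0]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)
```

Calling `build_index` once on a dictionary of `docId -> title` and then `search("android", idf, vectors)` returns `(docId, similarity)` pairs in descending order, which mirrors the ranked output reported in the abstract; the actual scores depend on the weighting scheme chosen.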


1998 ◽  
Vol 37 (03) ◽  
pp. 254-259
Author(s):  
A. Zaiss ◽  
R. Brunner ◽  
D. Spinner ◽  
R. Klar ◽  
S. Schulz

The increasing parallel use of ICD-9 and ICD-10 complicates the comparability of coded diagnoses. This is the reason why we developed a symmetric table for interactive conversion between ICD-9 and ICD-10, based on a vector space text-retrieval method that resulted in unambiguous mapping from ICD-9 to ICD-10 in 64% and from ICD-10 to ICD-9 in 87% of all three- and four-character classes of the tabular list. Out of the remaining 13% of multi-valued relations, a table for automated mapping from ICD-10 to ICD-9 was created. In 9% of cases, the selection offered no problems. A compromise between preserving information content and maintaining the logical integrity had to be found in 2.4%; in 1.6% automated mapping was impossible because of newly defined concepts and structural differences between ICD-9 and ICD-10 that are not counterbalanced by a consistent system of residual categories. We recommend that in a future revision of the ICD, compatibility with the then existing classification system should be considered.


2018 ◽  
Vol 14 (3) ◽  
pp. 184-192
Author(s):  
Radhi Ali ◽  
Jalal Hussein Bayati ◽  
Suhad Hameed

Author(s):  
Anthony Anggrawan ◽  
Azhari

Information retrieval is the search for information based on a user's query, with the aim of finding the documents that meet the user's need. This research uses the Vector Space Model to determine the similarity percentage of each student's assignment. The application is built with PHP and a MySQL database. Results are presented as a ranking of documents by their similarity to the query, with a mean average precision of 0.874. This value indicates how closely the application agrees with the experts' assessment; it was obtained from an evaluation of 5 queries against 25 sample documents. The time needed to compute similarity increases with the number of assignments submitted.
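
For readers unfamiliar with the metric reported here, the following is a minimal sketch of mean average precision under its standard definition. The function names and the sample data in the usage note are hypothetical; the abstract does not describe how the authors computed their value.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of precision@k taken at each rank k where a relevant document appears."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of average precision over (ranked_ids, relevant_ids) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)
```

For example, a query whose ranking is `[1, 2, 3, 4]` with expert-judged relevant documents `{1, 3}` has an average precision of (1/1 + 2/3) / 2; averaging such values over all queries gives the mean average precision.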


2015 ◽  
Author(s):  
Qing Dou ◽  
Ashish Vaswani ◽  
Kevin Knight ◽  
Chris Dyer

2018 ◽  
Vol 9 (2) ◽  
pp. 97-105
Author(s):  
Richard Firdaus Oeyliawan ◽  
Dennis Gunawan

A library is a facility that provides information and knowledge resources and supports readers academically. Because a library holds a large number of books, readers often have difficulty finding the ones they need. Universitas Multimedia Nusantara uses the Senayan Library Management System (SLiMS) as its library catalogue. SLiMS has many features that help readers, but it has no recommendation feature for finding books relevant to a specific book the reader has chosen. The application was developed using the Vector Space Model to represent each document as a vector, and its recommendations are based on the similarity of the books' descriptions. In testing with a single-language sample of relevant books, the F-Measure obtained was 55% with a cosine similarity threshold of 0.1. The book descriptions and the variety of languages affect the F-Measure obtained. Index Terms—Book Recommendation, Porter Stemmer, SLiMS Universitas Multimedia Nusantara, TF-IDF, Vector Space Model
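
The evaluation described in this abstract, thresholding cosine similarity and scoring the result with an F-Measure, can be sketched as follows. This is an illustration only: the balanced F1 form of the F-Measure, the inclusive `>=` comparison, and the function names are assumptions not stated in the abstract.

```python
def recommend(similarities, threshold=0.1):
    """Return the ids of books whose similarity to the chosen book meets the threshold.

    `similarities` maps book id -> cosine similarity (assumed precomputed).
    The inclusive comparison is an assumption.
    """
    return {book_id for book_id, score in similarities.items() if score >= threshold}


def f_measure(recommended, relevant):
    """Balanced F-Measure (F1): harmonic mean of precision and recall."""
    true_positives = len(recommended & relevant)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(recommended)
    recall = true_positives / len(relevant)
    return 2 * precision * recall / (precision + recall)
```

Raising the threshold generally trades recall for precision, so the F-Measure reported for a threshold of 0.1 is specific to that setting.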

