A Heterogeneous System Based on Latent Semantic Analysis Using GPU and Multi-CPU

Scientific Programming ◽

10.1155/2017/8131390 ◽

2017 ◽

Vol 2017 ◽

pp. 1-19 ◽

Cited By ~ 4

Author(s):

Gabriel A. León-Paredes ◽

Liliana I. Barbosa-Santillán ◽

Juan J. Sánchez-Escobar

Keyword(s):

Information Retrieval ◽

Latent Semantic Analysis ◽

Graphics Processing Units ◽

Semantic Analysis ◽

Computational Cost ◽

Heterogeneous Systems ◽

Semantic Space ◽

General Purpose ◽

Space Construction ◽

Value Decomposition

Latent Semantic Analysis (LSA) is a method that allows us to automatically index and retrieve information from a set of objects by reducing the term-by-document matrix using the Singular Value Decomposition (SVD) technique. However, LSA has a high computational cost for analyzing large amounts of information. The goals of this work are (i) to improve the execution time of semantic space construction, dimensionality reduction, and information retrieval stages of LSA based on heterogeneous systems and (ii) to evaluate the accuracy and recall of the information retrieval stage. We present a heterogeneous Latent Semantic Analysis (hLSA) system, which has been developed using General-Purpose computing on Graphics Processing Units (GPGPUs) architecture, which can solve large numeric problems faster through the thousands of concurrent threads on multiple CUDA cores of GPUs and multi-CPU architecture, which can solve large text problems faster through a multiprocessing environment. We execute the hLSA system with documents from the PubMed Central (PMC) database. The results of the experiments show that the acceleration reached by the hLSA system for large matrices with one hundred and fifty thousand million values is around eight times faster than the standard LSA version with an accuracy of 88% and a recall of 100%.

Download Full-text

Enabling the Latent Semantic Analysis of Large-Scale Information Retrieval Datasets by Means of Out-of-Core Heterogeneous Systems

Smart Technologies, Systems and Applications - Communications in Computer and Information Science ◽

10.1007/978-3-030-46785-2_9 ◽

2020 ◽

pp. 105-119

Author(s):

Gabriel A. León-Paredes ◽

Liliana I. Barbosa-Santillán ◽

Antonio Pareja-Lora

Keyword(s):

Information Retrieval ◽

Latent Semantic Analysis ◽

Large Scale ◽

Semantic Analysis ◽

Heterogeneous Systems

Download Full-text

Topic-based mixture language modelling

Natural Language Engineering ◽

10.1017/s1351324900002278 ◽

1999 ◽

Vol 5 (4) ◽

pp. 355-375 ◽

Cited By ~ 3

Author(s):

YOSHIHIKO GOTOH ◽

STEVE RENALS

Keyword(s):

Information Retrieval ◽

Probabilistic Models ◽

Semantic Analysis ◽

Computational Cost ◽

Language Models ◽

Adaptive Procedure ◽

Text Data ◽

Automatic Clustering ◽

Language Modelling ◽

Value Decomposition

This paper describes an approach for constructing a mixture of language models based on simple statistical notions of semantics using probabilistic models developed for information retrieval. The approach encapsulates corpus-derived semantic information and is able to model varying styles of text. Using such information, the corpus texts are clustered in an unsupervised manner and a mixture of topic-specific language models is automatically created. The principal contribution of this work is to characterise the document space resulting from information retrieval techniques and to demonstrate the approach for mixture language modelling. A comparison is made between manual and automatic clustering in order to elucidate how the global content information is expressed in the space. We also compare (in terms of association with manual clustering and language modelling accuracy) alternative term-weighting schemes and the effect of singular value decomposition dimension reduction (latent semantic analysis). Test set perplexity results using the British National Corpus indicate that the approach can improve the potential of statistical language modelling. Using an adaptive procedure, the conventional model may be tuned to track text data with a slight increase in computational cost.

Download Full-text

Kernel latent semantic analysis using an information retrieval based kernel

Proceeding of the 18th ACM conference on Information and knowledge management - CIKM '09 ◽

10.1145/1645953.1646214 ◽

2009 ◽

Cited By ~ 1

Author(s):

Laurence A.F. Park ◽

Kotagiri Ramamohanarao

Keyword(s):

Information Retrieval ◽

Latent Semantic Analysis ◽

Semantic Analysis

Download Full-text

Local and Global Latent Semantic Analysis for Text Categorization

International Journal of Information Retrieval Research ◽

10.4018/ijirr.2014070101 ◽

2014 ◽

Vol 4 (3) ◽

pp. 1-13

Author(s):

Khadoudja Ghanem

Keyword(s):

Information Retrieval ◽

High Precision ◽

Latent Semantic Analysis ◽

Classification Accuracy ◽

Text Categorization ◽

Semantic Analysis ◽

Experimental Results ◽

Semantic Approach ◽

Document Categorization ◽

Second Use

In this paper the authors propose a semantic approach to document categorization. The idea is to create for each category a semantic index (representative term vector) by performing a local Latent Semantic Analysis (LSA) followed by a clustering process. A second use of LSA (Global LSA) is adopted on a term-Class matrix in order to retrieve the class which is the most similar to the query (document to classify) in the same way where the LSA is used to retrieve documents which are the most similar to a query in Information Retrieval. The proposed system is evaluated on a popular dataset which is 20 Newsgroup corpus. Obtained results show the effectiveness of the method compared with those obtained with the classic KNN and SVM classifiers as well as with methods presented in the literature. Experimental results show that the new method has high precision and recall rates and classification accuracy is significantly improved.

Download Full-text

lsemantica: A command for text similarity based on latent semantic analysis

The Stata Journal Promoting communications on statistics and Stata ◽

10.1177/1536867x19830910 ◽

2019 ◽

Vol 19 (1) ◽

pp. 129-142 ◽

Cited By ~ 1

Author(s):

Carlo Schwarz

Keyword(s):

Latent Semantic Analysis ◽

Semantic Analysis ◽

Learning Algorithm ◽

Singular Value ◽

Machine Learning Algorithm ◽

Truncated Singular Value Decomposition ◽

Similarity Comparison ◽

Text Similarity ◽

Semantic Relationships ◽

Value Decomposition

In this article, I present the lsemantica command, which implements latent semantic analysis in Stata. Latent semantic analysis is a machine learning algorithm for word and text similarity comparison and uses truncated singular value decomposition to derive the hidden semantic relationships between words and texts. lsemantica provides a simple command for latent semantic analysis as well as complementary commands for text similarity comparison.

Download Full-text

Split-and-Combine Singular Value Decomposition for Large-Scale Matrix

Journal of Applied Mathematics ◽

10.1155/2013/683053 ◽

2013 ◽

Vol 2013 ◽

pp. 1-8 ◽

Cited By ~ 6

Author(s):

Jengnan Tzeng

Keyword(s):

Singular Value Decomposition ◽

Large Scale ◽

Semantic Analysis ◽

Computational Cost ◽

Matrix Decomposition ◽

Singular Value ◽

Fast Method ◽

The Matrix ◽

Scale Matrix ◽

Value Decomposition

The singular value decomposition (SVD) is a fundamental matrix decomposition in linear algebra. It is widely applied in many modern techniques, for example, high- dimensional data visualization, dimension reduction, data mining, latent semantic analysis, and so forth. Although the SVD plays an essential role in these fields, its apparent weakness is the order three computational cost. This order three computational cost makes many modern applications infeasible, especially when the scale of the data is huge and growing. Therefore, it is imperative to develop a fast SVD method in modern era. If the rank of matrix is much smaller than the matrix size, there are already some fast SVD approaches. In this paper, we focus on this case but with the additional condition that the data is considerably huge to be stored as a matrix form. We will demonstrate that this fast SVD result is sufficiently accurate, and most importantly it can be derived immediately. Using this fast method, many infeasible modern techniques based on the SVD will become viable.

Download Full-text

Boosting novelty for biomedical information retrieval through probabilistic latent semantic analysis

Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval - SIGIR '13 ◽

10.1145/2484028.2484174 ◽

2013 ◽

Cited By ~ 2

Author(s):

Xiangdong An ◽

Jimmy Xiangji Huang

Keyword(s):

Information Retrieval ◽

Latent Semantic Analysis ◽

Semantic Analysis ◽

Probabilistic Latent Semantic Analysis ◽

Biomedical Information Retrieval

Download Full-text

The Application of LSA in TCM Syndromes Classification

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.284-287.1666 ◽

2013 ◽

Vol 284-287 ◽

pp. 1666-1670

Author(s):

Yan Fang Gan ◽

Zi Wei Ni ◽

Fan Lin

Keyword(s):

Health Care ◽

Chinese Medicine ◽

Traditional Chinese Medicine ◽

Clinical Manifestation ◽

Health Care Workers ◽

Latent Semantic Analysis ◽

Semantic Analysis ◽

Semantic Space ◽

Primary Objective ◽

Care Workers

The description of syndromes and symptoms in traditional Chinese medicine (TCM) is extremely complicated. And how to diagnose the patient's syndrome in a better way is the primary objective of clinical health care workers all the time. It was a good attempt to diagnose patient's syndrome by combining Latent Semantic Analysis and the feature of TCM knowledge----both syndromes and organs have the same clinical manifestation collection that are symptoms. In this paper, correlative degrees would be computed and sorted in a certain latent semantic space which was constructed by syndromes and organs . According to the result of correlative degrees computing, the classifying could be done by choosing the highest correlative degree as the belonging class. The experimental results show that this method performs quite well.

Download Full-text

AMATCHMETHOD BASED ON LATENT SEMANTIC ANALYSIS FOR EARTHQUAKEHAZARD EMERGENCY PLAN

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xlii-2-w7-137-2017 ◽

2017 ◽

Vol XLII-2/W7 ◽

pp. 137-141

Author(s):

D. Sun ◽

S. Zhao ◽

Z. Zhang ◽

X. Shi

Keyword(s):

Decision Maker ◽

Latent Semantic Analysis ◽

Semantic Analysis ◽

Semantic Space ◽

Word Segmentation ◽

Emergency Plan ◽

Match Method ◽

Part Of Speech ◽

Plan Retrieval ◽

Short Time

The structure of the emergency plan on earthquake is complex, and it’s difficult for decision maker to make a decision in a short time. To solve the problem, this paper presents a match method based on Latent Semantic Analysis (LSA). After the word segmentation preprocessing of emergency plan, we carry out keywords extraction according to the part-of-speech and the frequency of words. Then through LSA, we map the documents and query information to the semantic space, and calculate the correlation of documents and queries by the relation between vectors. The experiments results indicate that the LSA can improve the accuracy of emergency plan retrieval efficiently.

Download Full-text

A Family of Scheduling Algorithms for Hybrid Parallel Platforms

International Journal of Foundations of Computer Science ◽

10.1142/s012905411850003x ◽

2018 ◽

Vol 29 (01) ◽

pp. 63-90 ◽

Cited By ~ 3

Author(s):

Safia Kedad-Sidhoum ◽

Florence Monna ◽

Grégory Mounié ◽

Denis Trystram

Keyword(s):

Graphics Processing Units ◽

Dynamic Programming Algorithm ◽

Computational Cost ◽

General Purpose ◽

Parallel Applications ◽

Programming Algorithm ◽

Approximation Bound ◽

Computing Platforms ◽

Graphics Processing ◽

Independent Tasks

More and more parallel computing platforms are built upon hybrid architectures combining multi-core processors (CPUs) and hardware accelerators like General Purpose Graphics Processing Units (GPGPUs). We present in this paper a new method for scheduling efficiently parallel applications with [Formula: see text] CPUs and [Formula: see text] GPGPUs, where each task of the application can be processed either on an usual core (CPU) or on a GPGPU. We consider the problem of scheduling [Formula: see text] independent tasks with the objective to minimize the time for completing the whole application (makespan). This problem is NP-hard, thus, we present two families of approximation algorithms that can achieve approximation ratios of [Formula: see text] or [Formula: see text] for any integer [Formula: see text] when only one GPGPU is considered, and [Formula: see text] or [Formula: see text] for [Formula: see text] GPGPUs, where [Formula: see text] is an arbitrary small value which corresponds to the target accuracy of a binary search. The proposed method is based on a dual approximation scheme that uses a dynamic programming algorithm. The associated computational costs are for the first (resp. second) family in [Formula: see text] (resp. [Formula: see text]) per step of dual approximation. The greater the value of parameter [Formula: see text], the better the approximation, but the more expensive the computational cost. Finally, we propose a relaxed version of the algorithm which achieves a running time in [Formula: see text] with a constant approximation bound of [Formula: see text]. This last result is compared to the state-of-the-art algorithm HEFT. The proposed solving method is the first general purpose algorithm for scheduling on hybrid machines with a theoretical performance guarantee that can be used for practical purposes.

Download Full-text