Geographic Feature Type Topic Model (GFTTM): grounding topics in the landscape

2015 ◽  
Author(s):  
Benjamin Adams

Probabilistic topic models are a class of unsupervised machine learning models used for understanding the latent topics in a corpus of documents. A new method for combining geographic feature data with text from geo-referenced documents to create topic models that are grounded in the physical environment is proposed. The Geographic Feature Type Topic Model (GFTTM) models each document in a corpus as a mixture of feature type topics and abstract topics. Feature type topics are conditioned on additional observation data of the relative densities of geographic feature types co-located with the document's location referent, whereas abstract topics are trained independently of that information. The GFTTM is evaluated using geo-referenced Wikipedia articles and feature type data from volunteered geographic information sources. A technique for measuring the semantic similarity of feature types and places, based on the mixtures of topics associated with the types, is also presented. The results of the evaluation demonstrate that GFTTM finds two distinct kinds of topics that can be used to disentangle how places are described in terms of their physical features from more abstract topics such as history and culture.
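The conditioning step described above — shifting a document's prior over feature type topics toward the feature types observed near its location referent — can be sketched as follows. This is a minimal illustration of the idea, not the paper's exact model; the function name, the additive form, and the mixing weight are assumptions made for this example.

```python
# Hypothetical sketch: build a document-specific Dirichlet prior over
# feature-type topics from observed geographic feature-type densities.
# The additive form and the weight parameter are illustrative assumptions.

def feature_conditioned_prior(base_alpha, densities, weight=10.0):
    """Combine a symmetric base prior with normalized feature-type densities.

    base_alpha -- symmetric prior mass per feature-type topic (float)
    densities  -- relative densities of feature types co-located with the
                  document's location referent (one value per feature type)
    weight     -- how strongly the observed densities shift the prior
    """
    total = sum(densities)
    if total > 0:
        norm = [d / total for d in densities]
    else:
        norm = [1.0 / len(densities)] * len(densities)  # no data: stay uniform
    return [base_alpha + weight * p for p in norm]

# A document whose location is dominated by water features
# (toy feature types: lake, river, peak):
alpha_d = feature_conditioned_prior(0.1, [8.0, 2.0, 0.0])
print([round(a, 2) for a in alpha_d])  # → [8.1, 2.1, 0.1]
```

Documents near many lakes thus draw their feature type topic mixtures from a prior skewed toward lake-related topics, while abstract topics remain governed by the unconditioned prior.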


Author(s):  
Dat Quoc Nguyen ◽  
Richard Billingsley ◽  
Lan Du ◽  
Mark Johnson

Probabilistic topic models are widely used to discover latent topics in document collections, while latent feature vector representations of words have been used to obtain high performance in many NLP tasks. In this paper, we extend two different Dirichlet multinomial topic models by incorporating latent feature vector representations of words trained on very large corpora to improve the word-topic mapping learnt on a smaller corpus. Experimental results show that by using information from the external corpora, our new models produce significant improvements on topic coherence, document clustering and document classification tasks, especially on datasets with few or short documents.
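The core idea — replacing a topic's word distribution with a two-component mixture of the corpus-trained multinomial and a latent-feature component computed from pre-trained word vectors — can be sketched in a few lines. The toy vectors, the function name, and the mixture weight below are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of a latent-feature topic mixture:
# P(w | t) = lam * softmax_w(v_w . u_t) + (1 - lam) * multinomial P(w | t),
# where v_w are pre-trained word vectors and u_t is a learned topic vector.
import math

def latent_feature_mixture(multinomial_p, word_vecs, topic_vec, lam=0.6):
    """Mix an embedding-based word distribution with a corpus multinomial."""
    scores = [sum(vw * ut for vw, ut in zip(v, topic_vec)) for v in word_vecs]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
    z = sum(exps)
    return [lam * e / z + (1 - lam) * p for e, p in zip(exps, multinomial_p)]

# Toy vocabulary of three words with 2-d "embeddings":
p = latent_feature_mixture(
    multinomial_p=[0.5, 0.3, 0.2],
    word_vecs=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    topic_vec=[2.0, 0.0],
)
print(round(sum(p), 6))  # the mixture is still a probability distribution → 1.0
```

Because both components are proper distributions, any convex combination of them is too, so the mixture slots directly into the generative story in place of the plain multinomial.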


2016 ◽  
Vol 24 (3) ◽  
pp. 472-480 ◽  
Author(s):  
Jonathan H Chen ◽  
Mary K Goldstein ◽  
Steven M Asch ◽  
Lester Mackey ◽  
Russ B Altman

Objective: Build probabilistic topic model representations of hospital admissions processes and compare the ability of such models to predict clinical order patterns as compared to preconstructed order sets. Materials and Methods: The authors evaluated the first 24 hours of structured electronic health record data for > 10 K inpatients. Drawing an analogy between structured items (e.g., clinical orders) and words in a text document, the authors performed latent Dirichlet allocation probabilistic topic modeling. These topic models use initial clinical information to predict clinical orders for a separate validation set of > 4 K patients. The authors evaluated these topic model-based predictions vs existing human-authored order sets by area under the receiver operating characteristic curve, precision, and recall for subsequent clinical orders. Results: Existing order sets predict clinical orders used within 24 hours with area under the receiver operating characteristic curve 0.81, precision 16%, and recall 35%. This can be improved to 0.90, 24%, and 47% (P < 10⁻²⁰) by using probabilistic topic models to summarize clinical data into up to 32 topics. Many of these latent topics yield natural clinical interpretations (e.g., “critical care,” “pneumonia,” “neurologic evaluation”). Discussion: Existing order sets tend to provide nonspecific, process-oriented aid, with usability limitations impairing more precise, patient-focused support. Algorithmic summarization has the potential to breach this usability barrier by automatically inferring patient context, but with potential tradeoffs in interpretability. Conclusion: Probabilistic topic modeling provides an automated approach to detect thematic trends in patient care and generate decision support content. A potential use case finds related clinical orders for decision support.
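The precision and recall figures above come from comparing a predicted set of orders against the orders actually placed. A minimal sketch of that set-based arithmetic, with invented order names (the study's actual evaluation pipeline and AUROC computation are not reproduced here):

```python
# Minimal sketch of the evaluation framing: compare predicted clinical orders
# against the orders actually placed in the following 24 hours.
# Order names are invented; only the precision/recall arithmetic is shown.

def precision_recall(predicted, actual):
    predicted, actual = set(predicted), set(actual)
    hits = len(predicted & actual)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall

predicted = {"CBC", "chest x-ray", "blood culture", "heparin"}
actual = {"CBC", "chest x-ray", "metabolic panel"}
p, r = precision_recall(predicted, actual)
print(round(p, 2), round(r, 2))  # → 0.5 0.67
```

Precision asks what fraction of suggested orders were actually used; recall asks what fraction of the orders actually used had been suggested — the two quantities the study reports for both order sets and topic models.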


Author(s):  
Wenbo Hu ◽  
Jun Zhu ◽  
Hang Su ◽  
Jingwei Zhuo ◽  
Bo Zhang

Supervised topic models leverage label information to learn discriminative latent topic representations. As collecting a fully labeled dataset is often time-consuming, semi-supervised learning is of high interest. In this paper, we present an effective semi-supervised max-margin topic model by naturally introducing manifold posterior regularization to a regularized Bayesian topic model, named LapMedLDA. The model jointly learns latent topics and a related classifier with only a small fraction of labeled documents. To perform the approximate inference, we derive an efficient stochastic gradient MCMC method. Unlike the previous semi-supervised topic models, our model adopts a tight coupling between the generative topic model and the discriminative classifier. Extensive experiments demonstrate that such tight coupling brings significant benefits in quantitative and qualitative performance.
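The manifold posterior regularization referenced above rests on a graph-Laplacian penalty: documents that are neighbours in a similarity graph are pushed toward similar latent topic mixtures. A toy sketch of that penalty term follows; the graph, weights, and topic vectors are invented, and the model's max-margin objective and stochastic gradient MCMC inference are not reproduced.

```python
# Hypothetical sketch of the manifold-regularization idea: penalize edges
# (i, j, w) of a document-similarity graph whose endpoint documents have
# dissimilar topic mixtures theta_i, theta_j. Toy values throughout.

def laplacian_penalty(theta, edges):
    """Sum over graph edges (i, j, w) of w * ||theta_i - theta_j||^2."""
    total = 0.0
    for i, j, w in edges:
        total += w * sum((a - b) ** 2 for a, b in zip(theta[i], theta[j]))
    return total

theta = [[0.7, 0.3], [0.6, 0.4], [0.1, 0.9]]   # per-document topic mixtures
edges = [(0, 1, 1.0), (1, 2, 0.5)]             # weighted similarity graph
print(round(laplacian_penalty(theta, edges), 3))  # → 0.27
```

Adding such a term to the training objective lets the many unlabeled documents shape the topic space even though only a small fraction of documents carry labels.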


Author(s):  
N. Penny Holliday ◽  
Stephanie Henson

The growth, distribution, and variability of phytoplankton populations in the North Atlantic are primarily controlled by the physical environment. This chapter provides an overview of the regional circulation of the North Atlantic, and an introduction to the key physical features and processes that affect ecosystems, and especially plankton, via the availability of light and nutrients. There is a natural seasonal cycle in primary production driven by physical processes that determine the light and nutrient levels, but the pattern has strong regional variations. The variations are determined by persistent features on the basin scale (e.g. the main currents and mixed layer regimes of the subtropical and subpolar gyres), as well as transient mesoscale features such as eddies and meanders of fronts.


2016 ◽  
Vol 6 (1) ◽  
Author(s):  
Mirwaes Wahabzada ◽  
Anne-Katrin Mahlein ◽  
Christian Bauckhage ◽  
Ulrike Steiner ◽  
Erich-Christian Oerke ◽  
...  

2014 ◽  
Vol 4 (1) ◽  
pp. 29-45 ◽  
Author(s):  
Rami Ayadi ◽  
Mohsen Maraoui ◽  
Mounir Zrigui

In this paper, the authors present a latent topic model to index and represent Arabic text documents while capturing more of their semantics. Text representation in a language with highly inflectional morphology such as Arabic is not a trivial task and requires special treatment. The authors describe their approach for analyzing and preprocessing Arabic text, and then describe the stemming process. Finally, latent Dirichlet allocation (LDA) is adapted to extract latent topics from the Arabic texts: significant topics are extracted across all texts, each topic is characterized by a particular distribution over descriptors, and each text is then represented as a vector over these topics. A classification experiment is conducted on an in-house corpus; latent topics are learned with LDA for different numbers of topics K (25, 50, 75, and 100), and the results are compared with classification in the full word space. The results show that, in terms of precision, recall, and F-measure, classification in the reduced topic space outperforms classification both in the full word space and with LSI reduction.
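Representing each text as a K-dimensional topic vector makes classification a similarity computation in that reduced space. A toy sketch of one such scheme — nearest class centroid under cosine similarity — is shown below; the vectors, labels, and the choice of nearest-centroid classification are illustrative assumptions, not the paper's classifier.

```python
# Sketch of classifying in a reduced topic space: each document is a K-dim
# topic-proportion vector, and a new text is labelled by its most
# cosine-similar class centroid. Toy values throughout.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def nearest_centroid(doc_topics, centroids):
    """Return the label whose centroid is most cosine-similar to the document."""
    return max(centroids, key=lambda label: cosine(doc_topics, centroids[label]))

centroids = {"sport": [0.8, 0.1, 0.1], "economy": [0.1, 0.8, 0.1]}
print(nearest_centroid([0.7, 0.2, 0.1], centroids))  # → sport
```

Working with K-dimensional topic vectors instead of full word-count vectors is what makes the reduced space both cheaper and, per the reported results, more accurate.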


2018 ◽  
Vol 45 (4) ◽  
pp. 554-570 ◽  
Author(s):  
Jian Jin ◽  
Qian Geng ◽  
Haikun Mou ◽  
Chong Chen

Interdisciplinary studies are becoming increasingly popular, and research domains of many experts are becoming diverse. This phenomenon brings difficulty in recommending experts to review interdisciplinary submissions. In this study, an Author–Subject–Topic (AST) model is proposed with two versions. In the model, reviewers’ subject information is embedded to analyse topic distributions of submissions and reviewers’ publications. The major difference between the AST and Author–Topic models lies in the introduction of a ‘Subject’ layer, which supervises the generation of hierarchical topics and allows sharing of subjects among authors. To evaluate the performance of the AST model, papers in Information System and Management (a typical interdisciplinary domain) in a famous Chinese academic library are investigated. Comparative experiments are conducted, which show the effectiveness of the AST model in topic distribution analysis and reviewer recommendation for interdisciplinary studies.
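Once submissions and reviewers are both represented as topic distributions, reviewer recommendation reduces to ranking candidates by distributional similarity. The sketch below uses the Bhattacharyya coefficient (the complement of the squared Hellinger distance) as one plausible similarity; the reviewer names, distributions, and choice of similarity measure are assumptions for illustration, not the AST model's actual scoring rule.

```python
# Sketch of the reviewer-recommendation step: score each candidate reviewer
# by the similarity between the submission's topic distribution and the
# reviewer's aggregate topic distribution. Toy values throughout.
import math

def hellinger_affinity(p, q):
    """Bhattacharyya coefficient: 1.0 means identical distributions."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def rank_reviewers(submission, reviewers):
    """Return reviewer labels ordered from best to worst match."""
    return sorted(reviewers,
                  key=lambda r: hellinger_affinity(submission, reviewers[r]),
                  reverse=True)

submission = [0.5, 0.4, 0.1]  # e.g. a mixed IS / management submission
reviewers = {
    "reviewer_A": [0.6, 0.3, 0.1],    # interdisciplinary profile
    "reviewer_B": [0.05, 0.05, 0.9],  # single-subject profile
}
print(rank_reviewers(submission, reviewers)[0])  # → reviewer_A
```

A reviewer whose publication history spans the same subjects as the submission scores higher than a narrow specialist, which is exactly the behaviour an interdisciplinary venue needs.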


Author(s):  
Murugan Anandarajan ◽  
Chelsey Hill ◽  
Thomas Nolan
