Random Indexing
Recently Published Documents


TOTAL DOCUMENTS: 53 (five years: 3)

H-INDEX: 9 (five years: 0)

2021 · Vol 11 (1)
Author(s):
Aniruddha Agarwal, Gagan Kalra, Rupesh Agrawal, Reema Bansal, Vishali Gupta

Abstract To analyze the longitudinal changes in the outer plexiform layer (OPL) in patients with tubercular serpiginous-like choroiditis (TB SLC) and compare them to a healthy control population. Clinical and imaging data of subjects with TB SLC (minimum 6-month follow-up) and healthy control subjects were reviewed. Optical coherence tomography (OCT) images obtained with a swept-source device (DRI Triton, Topcon, Japan) at three visits (baseline, 3 months, and 6 months) were analyzed. Three OCT scans were chosen: one passing through the center of the fovea, one line above, and one line below. After random indexing to anonymize the images, they were pre-processed and fed into an automated pipeline to identify, crop, and measure the area of the OPL in each line scan. Longitudinal comparisons of the OPL within the patient group were performed. The study included 32 eyes (16 patients; 11 males; mean age: 32.9 ± 7.8 years) with TB SLC. Twenty-eight eyes (14 subjects; 10 males; mean age: 31.1 ± 6.2 years) of age- and gender-matched healthy control subjects were also selected. The OPL area differed significantly between the baseline and month-6 visits (6288 ± 1803 versus 5487 ± 1461; p = 0.0002) on the central scan passing through the fovea. For the scans above and below the fovea, the reduction in OPL area was significant at each visit (p < 0.0001). Compared with healthy control subjects, OPL area values in patients with TB SLC were significantly lower at the month-3 (6116 ± 1441 versus 7136 ± 2539; p = 0.04) and month-6 visits (5487 ± 1461 versus 7136 ± 2539; p < 0.001). The atrophied OPL at month 6 has been referred to as the "middle limiting membrane" (MLM). Subjects with TB SLC may develop progressive atrophy of the OPL resulting in formation of the MLM, which is seen as a hyper-reflective line replacing the OPL. Analysis of longitudinal changes in the OPL may be useful in predicting anatomical and functional outcomes in these patients.
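The automated measurement step described above can be illustrated with a minimal sketch: threshold a cropped OCT line scan and count the pixels of the segmented band. This is an illustrative sketch only, not the authors' pipeline; the `opl_area` function, the threshold value, and the synthetic scan are all assumptions.

```python
import numpy as np

def opl_area(line_scan, threshold=0.5, pixel_area=1.0):
    """Estimate a layer's area in a cropped line scan by counting
    pixels whose intensity exceeds a reflectivity threshold."""
    mask = line_scan >= threshold
    return mask.sum() * pixel_area

# Synthetic example: a bright horizontal band in an otherwise dark scan
scan = np.zeros((50, 200))
scan[20:25, :] = 0.8           # 5-pixel-thick band across 200 columns
print(opl_area(scan))          # 1000.0 with a unit pixel area
```

Real pipelines would calibrate `pixel_area` to the device's axial and lateral resolution so areas are comparable across visits.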


2021 · pp. 1-24
Author(s):
Paul Donner

Abstract Cumulative dissertations are doctoral theses composed of multiple published articles. For studies of the publication activity and citation impact of early-career researchers, it is important to identify these articles and link them to their associated theses. Using a new benchmark dataset, this paper reports on experiments in measuring the bilingual textual similarity between, on the one hand, the titles and keywords of doctoral theses and, on the other hand, articles' titles and abstracts. The tested methods are cosine similarity and L1 distance in the Vector Space Model (VSM) as baselines, the language-indifferent methods Latent Semantic Analysis (LSA) and trigram similarity, and the language-aware methods fastText and Random Indexing (RI). LSA and RI were trained on a purposively collected bilingual scientific parallel-text corpus. The results show that the VSM baselines and the RI method perform best, but that the VSM method is unsuitable for cross-language similarity due to its inherent monolingual bias.
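Random Indexing as used for cross-language similarity can be sketched in a few lines: each term receives a sparse ternary index vector, and aligned term pairs from a parallel resource share one vector, so that documents in either language land in a common space. The lexicon, dimensionality, and sparsity below are hypothetical choices for illustration, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, NNZ = 512, 8  # dimensionality and non-zeros per sparse index vector

def index_vector():
    """Sparse ternary random index vector: a few +1/-1 entries."""
    v = np.zeros(DIM)
    pos = rng.choice(DIM, size=NNZ, replace=False)
    v[pos] = rng.choice([-1.0, 1.0], size=NNZ)
    return v

# Hypothetical bilingual lexicon: aligned German/English terms share
# one index vector, tying the two languages into one space.
pairs = [("dissertation", "thesis"), ("zitat", "citation"),
         ("forschung", "research"), ("katze", "cat")]
vectors = {}
for de, en in pairs:
    v = index_vector()
    vectors[de] = vectors[en] = v

def doc_vector(tokens):
    return sum(vectors[t] for t in tokens if t in vectors)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

de_title = ["forschung", "zitat", "dissertation"]
en_abstract = ["thesis", "citation", "research"]
print(cosine(doc_vector(de_title), doc_vector(en_abstract)))  # ~1.0
```

With shared index vectors the two documents map to the same point, which is exactly the monolingual bias a plain VSM cannot reproduce across languages.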


Author(s):
Go Eun Heo, Qing Xie, Min Song, Jeong-Hoon Lee

Abstract
Background: Extracting useful information from the biomedical literature plays an important role in the development of modern medicine. In natural language processing, there have been rigorous attempts to find meaningful relationships between entities automatically using co-occurrence-based methods. It has become increasingly important to understand whether, and how strongly, a relationship exists between any two entities extracted from a large number of texts. One established approach is to measure the semantic similarity and relatedness of two entities.
Methods: We propose a hybrid ranking method that combines a co-occurrence approach, considering both direct and indirect entity-pair relationships, with specialized word embeddings for measuring the relatedness of two entities.
Results: We evaluate the proposed ranking method against other well-known methods such as co-occurrence, Word2Vec, COALS (Correlated Occurrence Analog to Lexical Semantics), and random indexing by calculating the top-ranked entities related to Alzheimer's disease. In addition, we analyze gene, pathway, and gene-phenotype relationships. Overall, the proposed method tends to find more hidden relationships than the other methods.
Conclusion: Our proposed method selects more useful related entities: those that not only co-occur frequently but also have more indirect relations with the target entity. In pathway analysis, it shows superior performance at identifying (functional) cross-clustering and higher-level pathways. In phenotype analysis, it has an advantage in identifying the common genotypes related to phenotypes in the biological literature.
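A hybrid score of the kind described, combining normalised (direct plus indirect) co-occurrence with embedding-based relatedness, can be sketched as follows. All counts, vectors, and the weighting parameter `alpha` are toy values for illustration; they are not drawn from the paper.

```python
import numpy as np

# Toy data: direct co-occurrence counts with a target entity
# ("alzheimer"), shared-neighbour counts as a stand-in for indirect
# relations, and toy embedding vectors. All values are illustrative.
direct   = {"app": 40, "tau": 35, "insulin": 5}
indirect = {"app": 10, "tau": 12, "insulin": 20}
emb = {"alzheimer": np.array([1.0, 0.2, 0.1]),
       "app":       np.array([0.9, 0.3, 0.0]),
       "tau":       np.array([0.8, 0.4, 0.2]),
       "insulin":   np.array([0.1, 0.9, 0.3])}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def hybrid_score(entity, alpha=0.5):
    cooc = direct[entity] + indirect[entity]              # direct + indirect
    cooc /= max(direct[e] + indirect[e] for e in direct)  # normalise to [0, 1]
    rel = cosine(emb["alzheimer"], emb[entity])           # embedding relatedness
    return alpha * cooc + (1 - alpha) * rel

ranking = sorted(direct, key=hybrid_score, reverse=True)
print(ranking)  # ['app', 'tau', 'insulin']
```

The interpolation weight `alpha` trades off raw co-occurrence strength against distributional relatedness; entities with low direct counts can still rank highly through the embedding term.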


2019 · Vol 20 (2) · pp. 117-130
Author(s):
Julian Benadit Pernabas, Sagayraj Francis Fidele, Krishna Kumar Vaithinathan

2019 · Vol 25 (4) · pp. 503-517
Author(s):
Jussi Karlgren, Pentti Kanerva

Abstract High-dimensional distributed semantic spaces have proven useful and effective for aggregating and processing visual, auditory and lexical information for many tasks related to human-generated data. Human language makes use of a large and varying number of features, lexical and constructional items as well as contextual and discourse-specific data of various types, which all interact to represent various aspects of communicative information. Some of these features are mostly local and useful for the organisation of, for example, the argument structure of a predication; others are persistent over the course of a discourse and necessary for achieving a reasonable level of understanding of the content.

This paper describes a model for high-dimensional representation of utterance- and text-level data, including features such as constructions or contextual data, based on a mathematically principled and behaviourally plausible approach to representing linguistic information. The implementation of the representation is a straightforward extension of Random Indexing models previously used for lexical linguistic items. The paper shows how the implemented model is able to represent a broad range of linguistic features in a common integral framework of fixed dimensionality, which is computationally habitable and suitable as a bridge between symbolic representations, such as dependency analysis, and continuous representations used, for example, in classifiers or further machine-learning approaches. This is achieved with operations on vectors that constitute a powerful computational algebra, accompanied by an associative memory for the vectors. The paper provides a technical overview of the framework and a worked-through implemented example of how it can be applied to various types of linguistic features.
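The vector algebra alluded to here, fixed-dimensional random vectors combined by binding (elementwise multiplication) and bundling (summation) and compared by cosine similarity, can be sketched as follows. The role and filler names are hypothetical; this illustrates the general Kanerva-style operations rather than the paper's exact model.

```python
import numpy as np

rng = np.random.default_rng(1)
DIM = 10_000  # high dimensionality makes random vectors near-orthogonal

def rand_vec():
    """Random bipolar (+1/-1) vector."""
    return rng.choice([-1, 1], size=DIM)

def bind(a, b):    return a * b              # role-filler binding
def bundle(*vs):   return np.sign(sum(vs))   # superposition of an odd number of vectors
def cosine(a, b):  return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Encode "the cat chased the dog" as role-filler pairs in one vector.
AGENT, PATIENT, VERB = rand_vec(), rand_vec(), rand_vec()
cat, dog, chase = rand_vec(), rand_vec(), rand_vec()
sentence = bundle(bind(AGENT, cat), bind(VERB, chase), bind(PATIENT, dog))

# Unbinding with the AGENT role recovers something close to `cat`:
# bipolar binding is its own inverse.
probe = bind(sentence, AGENT)
print(cosine(probe, cat) > cosine(probe, dog))  # True: cat is the agent
```

The same fixed-width vector can thus hold argument structure alongside lexical content, and an associative (cleanup) memory would map the noisy `probe` back to the exact stored item.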


Author(s):
Alejandro Moreo Fernández, Andrea Esuli, Fabrizio Sebastiani

Polylingual Text Classification (PLC) is a supervised learning task that consists of assigning class labels to documents written in different languages, assuming that a representative set of training documents is available for each language. This scenario is increasingly common, given the large number of multilingual platforms and communities emerging on the Internet. In this work we analyse some important machine-translation-free and dictionary-free methods proposed in the literature, and we propose a particular configuration of the Random Indexing method (which we dub Lightweight Random Indexing). We show that it outperforms all the compared algorithms while also having a significantly reduced computational cost.
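A minimal sketch of Random Indexing for polylingual classification: every term in every language gets one sparse random index vector, documents from all languages are projected into the same low-dimensional space, and a single classifier (here a toy nearest-centroid rule) is trained on the union. The training documents, labels, and the very low number of non-zeros are illustrative assumptions, not the configuration evaluated in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
DIM, NNZ = 256, 2  # "lightweight": very few non-zeros per index vector

index_vecs = {}
def index_vec(term):
    """One sparse random index vector per surface form, shared across
    languages, so identical forms (names, cognates) link the languages."""
    if term not in index_vecs:
        v = np.zeros(DIM)
        pos = rng.choice(DIM, size=NNZ, replace=False)
        v[pos] = rng.choice([-1.0, 1.0], size=NNZ)
        index_vecs[term] = v
    return index_vecs[term]

def doc_vec(tokens):
    return sum(index_vec(t) for t in tokens)

# Hypothetical training documents in two languages, one shared label space
train = [(["goal", "match", "barcelona"], "sport"),
         (["tor", "spiel", "barcelona"], "sport"),
         (["election", "parliament", "vote"], "politics"),
         (["wahl", "parlament", "vote"], "politics")]

centroids = {}
for label in ["sport", "politics"]:
    vs = [doc_vec(toks) for toks, lab in train if lab == label]
    centroids[label] = np.mean(vs, axis=0)

def classify(tokens):
    v = doc_vec(tokens)
    return max(centroids, key=lambda lab: float(v @ centroids[lab]))

print(classify(["barcelona", "match"]))  # "sport": shared terms dominate
```

Because the projection is a fixed sparse random mapping, no translation resources are needed and the per-document cost stays linear in document length, which is where the "lightweight" saving comes from.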

