Realised Volatility Forecasting: Machine Learning via Financial Word Embedding

Machine Learning for Realised Volatility Forecasting

SSRN Electronic Journal ◽

10.2139/ssrn.3707796 ◽

2020 ◽

Author(s):

Eghbal Rahimikia ◽

Ser-Huang Poon

Keyword(s):

Machine Learning ◽

Volatility Forecasting ◽

Realised Volatility


Combining Public Machine Learning Models by Using Word Embedding for Human Activity Recognition

2021 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops) ◽

10.1109/percomworkshops51409.2021.9431141 ◽

2021 ◽

Author(s):

Koichi Shimoda ◽

Akihito Taya ◽

Yoshito Tobe

Keyword(s):

Machine Learning ◽

Activity Recognition ◽

Human Activity ◽

Human Activity Recognition ◽

Word Embedding ◽

Learning Models ◽

Machine Learning Models


Drug Repurposing Prediction for Immune-Mediated Cutaneous Diseases using a Word-Embedding–Based Machine Learning Approach

Journal of Investigative Dermatology ◽

10.1016/j.jid.2018.09.018 ◽

2019 ◽

Vol 139 (3) ◽

pp. 683-691 ◽

Cited By ~ 13

Author(s):

Matthew T. Patrick ◽

Kalpana Raja ◽

Keylonnie Miller ◽

Jason Sotzen ◽

Johann E. Gudjonsson ◽

...

Keyword(s):

Machine Learning ◽

Drug Repurposing ◽

Word Embedding ◽

Learning Approach ◽

Immune Mediated ◽

Machine Learning Approach


Turkish tweet sentiment analysis with word embedding and machine learning

2017 25th Signal Processing and Communications Applications Conference (SIU) ◽

10.1109/siu.2017.7960195 ◽

2017 ◽

Cited By ~ 5

Author(s):

Deger Ayata ◽

Murat Saraclar ◽

Arzucan Ozgur

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Word Embedding


Math-word embedding in math search and semantic extraction

Scientometrics ◽

10.1007/s11192-020-03502-9 ◽

2020 ◽

Vol 125 (3) ◽

pp. 3017-3046 ◽

Cited By ~ 1

Author(s):

André Greiner-Petter ◽

Abdou Youssef ◽

Terry Ruas ◽

Bruce R. Miller ◽

Moritz Schubotz ◽

...

Keyword(s):

Machine Learning ◽

Information Retrieval ◽

Language Processing ◽

Digital Library ◽

Question Answering ◽

Semantic Knowledge ◽

Word Embedding ◽

Mathematical Functions ◽

Search Tasks ◽

Math Search

Abstract: Word embedding, which represents individual words with semantically fixed-length vectors, has made it possible to successfully apply deep learning to natural language processing tasks such as semantic role-modeling, question answering, and machine translation. As math text consists of natural text, as well as math expressions that similarly exhibit linear correlation and contextual characteristics, word embedding techniques can also be applied to math documents. However, while mathematics is a precise and accurate science, it is usually expressed through imprecise and less accurate descriptions, contributing to the relative dearth of machine learning applications for information retrieval in this domain. Generally, mathematical documents communicate their knowledge with an ambiguous, context-dependent, and non-formal language. Given recent advances in word embedding, it is worthwhile to explore their use and effectiveness in math information retrieval tasks, such as math language processing and semantic knowledge extraction. In this paper, we explore math embedding by testing it on several different scenarios, namely, (1) math-term similarity, (2) analogy, (3) numerical concept-modeling based on the centroid of the keywords that characterize a concept, (4) math search using query expansions, and (5) semantic extraction, i.e., extracting descriptive phrases for math expressions. Due to the lack of benchmarks, our investigations were performed using the arXiv collection of STEM documents and carefully selected illustrations on the Digital Library of Mathematical Functions (DLMF: NIST digital library of mathematical functions. Release 1.0.20 of 2018-09-1, 2018). Our results show that math embedding holds much promise for similarity, analogy, and search tasks. However, we also observed the need for more robust math embedding approaches. Moreover, we explore and discuss fundamental issues that we believe thwart the progress in mathematical information retrieval in the direction of machine learning.
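The centroid-based concept modeling mentioned in scenario (3) can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the three-dimensional vectors, the `EMBEDDINGS` table, and the function names are all invented for the example, and a real system would load vectors trained on a corpus such as arXiv.

```python
import math

# Toy 3-dimensional embeddings, invented purely for illustration.
# They are NOT taken from the paper or from any trained model.
EMBEDDINGS = {
    "sine": [0.9, 0.1, 0.0],
    "cosine": [0.8, 0.2, 0.0],
    "tangent": [0.7, 0.3, 0.1],
    "matrix": [0.0, 0.9, 0.4],
    "vector": [0.1, 0.8, 0.5],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def centroid(words, embeddings):
    """Component-wise mean of the vectors for the given keywords."""
    vectors = [embeddings[w] for w in words]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rank_by_concept(keywords, embeddings):
    """Rank all non-keyword terms by similarity to the keyword centroid."""
    concept = centroid(keywords, embeddings)
    candidates = [w for w in embeddings if w not in keywords]
    return sorted(
        candidates,
        key=lambda w: cosine_similarity(embeddings[w], concept),
        reverse=True,
    )


# Model a hypothetical "trigonometry" concept from two keywords.
trig_ranking = rank_by_concept(["sine", "cosine"], EMBEDDINGS)
print(trig_ranking)
```

With these toy vectors, the term closest to the centroid of "sine" and "cosine" is "tangent", which is the behaviour one would hope for from a concept centroid.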


Concept of TF-IDF, Common Bag of Word and Word Embedding for Effective Sentiment Classification

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.f4582.049620 ◽

2020 ◽

Vol 9 (4) ◽

pp. 2198-2201

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Sentiment Classification ◽

Word Embedding ◽

Text Representation ◽

Human Beings ◽

Text Data

Sentiment classification is one of the best-known and most popular domains of machine learning and natural language processing. An algorithm is developed to understand the opinion of an entity in a way similar to human beings. This research article presents such an algorithm. Concepts from natural language processing are considered for text representation. A novel word embedding model is then proposed for effective classification of the data. TF-IDF and Common BoW representation models were considered for representation of text data. The importance of these models is discussed in the respective sections. The proposed model is tested using IMDB datasets. A 50% training and 50% testing split, with three random shufflings of the datasets, is used for evaluation of the model.
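For readers unfamiliar with the two baseline representations named in this abstract, the sketch below shows one common way to compute bag-of-words counts and TF-IDF weights. It is an assumption-laden illustration rather than the article's method: the weighting used here (term frequency normalised by document length, times `log(N / df)`) is only one of several TF-IDF variants, and the two sample reviews are made up.

```python
import math
from collections import Counter


def bag_of_words(tokens, vocabulary):
    """Raw term counts for one document, ordered by the vocabulary."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def tf_idf(documents):
    """Return (vocabulary, TF-IDF vectors) for non-empty text documents.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for doc in tokenized for term in doc})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = {
        term: sum(1 for doc in tokenized if term in doc) for term in vocabulary
    }
    vectors = []
    for doc in tokenized:
        counts = bag_of_words(doc, vocabulary)
        vectors.append([
            (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for count, term in zip(counts, vocabulary)
        ])
    return vocabulary, vectors


# Two invented mini-reviews, standing in for real review text.
reviews = ["great movie great cast", "boring movie"]
vocab, weights = tf_idf(reviews)
print(vocab)
print(weights)
```

Note that "movie" appears in both reviews, so under this variant its weight is zero in each document, while "great" receives the largest weight in the first review.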


Artificial Intelligence Learning Semantics via External Resources for Classifying Diagnosis Codes in Discharge Notes (Preprint)

10.2196/preprints.8344 ◽

2017 ◽

Author(s):

Chin Lin ◽

Chia-Jung Hsu ◽

Yu-Sheng Lou ◽

Shih-Jen Yeh ◽

Chia-Cheng Lee ◽

...

Keyword(s):

Machine Learning ◽

Word Embedding ◽

Supervised Machine Learning ◽

Support Vector ◽

Free Text ◽

Learning Models ◽

Diagnosis Codes ◽

ICD-10 ◽

F Measure ◽

Machine Learning Models

BACKGROUND: Automated disease code classification using free-text medical information is important for public health surveillance. However, traditional natural language processing (NLP) pipelines are limited, so we propose a method combining word embedding with a convolutional neural network (CNN).

OBJECTIVE: Our objective was to compare the performance of traditional pipelines (NLP plus supervised machine learning models) with that of word embedding combined with a CNN in conducting a classification task identifying International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) diagnosis codes in discharge notes.

METHODS: We used 2 classification methods: (1) extracting from discharge notes some features (terms, n-gram phrases, and SNOMED CT categories) that we used to train a set of supervised machine learning models (support vector machine, random forests, and gradient boosting machine), and (2) building a feature matrix, by a pretrained word embedding model, that we used to train a CNN. We used these methods to identify the chapter-level ICD-10-CM diagnosis codes in a set of discharge notes. We conducted the evaluation using 103,390 discharge notes covering patients hospitalized from June 1, 2015 to January 31, 2017 in the Tri-Service General Hospital in Taipei, Taiwan. We used the receiver operating characteristic curve as an evaluation measure, and calculated the area under the curve (AUC) and F-measure as the global measure of effectiveness.

RESULTS: In 5-fold cross-validation tests, our method had a higher testing accuracy (mean AUC 0.9696; mean F-measure 0.9086) than traditional NLP-based approaches (mean AUC range 0.8183-0.9571; mean F-measure range 0.5050-0.8739). A real-world simulation that split the training sample and the testing sample by date verified this result (mean AUC 0.9645; mean F-measure 0.9003 using the proposed method). Further analysis showed that the convolutional layers of the CNN effectively identified a large number of keywords and automatically extracted enough concepts to predict the diagnosis codes.

CONCLUSIONS: Word embedding combined with a CNN showed outstanding performance compared with traditional methods, needing very little data preprocessing. This shows that future studies will not be limited by incomplete dictionaries. A large amount of unstructured information from free-text medical writing will be extracted by automated approaches in the future, and we believe that the health care field is about to enter the age of big data.
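The two evaluation measures reported in this abstract, AUC and F-measure, can be illustrated with a small sketch. It assumes a single binary label (one diagnosis chapter versus the rest), an F-measure with equal weight on precision and recall, and a 0.5 decision threshold; the labels and scores are invented, and the study's actual averaging across chapters and folds is not reproduced here.

```python
def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels (0 or 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # No true positives: precision and recall are both zero.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example is
    scored above a randomly chosen negative one (ties count as half).
    Requires at least one positive and one negative example.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Invented labels and classifier scores for four discharge notes.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold

example_auc = auc(labels, scores)
example_f = f_measure(labels, predictions)
print(example_auc, example_f)
```

The pairwise form of AUC is quadratic in the number of examples, which is fine for an illustration but would normally be replaced by a rank-based computation on a dataset of the size described above.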
