Psychometric and Validity Issues in Machine Learning Approaches to Personality Assessment: A Focus on Social Media Text Mining

2020 ◽  
Vol 34 (5) ◽  
pp. 826-844 ◽  
Author(s):  
Louis Tay ◽  
Sang Eun Woo ◽  
Louis Hickman ◽  
Rachel M. Saef

In the age of big data, substantial research is now moving toward using digital footprints like social media text data to assess personality. Nevertheless, there are concerns and questions regarding the psychometric and validity evidence of such approaches. We seek to address this issue by focusing on social media text data and (i) conducting a review of psychometric validation efforts in social media text mining (SMTM) for personality assessment and discussing additional work that needs to be done; (ii) considering additional validity issues from the standpoint of reference (i.e. ‘ground truth’) and causality (i.e. how personality determines variations in scores derived from SMTM); and (iii) discussing the unique issues of generalizability when validating SMTM for personality assessment across different social media platforms and populations. In doing so, we explicate the key validity and validation issues that need to be considered as a field to advance SMTM for personality assessment, and, more generally, machine learning personality assessment methods. © 2020 European Association of Personality Psychology

2021 ◽  
Author(s):  
Fei Shen ◽  
Wenting Yu ◽  
Chen Min ◽  
Qianying Ye ◽  
Chuanli Xia ◽  
...  

Text mining has been a dominant approach to extracting useful information from massive unstructured data online. But existing tools for Chinese word segmentation are not ideal for processing social media text data in Cantonese. This project developed CyberCan (https://github.com/shenfei1010/CyberCan), a lexicon of contemporary Cantonese based on more than 100 million pieces of internet texts. We compared the performance of CyberCan with existing Mandarin and Cantonese lexicons in terms of their word segmentation performance. Findings suggest that CyberCan outperforms all existing lexicons by a considerable margin.


2018 ◽  
Vol 23 (2) ◽  
pp. 190-203 ◽  
Author(s):  
Wiebke Bleidorn ◽  
Christopher James Hopwood

Machine learning has led to important advances in society. One of the most exciting applications of machine learning in psychological science has been the development of assessment tools that can powerfully predict human behavior and personality traits. Thus far, machine learning approaches to personality assessment have focused on the associations between social media and other digital records with established personality measures. The goal of this article is to expand the potential of machine learning approaches to personality assessment by embedding it in a more comprehensive construct validation framework. We review recent applications of machine learning to personality assessment, place machine learning research in the broader context of fundamental principles of construct validation, and provide recommendations for how to use machine learning to advance our understanding of personality.


Now a day Social Media like Facebook, twitter and Instagram is major Sources for people to share their emotions based on the current situations in society. By knowing the interesting patterns in it, a government/appropriate person for that situation can take good and useful decisions. Sentiment analysis is a method where people can extract the useful information from the text like the emotions (happy, sad, and neutral) of people. Much research work was been underdoing in the area of sentiment analysis. Among that work the Machine learning and Deep learning approaches plays a maximum role. Existing works on sentiment analysis is going in the English language. In this paper, proposed a novel framework that specifically designed to do sentiment analysis of the text data, that available in the telugu language. The proposed framework was integrated with the word embedding model Word2Vec, language translator and deep learning approaches like Recurrent Neural Network and Navie base algorithms to collect and analyse the sentiment in tweeter data that present in telugu language. The results shows effective in terms of accuracy, precision and specificity.


2018 ◽  
Author(s):  
Wiebke Bleidorn ◽  
Christopher James Hopwood

Machine learning has led to important advances in society. One of the most exciting applications of machine learning in psychological science has been the development of assessment tools that can powerfully predict human behavior and personality traits. Thus far, machine learning approaches to personality assessment have been focused on the associations between social media and other digital records with established personality measures. The goal of this paper is to expand the potential of machine learning approaches to personality assessment by embedding it in a more comprehensive construct validation framework. We review recent applications of machine learning to personality assessment, place machine learning research in the broader context of fundamental principles of construct validation and provide recommendations for how to use machine learning to advance our understanding of personality.


2022 ◽  
pp. 171-195
Author(s):  
Jale Bektaş

Conducting NLP for Turkish is a lot harder than other Latin-based languages such as English. In this study, by using text mining techniques, a pre-processing frame is conducted in which TF-IDF values are calculated in accordance with a linguistic approach on 7,731 tweets shared by 13 famous economists in Turkey, retrieved from Twitter. Then, the classification results are compared with four common machine learning methods (SVM, Naive Bayes, LR, and integration LR with SVM). The features represented by the TF-IDF are experimented in different N-grams. The findings show the success of a text classification problem is relative with the feature representation methods, and the performance superiority of SVM is better compared to other ML methods with unigram feature representation. The best results are obtained via the integration method of SVM with LR with the Acc of 82.9%. These results show that these methodologies are satisfying for the Turkish language.


2020 ◽  
Vol 28 (4) ◽  
pp. 532-551
Author(s):  
Blake Miller ◽  
Fridolin Linder ◽  
Walter R. Mebane

Supervised machine learning methods are increasingly employed in political science. Such models require costly manual labeling of documents. In this paper, we introduce active learning, a framework in which data to be labeled by human coders are not chosen at random but rather targeted in such a way that the required amount of data to train a machine learning model can be minimized. We study the benefits of active learning using text data examples. We perform simulation studies that illustrate conditions where active learning can reduce the cost of labeling text data. We perform these simulations on three corpora that vary in size, document length, and domain. We find that in cases where the document class of interest is not balanced, researchers can label a fraction of the documents one would need using random sampling (or “passive” learning) to achieve equally performing classifiers. We further investigate how varying levels of intercoder reliability affect the active learning procedures and find that even with low reliability, active learning performs more efficiently than does random sampling.


Sign in / Sign up

Export Citation Format

Share Document