Categorising Vaccine Confidence with Transformer-Based Machine Learning Model: The Nuances of Vaccine Sentiment on Twitter (Preprint)
BACKGROUND With conversations online growing and maternal vaccination uptake rates lower than desired, these conversations could provide useful insight to inform future interventions. Automated approaches to this type of analysis, such as natural language processing (NLP), have struggled to extract complex stances, such as attitudes toward vaccines, from large volumes of text.

OBJECTIVE In this study, we aimed to build upon recent advances in Transformer-based machine learning methods and to test whether they could be used as a tool to assess the stance of social media posts toward vaccination during pregnancy.

METHODS A total of 16,604 tweets posted between 1 November 2018 and 30 April 2019 were selected by Boolean searches related to maternal vaccination. Tweets were coded by three individual researchers into the categories “Promotional”, “Discouraging”, “Ambiguous” and “Neutral”. After creating a final dataset of 2,722 unique tweets, multiple machine learning methods were trained on the dataset and then tested and compared with the human annotators.

RESULTS We achieved an accuracy of 81.8% (F-score = 0.78) compared with the agreed score between the three annotators. For comparison, the accuracies of the individual annotators compared with the final score were 83.3%, 77.9% and 77.5%.

CONCLUSIONS This study demonstrates that our machine learning models can categorise tweets with close to the accuracy that could be expected of a single human annotator. Using this reliable and accurate automated process could free up the valuable time and resources otherwise needed to conduct this analysis, and could help to inform potentially effective and necessary interventions.

CLINICALTRIAL N/A
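The evaluation described in the abstract, scoring a model and each annotator against an agreed label, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not say how the agreed label was derived or which F-score variant was reported, so majority voting and a macro-averaged F-score are assumptions here. The function names and the toy tweets are hypothetical.

```python
from collections import Counter

# The four stance categories named in the abstract.
LABELS = ["Promotional", "Discouraging", "Ambiguous", "Neutral"]


def majority_label(votes):
    """Return the label chosen by a strict majority of annotators, else None.

    Assumption: the abstract does not state how the agreed label was reached;
    majority voting is used here purely for illustration.
    """
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


def accuracy(predicted, reference):
    """Fraction of items where the predicted label equals the reference label."""
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


def macro_f1(predicted, reference, labels=LABELS):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    scores = []
    for label in labels:
        pairs = list(zip(predicted, reference))
        tp = sum(p == label and r == label for p, r in pairs)
        fp = sum(p == label and r != label for p, r in pairs)
        fn = sum(p != label and r == label for p, r in pairs)
        if tp + fp + fn == 0:
            continue  # class absent from both lists; skip it
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


# Hypothetical toy data: five tweets, each coded by three annotators.
annotations = [
    ("Promotional", "Promotional", "Neutral"),
    ("Discouraging", "Discouraging", "Discouraging"),
    ("Neutral", "Ambiguous", "Neutral"),
    ("Ambiguous", "Ambiguous", "Promotional"),
    ("Promotional", "Promotional", "Promotional"),
]
model_predictions = ["Promotional", "Discouraging", "Neutral", "Promotional", "Promotional"]

consensus = [majority_label(votes) for votes in annotations]
model_accuracy = accuracy(model_predictions, consensus)
model_f1 = macro_f1(model_predictions, consensus)

# Score each annotator against the same consensus, as the abstract does.
annotator_accuracies = [
    accuracy([votes[i] for votes in annotations], consensus) for i in range(3)
]

print(f"model accuracy: {model_accuracy:.3f}, macro F1: {model_f1:.3f}")
print("annotator accuracies:", [f"{a:.3f}" for a in annotator_accuracies])
```

With four categories and three annotators, a tweet can receive three different labels, in which case `majority_label` returns `None`. How such cases were handled in the study is not stated, so a real analysis would need an explicit rule for them.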