Sentiment/tone (Automated Content Analysis)

Author(s):  
Valerie Hase

Sentiment/tone describes the way issues or specific actors are portrayed in coverage. Many analyses differentiate between negative, neutral/balanced, and positive sentiment/tone as broader categories, but analyses might also measure more granular types of sentiment/tone, for example expressions of incivility, fear, or happiness. Analyses can detect sentiment/tone in full texts (e.g., general sentiment in financial news) or concerning specific issues or actors (e.g., sentiment towards the stock market in financial news).

The datasets referred to in the table are described in the following paragraph: Puschmann (2019) uses four data sets to demonstrate how sentiment/tone may be analyzed automatically. Using Sherlock Holmes stories (19th century, N = 12), tweets (2016, N = 18,826), Swiss newspaper articles (2007-2012, N = 21,280), and debate transcripts (2013-2017, N = 205,584), he illustrates how dictionaries may be applied for such a task. Rauh (2018) uses three data sets to validate his organic German-language dictionary for sentiment/tone. His data consist of sentences from German parliament speeches (1991-2013, N = 1,500), German-language quasi-sentences from German, Austrian, and Swiss party manifestos (1998-2013, N = 14,008), and newspaper, journal, and news wire articles (2011-2012, N = 4,038). Silge and Robinson (2020) use six Jane Austen novels to demonstrate how dictionaries may be used for sentiment analysis. Van Atteveldt and Welbers (2020) use State of the Union speeches (1789-2017, N = 58) for the same purpose. The same authors (van Atteveldt & Welbers, 2019) show, based on a dataset of N = 2,000 movie reviews, how supervised machine learning can also be used for this task. In their Quanteda tutorials, Watanabe and Müller (2019) demonstrate the use of dictionaries and supervised machine learning for sentiment analysis on UK newspaper articles (2012-2016, N = 6,000) as well as the same set of movie reviews (N = 2,000).
Lastly, Wiedemann and Niekler (2017) use State of the Union speeches (1790-2017, N = 233) to demonstrate how sentiment/tone can be coded automatically via a dictionary approach.

Field of application/theoretical foundation: Related to theories of “Framing” and “Bias” in coverage, many analyses are concerned with the way the news evaluates and interprets specific issues and actors.

References/combination with other methods of data collection: Manual coding is needed for many automated analyses, including those concerned with sentiment. Studies, for example, use manual content analysis to develop dictionaries, to create training sets on which the algorithms used for automated classification are trained, or to validate the results of automated analyses (Song et al., 2020).

Table 1. Measurement of “Sentiment/Tone” using automated content analysis.

Author(s) | Sample | Procedure | Formal validity check with manual coding as benchmark* | Code
Puschmann (2019) | (a) Sherlock Holmes stories; (b) tweets; (c) Swiss newspaper articles; (d) German parliament transcripts | Dictionary approach | Not reported | http://inhaltsanalyse-mit-r.de/sentiment.html
Rauh (2018) | (a) Bundestag speeches; (b) quasi-sentences from German, Austrian and Swiss party manifestos; (c) newspapers, journals, agency reports | Dictionary approach | Reported | https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/BKBXWD
Silge & Robinson (2020) | Books by Jane Austen | Dictionary approach | Not reported | https://www.tidytextmining.com/sentiment.html
van Atteveldt & Welbers (2020) | State of the Union speeches | Dictionary approach | Reported | https://github.com/ccs-amsterdam/r-course-material/blob/master/tutorials/sentiment_analysis.md
van Atteveldt & Welbers (2019) | Movie reviews | Supervised machine learning | Reported | https://github.com/ccs-amsterdam/r-course-material/blob/master/tutorials/r_text_ml.md
Watanabe & Müller (2019) | Newspaper articles | Dictionary approach | Not reported | https://tutorials.quanteda.io/advanced-operations/targeted-dictionary-analysis/
Watanabe & Müller (2019) | Movie reviews | Supervised machine learning | Reported | https://tutorials.quanteda.io/machine-learning/nb/
Wiedemann & Niekler (2017) | State of the Union speeches | Dictionary approach | Not reported | https://tm4ss.github.io/docs/Tutorial_3_Frequency.html

*Please note that many of the sources listed here are tutorials on how to conduct automated analyses and are therefore not focused on the validation of results. Readers should read this column simply as an indication of which sources they can refer to if they are interested in the validation of results.

References
Puschmann, C. (2019). Automatisierte Inhaltsanalyse mit R. Retrieved from http://inhaltsanalyse-mit-r.de/index.html
Rauh, C. (2018). Validating a sentiment dictionary for German political language: A workbench note. Journal of Information Technology & Politics, 15(4), 319-343. doi:10.1080/19331681.2018.1485608
Silge, J., & Robinson, D. (2020). Text mining with R: A tidy approach. Retrieved from https://www.tidytextmining.com/
Song, H., Tolochko, P., Eberl, J.-M., Eisele, O., Greussing, E., Heidenreich, T., Lind, F., Galyga, S., & Boomgaarden, H. G. (2020). In validations we trust? The impact of imperfect human annotations as a gold standard on the quality of validation of automated content analysis. Political Communication, 37(4), 550-572.
van Atteveldt, W., & Welbers, K. (2019). Supervised text classification. Retrieved from https://github.com/ccs-amsterdam/r-course-material/blob/master/tutorials/r_text_ml.md
van Atteveldt, W., & Welbers, K. (2020). Supervised sentiment analysis in R. Retrieved from https://github.com/ccs-amsterdam/r-course-material/blob/master/tutorials/sentiment_analysis.md
Watanabe, K., & Müller, S. (2019). Quanteda tutorials. Retrieved from https://tutorials.quanteda.io/
Wiedemann, G., & Niekler, A. (2017). Hands-on: A five day text mining course for humanists and social scientists in R. Proceedings of the 1st Workshop Teaching NLP for Digital Humanities (Teach4DH@GSCL 2017), Berlin. Retrieved from https://tm4ss.github.io/docs/index.html
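As a minimal illustration of the dictionary approach used in several of the tutorials above, the following Python sketch scores a text against a toy sentiment lexicon. The word lists are invented for illustration and are not taken from any of the dictionaries cited:

```python
# Toy sentiment dictionary (illustrative only, not a validated lexicon).
POSITIVE = {"good", "great", "happy", "gain", "success"}
NEGATIVE = {"bad", "fear", "loss", "crisis", "angry"}

def tone(text: str) -> float:
    """Return (positives - negatives) / tokens; > 0 leans positive, < 0 negative."""
    tokens = [t.strip(".,;:!?") for t in text.lower().split()]
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)

print(tone("A great success despite the crisis"))  # (2 - 1) / 6 tokens, about 0.17
```

Real dictionary studies differ mainly in the lexicon (e.g., Rauh's validated German political dictionary) and in preprocessing steps such as negation handling; the counting logic itself stays this simple.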

Author(s):  
Valerie Hase

Topics describe the main issue discussed in an article, for example: Does an article deal with politics, economics, or sports?

Field of application/theoretical foundation: In the context of “Agenda Setting”, studies analyze which issues are on the public agenda. In the context of “News Values”, studies may analyze why some topics are covered more prominently than others.

References/combination with other methods of data collection: Many studies combine manual inspection of topics with their automated detection. Quinn et al. (2010) demonstrate for their analyses of legislative speeches how manual inspection may increase the validity of results. Similarly, Hase et al. (2020) use automated content analysis to find and map similar topics for which manual coding is then conducted. Such combinations may contribute to a better and more detailed understanding of topics than automated analyses by themselves.

The datasets referred to in the table are described in the following paragraph: Puschmann (2019a) uses New York Times articles (1996-2006, N = 30,862) as well as articles from Die Zeit (2011-2016, N = 377) to identify topics using supervised machine learning. In another tutorial, Puschmann (2019b) uses Sherlock Holmes stories (19th century, N = 12), articles from Die Zeit (2011-2016, N = 377), and debate transcripts (1970-2017, N = 7,897) to apply LDA and structural topic modeling. In her tutorials, Silge (2018a, 2018b) also uses Sherlock Holmes stories (19th century, N = 12) and a news corpus that also contains comments (2006-ongoing, N = 100,000). Silge and Robinson (2020) apply LDA topic modeling to news stories by the Associated Press (1992, N = 2,246) as well as books by Dickens, Wells, Verne, and Austen (19th century, N = 4). Roberts et al. (2019) use blogposts (2008, N = 13,248) for structural topic modeling. Watanabe and Müller (2019) apply LDA topic modeling to newspaper articles from The Guardian (2016, N = 6,000).
Van Atteveldt and Welbers (2019, 2020) use State of the Union speeches (1981-2017, N = 10 and 1789-2017, N = 58) for their analyses. Lastly, Wiedemann and Niekler (2017) use the same data containing State of the Union speeches (1790-2017, N = 233).

Table 1. Measurement of “Topics” using automated content analysis.

Author(s) | Sample | Procedure | Formal validity check with manual coding as benchmark* | Code
Puschmann (2019a) | (a) Newspaper articles; (b) newspaper articles | Supervised machine learning | Reported | http://inhaltsanalyse-mit-r.de/maschinelles_lernen.html
Puschmann (2019b) | (a) Sherlock Holmes stories; (b) newspaper articles; (c) United Nations General Debate transcripts | LDA topic modeling; structural topic modeling | Not reported | http://inhaltsanalyse-mit-r.de/themenmodelle.html
Silge (2018a, 2018b) | (a) Sherlock Holmes stories; (b) news stories and comments | Structural topic modeling | Not reported | https://juliasilge.com/blog/sherlock-holmes-stm/ and https://juliasilge.com/blog/evaluating-stm/
Silge & Robinson (2020) | (a) News articles; (b) books | LDA topic modeling | Not reported | https://www.tidytextmining.com/topicmodeling.html
Roberts et al. (2019) | Blogposts | Structural topic modeling | Not reported | https://www.jstatsoft.org/article/view/v091i02
Watanabe & Müller (2019) | Newspaper articles | LDA topic modeling | Not reported | https://tutorials.quanteda.io/machine-learning/topicmodel/
van Atteveldt & Welbers (2019) | State of the Union speeches | Structural topic modeling | Not reported | https://github.com/ccs-amsterdam/r-course-material/blob/master/tutorials/r_text_stm.md
van Atteveldt & Welbers (2020) | State of the Union speeches | LDA topic modeling | Not reported | https://github.com/ccs-amsterdam/r-course-material/blob/master/tutorials/r_text_lda.md
Wiedemann & Niekler (2017) | State of the Union speeches | LDA topic modeling | Not reported | https://tm4ss.github.io/docs/Tutorial_6_Topic_Models.html
Wiedemann & Niekler (2017) | State of the Union speeches | Supervised machine learning | Reported | https://tm4ss.github.io/docs/Tutorial_7_Klassifikation.html

*Please note that many of the sources listed here are tutorials on how to conduct automated analyses and are therefore not focused on the validation of results. Readers should read this column simply as an indication of which sources they can refer to if they are interested in the validation of results.

References
Hase, V., Engelke, K., & Kieslich, K. (2020). The things we fear: Combining automated and manual content analysis to uncover themes, topics and threats in fear-related news. Journalism Studies, 21(10), 1384-1402.
Puschmann, C. (2019). Automatisierte Inhaltsanalyse mit R. Retrieved from http://inhaltsanalyse-mit-r.de/index.html
Quinn, K. M., Monroe, B. L., Colaresi, M., Crespin, M. H., & Radev, D. R. (2010). How to analyze political attention with minimal assumptions and costs. American Journal of Political Science, 54(1), 209-228.
Roberts, M. E., Stewart, B. M., & Tingley, D. (2019). stm: An R package for structural topic models. Journal of Statistical Software, 91(2), 1-40.
Silge, J. (2018a). The game is afoot! Topic modeling of Sherlock Holmes stories. Retrieved from https://juliasilge.com/blog/sherlock-holmes-stm/
Silge, J. (2018b). Training, evaluating, and interpreting topic models. Retrieved from https://juliasilge.com/blog/evaluating-stm/
Silge, J., & Robinson, D. (2020). Text mining with R: A tidy approach. Retrieved from https://www.tidytextmining.com/
van Atteveldt, W., & Welbers, K. (2019). Structural topic modeling. Retrieved from https://github.com/ccs-amsterdam/r-course-material/blob/master/tutorials/r_text_stm.md
van Atteveldt, W., & Welbers, K. (2020). Fitting LDA models in R. Retrieved from https://github.com/ccs-amsterdam/r-course-material/blob/master/tutorials/r_text_lda.md
Watanabe, K., & Müller, S. (2019). Quanteda tutorials. Retrieved from https://tutorials.quanteda.io/
Wiedemann, G., & Niekler, A. (2017). Hands-on: A five day text mining course for humanists and social scientists in R. Proceedings of the 1st Workshop Teaching NLP for Digital Humanities (Teach4DH@GSCL 2017), Berlin. Retrieved from https://tm4ss.github.io/docs/index.html


Author(s):  
Valerie Hase

Actors in coverage might be individuals, groups, or organizations that are discussed, described, or quoted in the news.

The datasets referred to in the table are described in the following paragraph: Benoit and Matsuo (2020) use fictional sentences (N = 5) to demonstrate how named entities and noun phrases can be identified automatically. Lind and Meltzer (2020) demonstrate the use of organic dictionaries to identify actors in German newspaper articles (2013-2017, N = 348,785). Puschmann (2019) uses four data sets to demonstrate how actors may be identified automatically. Using tweets (2016, N = 18,826), German newspaper articles (2011-2016, N = 377), Swiss newspaper articles (2007-2012, N = 21,280), and debate transcripts (1970-2017, N = 7,897), he extracts nouns and named entities from text. Lastly, Wiedemann and Niekler (2017) extract proper nouns from State of the Union speeches (1790-2017, N = 233).

Field of application/theoretical foundation: Related to theories of “Agenda Setting” and “Framing”, analyses might want to know how much weight is given to a specific actor, how these actors are evaluated, and how prominently they bring their perspectives and frames into the discussion.

References/combination with other methods of data collection: Oftentimes, studies use both manual and automated content analysis to identify actors in text. This can be a useful way to extend the lists of actors that can be found as well as to validate automated analyses. For example, Lind and Meltzer (2020) combine manual coding and dictionaries to identify the salience of women in the news.

Table 1. Measurement of “Actors” using automated content analysis.
Author(s) | Sample | Procedure | Formal validity check with manual coding as benchmark* | Code
Benoit & Matsuo (2020) | Fictional sentences | Part-of-speech tagging; syntactic parsing | Not reported | https://cran.r-project.org/web/packages/spacyr/vignettes/using_spacyr.html
Lind & Meltzer (2020) | Newspapers | Dictionary approach | Reported | https://osf.io/yqbcj/?view_only=369e2004172b43bb91a39b536970e50b
Puschmann (2019) | (a) Tweets; (b) German newspaper articles; (c) Swiss newspaper articles; (d) United Nations General Debate transcripts | Part-of-speech tagging; syntactic parsing | Not reported | http://inhaltsanalyse-mit-r.de/ner.html
Wiedemann & Niekler (2017) | State of the Union speeches | Part-of-speech tagging | Not reported | https://tm4ss.github.io/docs/Tutorial_8_NER_POS.html

*Please note that many of the sources listed here are tutorials on how to conduct automated analyses and are therefore not focused on the validation of results. Readers should read this column simply as an indication of which sources they can refer to if they are interested in the validation of results.

References
Benoit, K., & Matsuo, A. (2020). A guide to using spacyr. Retrieved from https://cran.r-project.org/web/packages/spacyr/vignettes/using_spacyr.html
Lind, F., & Meltzer, C. E. (2020). Now you see me, now you don’t: Applying automated content analysis to track migrant women’s salience in German news. Feminist Media Studies, 1-18.
Puschmann, C. (2019). Automatisierte Inhaltsanalyse mit R. Retrieved from http://inhaltsanalyse-mit-r.de/index.html
Wiedemann, G., & Niekler, A. (2017). Hands-on: A five day text mining course for humanists and social scientists in R. Proceedings of the 1st Workshop Teaching NLP for Digital Humanities (Teach4DH@GSCL 2017), Berlin. Retrieved from https://tm4ss.github.io/docs/index.html
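As a generic illustration of the organic-dictionary logic behind actor measures such as Lind and Meltzer's (2020), the following Python sketch counts actor mentions per category; the dictionary entries are invented placeholders, not their actual word lists:

```python
import re

# Hypothetical actor dictionary: category -> search terms (placeholders only).
ACTORS = {
    "government": ["chancellor", "minister", "government"],
    "business": ["ceo", "company", "shareholder"],
}

def actor_mentions(text: str) -> dict:
    """Count how often the terms of each actor category occur in a text."""
    counts = {}
    lowered = text.lower()
    for category, terms in ACTORS.items():
        pattern = r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b"
        counts[category] = len(re.findall(pattern, lowered))
    return counts

print(actor_mentions("The minister met the CEO; the government backed the company."))
```

In practice, such dictionaries are extended iteratively with manual coding, which is also how their recall is typically validated.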


2018, Vol. 37(2), pp. 135-159
Author(s):  
Gregor Wiedemann

Supervised machine learning is a promising methodological innovation for content analysis (CA) to approach the challenge of ever-growing amounts of text in the digital era. Social scientists have pointed to accurate measurement of category proportions and trends in large collections as their primary goal. Proportional classification, for example, allows for time-series analysis of diachronic data sets or correlation of categories with text-external covariates. We evaluate the performance of two common approaches for this goal: a method based on regression analysis with feature profiles from entire collections and a method aggregating classifier decisions for individual documents. For both, we observed a significant negative effect on classification performance due to the uneven distribution of characteristic language structures within the text collection. For proportional classification, this poses considerable problems. To fix this problem, we propose a workflow of active learning, which alternates between machine learning and human coding. Results from experiments with empirical data (political manifestos) demonstrate that active learning enables researchers to create training sets for automatic CA efficiently, reliably, and with high accuracy for the desired goal while retaining control over the automatic process.
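The alternation between machine classification and human coding described in this abstract can be sketched as a generic uncertainty-sampling loop. This is an illustrative scikit-learn toy, with the human coder simulated by a hidden labeling rule, not the implementation evaluated in the article:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy documents as feature vectors; a hidden rule stands in for the human coder.
X = rng.normal(size=(200, 5))
true_y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Seed set: a few hand-coded examples from each class.
labeled = list(np.where(true_y == 1)[0][:5]) + list(np.where(true_y == 0)[0][:5])
pool = [i for i in range(200) if i not in labeled]

for _ in range(5):                         # five active-learning rounds
    clf = LogisticRegression().fit(X[labeled], true_y[labeled])
    proba = clf.predict_proba(X[pool])[:, 1]
    # Uncertainty sampling: query the document closest to p = 0.5 ...
    pick = pool[int(np.argmin(np.abs(proba - 0.5)))]
    labeled.append(pick)                   # ... and have the "human" code it.
    pool.remove(pick)

print(f"{len(labeled)} coded documents, accuracy {clf.score(X, true_y):.2f}")
```

The point of the workflow is that each round spends human effort on the documents the classifier is least sure about, which is what makes training-set construction efficient.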


2018, Vol. 23(2), pp. 247-267
Author(s):  
Volha Kananovich

Taxpaying constitutes a major opportunity for citizens to relate to their governments. Although paying taxes is a responsibility, it also entitles citizens to claim control over government spending, which may facilitate a greater democratization of a country’s political regime. Consistent with this reasoning, a growing body of scholarship has documented a positive relationship between the size of the tax revenues extracted by the state and the adherence of the country’s regime to democratic values. What has been left underexplored is the role in this relationship of the media, a commonly available and relied-upon source of information about taxpaying for the public. This study offers a first contribution in this direction by exploring the relationship between the nature of the political regime and the rhetorical construction of the concept of a taxpayer in the national press. Based on an automated content analysis of articles (N = 24,969) published by ninety-two newspapers and news agencies in fifty-one countries, using a set of pretrained and validated machine-learning algorithms, the study demonstrates that the less democratic a state is, the more likely its national press is to frame a taxpayer as a subordinate in a hierarchical relationship with the state by discussing taxpaying in tax-collection rather than public-spending terms. The study furthers a more nuanced understanding of the place of the media in the taxation-democratization link and demonstrates the applicability of the supervised machine-learning approach to classifying frames in large cross-national samples of newspaper data.


2021, Vol. 11(10), Article 4443
Author(s):  
Rokas Štrimaitis ◽  
Pavel Stefanovič ◽  
Simona Ramanauskaitė ◽  
Asta Slotkienė

Financial area analysis is not limited to enterprise performance analysis; it is worth analyzing as wide an area as possible to obtain a full impression of a specific enterprise. News website content is a data source that expresses the public’s opinion on enterprise operations, status, etc. Therefore, it is worth analyzing news portal article texts. Sentiment analysis of English texts, including financial texts, is well established and accurate, whereas work on the more complex Lithuanian language has mostly concentrated on sentiment analysis of comment texts and does not reach high accuracy. Therefore, in this paper a supervised machine learning model was implemented to perform sentiment analysis on financial news gathered from Lithuanian-language websites. The analysis was made using three classification algorithms commonly used in the field of sentiment analysis. Hyperparameter optimization using grid search was performed to discover the best parameters of each classifier. All experimental investigations were made using newly collected datasets from four Lithuanian news websites. The results of the applied machine learning algorithms show that the highest accuracy is obtained using a non-balanced dataset with the multinomial Naive Bayes algorithm (71.1%). The other algorithms’ accuracies were slightly lower: long short-term memory (71%) and support vector machine (70.4%).
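The grid search over classifier hyperparameters that the authors describe looks roughly like this in scikit-learn, shown here for multinomial Naive Bayes; the texts, labels, and parameter grid are invented English-language stand-ins for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

texts = [
    "profits rose sharply this quarter", "strong growth and record revenue",
    "shares plunged after the scandal", "the company reported heavy losses",
    "revenue beat expectations again", "bankruptcy fears hit the stock",
    "record profit and a rising dividend", "losses widened amid weak demand",
]
labels = [1, 1, 0, 0, 1, 0, 1, 0]   # 1 = positive, 0 = negative

pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("nb", MultinomialNB())])
grid = GridSearchCV(
    pipeline,
    param_grid={"nb__alpha": [0.1, 0.5, 1.0],
                "tfidf__ngram_range": [(1, 1), (1, 2)]},
    cv=2,                            # tiny toy corpus, so only two folds
)
grid.fit(texts, labels)
print(grid.best_params_, round(grid.best_score_, 2))
```

The same pattern extends to the other classifiers the paper mentions by swapping the final pipeline step and its parameter grid.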


2018, Vol. 46(1)

Damian Trilling & Jelle Boumans
Automated analysis of Dutch language-based texts: An overview and research agenda

While automated methods of content analysis are increasingly popular in today’s communication research, these methods have hardly been adopted by communication scholars studying texts in Dutch. This essay offers an overview of the possibilities and current limitations of automated text analysis approaches in the context of the Dutch language. Particularly for dictionary-based approaches, research is far less prolific than research on the English language. We divide the most common types of content-analytical research questions into three categories: 1) research problems for which automated methods ought to be used, 2) research problems for which automated methods could be used, and 3) research problems for which automated methods (currently) cannot be used. Finally, we give suggestions for the advancement of automated text analysis approaches for Dutch texts.

Keywords: automated content analysis, Dutch, dictionaries, supervised machine learning, unsupervised machine learning


2021
Author(s):  
Joshua Lois Cruz Paulino ◽  
Lexter Carl Antoja Almirol ◽  
Jun Marco Cruz Favila ◽  
Kent Alvin Gerald Loria Aquino ◽  
Angelica Hernandez De La Cruz ◽  
...  

Author(s):  
V Umarani ◽  
A Julian ◽  
J Deepa

Sentiment analysis has gained a lot of attention from researchers in recent years because it has been widely applied to a variety of application domains such as business, government, education, sports, tourism, biomedicine, and telecommunication services. Sentiment analysis is an automated computational method for studying or evaluating sentiments, feelings, and emotions expressed as comments, feedback, or critiques. The sentiment analysis process can be automated using machine learning techniques, which analyze text patterns faster. The supervised machine learning technique is the most used mechanism for sentiment analysis. The proposed work discusses the flow of the sentiment analysis process and investigates common supervised machine learning techniques such as multinomial Naive Bayes, Bernoulli Naive Bayes, logistic regression, support vector machine, random forest, k-nearest neighbor, and decision tree, as well as deep learning techniques such as long short-term memory and convolutional neural networks. The work examines these learning methods on a standard data set; the experimental results demonstrate the performance of the various classifiers in terms of precision, recall, F1-score, ROC curve, accuracy, running time, and k-fold cross-validation, help in appreciating the novelty of the several deep learning techniques, and give the user an overview for choosing the right technique for their application.
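As a small worked example of the evaluation measures listed above, precision, recall, and F1 for one class reduce to three counts over predicted versus true labels:

```python
def precision_recall_f1(true, pred, positive=1):
    """Precision, recall, and F1 for the given positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(true, pred))
    fp = sum(t != positive and p == positive for t, p in zip(true, pred))
    fn = sum(t == positive and p != positive for t, p in zip(true, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# 4 true positives, 1 false positive, 1 false negative:
print(precision_recall_f1([1, 1, 1, 1, 1, 0, 0, 0],
                          [1, 1, 1, 1, 0, 1, 0, 0]))  # about (0.8, 0.8, 0.8)
```

The macro-averaged variants the comparison papers report are simply these per-class values averaged over all classes.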

