Actors (Automated Content Analysis)

DOCA - Database of Variables for Content Analysis ◽

10.34778/1b ◽

2021 ◽

Author(s):

Valerie Hase

Keyword(s):

Content Analysis ◽

Named Entities ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

State Of The Union ◽

Automated Content Analysis ◽

German Newspaper ◽

Speech Tagging ◽

Validation Of Results ◽

Newspaper Articles

Actors in coverage might be individuals, groups, or organizations that are discussed, described, or quoted in the news.

The datasets referred to in the table are as follows. Benoit and Matuso (2020) use fictional sentences (N = 5) to demonstrate how named entities and noun phrases can be identified automatically. Lind and Meltzer (2020) demonstrate the use of organic dictionaries to identify actors in German newspaper articles (2013-2017, N = 348,785). Puschmann (2019) uses four data sets to demonstrate how nouns and named entities can be extracted from text: tweets (2016, N = 18,826), German newspaper articles (2011-2016, N = 377), Swiss newspaper articles (2007-2012, N = 21,280), and debate transcripts (1970-2017, N = 7,897). Lastly, Wiedemann and Niekler (2017) extract proper nouns from State of the Union speeches (1790-2017, N = 233).

Field of application/theoretical foundation: Related to theories of “Agenda Setting” and “Framing”, analyses might want to know how much weight is given to a specific actor, how these actors are evaluated, which perspectives and frames they bring into the discussion, and how prominently.

References/combination with other methods of data collection: Oftentimes, studies use both manual and automated content analysis to identify actors in text. Combining the two can extend the lists of actors that can be found and can be used to validate automated analyses. For example, Lind and Meltzer (2020) combine manual coding and dictionaries to identify the salience of women in the news.

Table 1. Measurement of “Actors” using automated content analysis.

Author(s) | Sample | Procedure | Formal validity check with manual coding as benchmark* | Code
Benoit & Matuso (2020) | Fictional sentences | Part-of-Speech tagging; syntactic parsing | Not reported | https://cran.r-project.org/web/packages/spacyr/vignettes/using_spacyr.html
Lind & Meltzer (2020) | Newspapers | Dictionary approach | Reported | https://osf.io/yqbcj/?view_only=369e2004172b43bb91a39b536970e50b
Puschmann (2019) | (a) Tweets (b) German newspaper articles (c) Swiss newspaper articles (d) United Nations General Debate Transcripts | Part-of-Speech tagging; syntactic parsing | Not reported | http://inhaltsanalyse-mit-r.de/ner.html
Wiedemann & Niekler (2017) | State of the Union speeches | Part-of-Speech tagging | Not reported | https://tm4ss.github.io/docs/Tutorial_8_NER_POS.html

*Please note that many of the sources listed here are tutorials on how to conduct automated analyses and are therefore not focused on the validation of results. Readers should read this column as an indication of which sources they can refer to if they are interested in the validation of results.

References

Benoit, K., & Matuso. (2020). A Guide to Using spacyr. Retrieved from https://cran.r-project.org/web/packages/spacyr/vignettes/using_spacyr.html
Lind, F., & Meltzer, C. E. (2020). Now you see me, now you don’t: Applying automated content analysis to track migrant women’s salience in German news. Feminist Media Studies, 1–18.
Puschmann, C. (2019). Automatisierte Inhaltsanalyse mit R. Retrieved from http://inhaltsanalyse-mit-r.de/index.html
Wiedemann, G., & Niekler, A. (2017). Hands-on: a five day text mining course for humanists and social scientists in R. Proceedings of the 1st Workshop Teaching NLP for Digital Humanities (Teach4DH@GSCL 2017), Berlin. Retrieved from https://tm4ss.github.io/docs/index.html
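
The dictionary approach listed for Lind and Meltzer (2020) in Table 1 can be illustrated with a minimal Python sketch. It is not the authors' code (their materials are linked in the table); the dictionary entries, the category names, and the example article below are hypothetical and serve only to show how term matches translate into a salience count per actor category.

```python
import re
from collections import Counter

# Hypothetical actor dictionary (assumption): category -> search terms.
ACTOR_DICTIONARY = {
    "government": ["chancellor", "minister", "government"],
    "migrant women": ["migrant women", "refugee women"],
}


def count_actors(text, dictionary):
    """Count case-insensitive whole-word matches per actor category."""
    counts = Counter()
    lowered = text.lower()
    for category, terms in dictionary.items():
        for term in terms:
            # \b keeps "minister" from matching inside "administer".
            pattern = r"\b" + re.escape(term) + r"\b"
            counts[category] += len(re.findall(pattern, lowered))
    return counts


# Hypothetical example article (assumption).
article = (
    "The chancellor met refugee women in Berlin. "
    "The government said migrant women need better access to work."
)
salience = count_actors(article, ACTOR_DICTIONARY)
print(dict(salience))
```

A real dictionary would need to handle inflected forms and would be checked against manually coded articles before its counts are interpreted.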


Sentiment/tone (Automated Content Analysis)

DOCA - Database of Variables for Content Analysis ◽

10.34778/1d ◽

2021 ◽

Author(s):

Valerie Hase

Keyword(s):

Machine Learning ◽

Content Analysis ◽

Sentiment Analysis ◽

Supervised Machine Learning ◽

Data Sets ◽

Financial News ◽

State Of The Union ◽

Automated Content Analysis ◽

Course Material ◽

Newspaper Articles

Sentiment/tone describes the way issues or specific actors are described in coverage. Many analyses differentiate between negative, neutral/balanced, or positive sentiment/tone as broader categories, but analyses might also measure expressions of incivility, fear, or happiness, for example, as more granular types of sentiment/tone. Analyses can detect sentiment/tone in full texts (e.g., general sentiment in financial news) or concerning specific issues (e.g., specific sentiment towards the stock market in financial news or a specific actor).

The datasets referred to in the table are as follows. Puschmann (2019) uses four data sets to demonstrate how sentiment/tone may be analyzed by the computer. Using Sherlock Holmes stories (18th century, N = 12), tweets (2016, N = 18,826), Swiss newspaper articles (2007-2012, N = 21,280), and debate transcripts (2013-2017, N = 205,584), he illustrates how dictionaries may be applied for such a task. Rauh (2018) uses three data sets to validate his organic German-language dictionary for sentiment/tone. His data consist of sentences from German parliament speeches (1991-2013, N = 1,500), German-language quasi-sentences from German, Austrian and Swiss party manifestos (1998-2013, N = 14,008), and newspaper, journal and news wire articles (2011-2012, N = 4,038). Silge and Robinson (2020) use six Jane Austen novels to demonstrate how dictionaries may be used for sentiment analysis. Van Atteveldt and Welbers (2020) use State of the Union speeches (1789-2017, N = 58) for the same purpose. The same authors (van Atteveldt & Welbers, 2019) show, based on a dataset of N = 2,000 movie reviews, how supervised machine learning can also be used for this task. In their Quanteda tutorials, Watanabe and Müller (2019) demonstrate the use of dictionaries and supervised machine learning for sentiment analysis on UK newspaper articles (2012-2016, N = 6,000) as well as the same set of movie reviews (N = 2,000). Lastly, Wiedemann and Niekler (2017) use State of the Union speeches (1790-2017, N = 233) to demonstrate how sentiment/tone can be coded automatically via a dictionary approach.

Field of application/theoretical foundation: Related to theories of “Framing” and “Bias” in coverage, many analyses are concerned with the way the news evaluates and interprets specific issues and actors.

References/combination with other methods of data collection: Manual coding is needed for many automated analyses, including the ones concerned with sentiment. Studies use manual content analysis, for example, to develop dictionaries, to create training sets on which algorithms used for automated classification are trained, or to validate the results of automated analyses (Song et al., 2020).

Table 1. Measurement of “Sentiment/Tone” using automated content analysis.

Author(s) | Sample | Procedure | Formal validity check with manual coding as benchmark* | Code
Puschmann (2019) | (a) Sherlock Holmes stories (b) Tweets (c) Swiss newspaper articles (d) German Parliament transcripts | Dictionary approach | Not reported | http://inhaltsanalyse-mit-r.de/sentiment.html
Rauh (2018) | (a) Bundestag speeches (b) Quasi-sentences from German, Austrian and Swiss party manifestos (c) Newspapers, journals, agency reports | Dictionary approach | Reported | https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/BKBXWD
Silge & Robinson (2020) | Books by Jane Austen | Dictionary approach | Not reported | https://www.tidytextmining.com/sentiment.html
van Atteveldt & Welbers (2020) | State of the Union speeches | Dictionary approach | Reported | https://github.com/ccs-amsterdam/r-course-material/blob/master/tutorials/sentiment_analysis.md
van Atteveldt & Welbers (2019) | Movie reviews | Supervised Machine Learning Approach | Reported | https://github.com/ccs-amsterdam/r-course-material/blob/master/tutorials/r_text_ml.md
Watanabe & Müller (2019) | Newspaper articles | Dictionary approach | Not reported | https://tutorials.quanteda.io/advanced-operations/targeted-dictionary-analysis/
Watanabe & Müller (2019) | Movie reviews | Supervised Machine Learning Approach | Reported | https://tutorials.quanteda.io/machine-learning/nb/
Wiedemann & Niekler (2017) | State of the Union speeches | Dictionary approach | Not reported | https://tm4ss.github.io/docs/Tutorial_3_Frequency.html

*Please note that many of the sources listed here are tutorials on how to conduct automated analyses and are therefore not focused on the validation of results. Readers should read this column as an indication of which sources they can refer to if they are interested in the validation of results.

References

Puschmann, C. (2019). Automatisierte Inhaltsanalyse mit R. Retrieved from http://inhaltsanalyse-mit-r.de/index.html
Rauh, C. (2018). Validating a sentiment dictionary for German political language—A workbench note. Journal of Information Technology & Politics, 15(4), 319–343. doi:10.1080/19331681.2018.1485608
Silge, J., & Robinson, D. (2020). Text mining with R. A tidy approach. Retrieved from https://www.tidytextmining.com/
Song, H., Tolochko, P., Eberl, J.-M., Eisele, O., Greussing, E., Heidenreich, T., Lind, F., Galyga, S., & Boomgaarden, H. G. (2020). In validations we trust? The impact of imperfect human annotations as a gold standard on the quality of validation of automated content analysis. Political Communication, 37(4), 550–572.
van Atteveldt, W., & Welbers, K. (2019). Supervised Text Classification. Retrieved from https://github.com/ccs-amsterdam/r-course-material/blob/master/tutorials/r_text_ml.md
van Atteveldt, W., & Welbers, K. (2020). Supervised Sentiment Analysis in R. Retrieved from https://github.com/ccs-amsterdam/r-course-material/blob/master/tutorials/sentiment_analysis.md
Watanabe, K., & Müller, S. (2019). Quanteda tutorials. Retrieved from https://tutorials.quanteda.io/
Wiedemann, G., & Niekler, A. (2017). Hands-on: a five day text mining course for humanists and social scientists in R. Proceedings of the 1st Workshop Teaching NLP for Digital Humanities (Teach4DH@GSCL 2017), Berlin. Retrieved from https://tm4ss.github.io/docs/index.html
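
To make the dictionary approach and the validity check against manual coding more concrete, here is a minimal Python sketch. It does not reproduce any of the tutorials listed above; the word lists, the example sentences, and their manual codes are hypothetical assumptions chosen for illustration.

```python
import re

# Hypothetical mini-dictionary (assumption); real dictionaries are far larger.
POSITIVE = {"good", "great", "strong", "hope"}
NEGATIVE = {"bad", "crisis", "weak", "fear"}


def classify(text):
    """Label a text by comparing counts of positive and negative terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    positive_hits = sum(token in POSITIVE for token in tokens)
    negative_hits = sum(token in NEGATIVE for token in tokens)
    if positive_hits > negative_hits:
        return "positive"
    if negative_hits > positive_hits:
        return "negative"
    return "neutral"


# Hypothetical sentences with hypothetical manual codes as the benchmark.
CODED_SENTENCES = [
    ("Markets show strong growth and great hope", "positive"),
    ("The crisis brings fear and weak demand", "negative"),
    ("The committee met on Tuesday", "neutral"),
    ("Not a good outcome", "negative"),  # negation: the dictionary misses this
]

predictions = [classify(text) for text, _ in CODED_SENTENCES]
matches = sum(
    predicted == manual
    for predicted, (_, manual) in zip(predictions, CODED_SENTENCES)
)
accuracy = matches / len(CODED_SENTENCES)
print(predictions, accuracy)
```

The fourth sentence shows why a comparison with manual coding matters: a plain word-count dictionary ignores negation and labels "Not a good outcome" as positive.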


Frames (Automated Content Analysis)

DOCA - Database of Variables for Content Analysis ◽

10.34778/1c ◽

2021 ◽

Author(s):

Valerie Hase

Keyword(s):

Content Analysis ◽

Network Analysis ◽

Topic Modeling ◽

Semantic Networks ◽

Newspaper Coverage ◽

Journalism Studies ◽

Detection Algorithms ◽

Automated Content Analysis ◽

Validation Of Results ◽

Newspaper Articles

Frames describe the way issues are presented, i.e., what aspects are made salient when communicating about these issues.

Field of application/theoretical foundation: The concept of frames is directly based on the theory of “Framing”. However, many studies using automated content analysis lack a clear theoretical definition of what constitutes a frame. As an exception, Walter and Ophir (2019) use automated content analysis to explore issue and strategy frames as defined by Cappella and Jamieson (1997). Vu and Lynn (2020) refer to Entman’s (1991) understanding of frames.

The datasets referred to in the table are as follows. Van der Meer et al. (2019) use a dataset consisting of Dutch newspaper articles (1991-2015, N = 9,443) and LDA topic modeling in combination with k-means clustering to identify frames. Walter and Ophir (2019) use three different datasets and a combination of topic modeling, network analysis, and community detection algorithms to analyze frames. Their datasets consist of political newspaper articles and wire service coverage (N = 8,337), newspaper articles on foreign nations (2010-2015, N = 18,216), and health-related newspaper coverage (2009-2016, N = 5,005). Lastly, Vu and Lynn (2020) analyze newspaper coverage of the Rohingya crisis (2017-2018, N = 747) concerning frames.

References/combination with other methods of data collection: While most approaches rely only on automated data collection and analyses, some also combine automated and manual coding. For example, a recent study by Vu and Lynn (2020) proposes to combine semantic networks and manual coding to identify frames.

Table 1. Measurement of “Frames” using automated content analysis.

Author(s) | Sample | Procedure | Formal validity check with manual coding as benchmark* | Code
Vu & Lynn (2020) | Newspaper articles | Semantic networks; manual coding | Reported | Not available
van der Meer et al. (2019) | Newspaper articles | LDA topic modeling; k-means clustering | Not reported | Not available
Walter & Ophir (2019) | (a) U.S. newspapers and wire service articles (b) Newspaper articles (c) Newspaper articles | LDA topic modeling; network analysis; community detection algorithms | Not reported | https://github.com/DrorWalt/ANTMN

*Please note that many of the sources listed here are tutorials on how to conduct automated analyses and are therefore not focused on the validation of results. Readers should read this column as an indication of which sources they can refer to if they are interested in the validation of results.

References

Cappella, J. N., & Jamieson, K. H. (1997). Spiral of cynicism: The press and the public good. New York: Oxford University Press.
Entman, R. M. (1991). Framing U.S. coverage of international news: contrasts in narratives of the KAL and Iran Air incidents. Journal of Communication, 41(4), 6-7.
van der Meer, T. G. L. A., Kroon, A. C., Verhoeven, P., & Jonkman, J. (2019). Mediatization and the disproportionate attention to negative news: The case of airplane crashes. Journalism Studies, 20(6), 783–803.
Walter, D., & Ophir, Y. (2019). News frame analysis: an inductive mixed-method computational approach. Communication Methods and Measures, 13(4), 248–266.
Vu, H. T., & Lynn, N. (2020). When the news takes sides: Automated framing analysis of news coverage of the Rohingya crisis by the elite press from three countries. Journalism Studies. Online first publication. doi:10.1080/1461670X.2020.1745665
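
The idea behind a semantic network, as listed in the Procedure column above, can be sketched in a few lines of Python. This is a generic word co-occurrence network, not the procedure of Vu and Lynn (2020) or the ANTMN code of Walter and Ophir (2019); the stop word list and the example headlines are hypothetical assumptions.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stop word list (assumption).
STOPWORDS = {"the", "a", "of", "and", "in", "to", "from", "for", "on"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def cooccurrence_network(documents):
    """Return edge weights: number of documents in which two terms co-occur."""
    edges = Counter()
    for document in documents:
        terms = sorted(set(tokenize(document)))
        for pair in combinations(terms, 2):
            edges[pair] += 1
    return edges


# Hypothetical headlines (assumption).
documents = [
    "Refugees flee violence in the border region",
    "Aid agencies report violence against refugees",
    "Government denies reports of violence",
]
network = cooccurrence_network(documents)
print(network.most_common(3))
```

Clusters of strongly connected terms in such a network are candidates for frames; deciding whether a cluster actually constitutes a frame still requires theoretical grounding and, typically, manual coding.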


Lexical Rule and Lexicon Effect for Part of Speech Tagging Bahasa Madura

Matrik Jurnal Manajemen Teknik Informatika dan Rekayasa Komputer ◽

10.30812/matrik.v18i1.332 ◽

2018 ◽

Vol 18 (1) ◽

pp. 65-72

Author(s):

Nindian Puspa Dewi ◽

Ubaidi Ubaidi

Keyword(s):

Text Processing ◽

Part Of Speech Tagging ◽

Pos Tagging ◽

Part Of Speech ◽

Speech Tagging ◽

Bahasa Indonesia

POS tagging is the foundation for developing text processing for a language. In this study, we examine how the use of a lexicon and morphological changes in words affect the choice of the correct tag for a word. Rules based on word morphology, such as prefixes, suffixes, and infixes, are commonly called lexical rules. This study applies lexical rules learned with the Brill Tagger algorithm. Madurese (Bahasa Madura) is a regional language spoken on Madura Island and several other islands in East Java. The study focuses on Madurese because it has far more affixation variants than Indonesian (Bahasa Indonesia). Here, the lexicon is used not only to look up Madurese root words but also as one stage of the POS tagging process. In experiments, tagging with the lexicon reached an accuracy of 86.61%, whereas tagging without the lexicon reached only 28.95%. We conclude that the lexicon has a strong effect on POS tagging.
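
The contrast between tagging with and without a lexicon can be illustrated with a small Python sketch. It is not the Brill Tagger learner used in the study and does not reproduce its rules or results; the lexicon entries, the two affix rules, the tag names, and the gold annotations are hypothetical, and the example words are Indonesian rather than Madurese.

```python
# Hypothetical toy lexicon (assumption): word -> tag.
LEXICON = {"buku": "NN", "makan": "VB", "besar": "JJ"}


def tag_word(word, use_lexicon=True):
    """Tag one word: lexicon lookup first, then simple affix (lexical) rules."""
    if use_lexicon and word in LEXICON:
        return LEXICON[word]
    # Hypothetical lexical rules based on affixes (assumption).
    if word.startswith("me"):
        return "VB"  # prefix rule
    if word.endswith("an"):
        return "NN"  # suffix rule
    return "NN"  # default tag when no rule fires


def accuracy(gold, use_lexicon):
    """Share of words whose predicted tag equals the gold tag."""
    correct = sum(tag_word(word, use_lexicon) == tag for word, tag in gold)
    return correct / len(gold)


# Hypothetical gold-standard annotations (assumption).
GOLD = [
    ("buku", "NN"),
    ("makan", "VB"),
    ("besar", "JJ"),
    ("membaca", "VB"),
    ("makanan", "NN"),
]

with_lexicon = accuracy(GOLD, use_lexicon=True)
without_lexicon = accuracy(GOLD, use_lexicon=False)
print(with_lexicon, without_lexicon)
```

In this toy setting the affix rules alone mis-tag the root words "makan" and "besar", which is the kind of error a lexicon lookup prevents.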


Mongolian part-of-speech tagging approach based on conditional random fields

Journal of Computer Applications ◽

10.3724/sp.j.1087.2010.02038 ◽

2010 ◽

Vol 30 (8) ◽

pp. 2038-2040

Author(s):

Yu-long YING ◽

Miao LI ◽

bala Wuda ◽

Hai ZHU

Keyword(s):

Random Fields ◽

Conditional Random Fields ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging


The Impact of Arabic Part of Speech Tagging on Sentiment Analysis: A New Corpus and Deep Learning Approach

Procedia Computer Science ◽

10.1016/j.procs.2021.03.026 ◽

2021 ◽

Vol 184 ◽

pp. 148-155

Author(s):

Abdul Munem Nerabie ◽

Manar AlKhatib ◽

Sujith Samuel Mathew ◽

May El Barachi ◽

Farhad Oroumchian

Keyword(s):

Deep Learning ◽

Sentiment Analysis ◽

Learning Approach ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

The Impact ◽

Speech Tagging


Part-of-speech tagging for web search queries using a large-scale web corpus

Proceedings of the Symposium on Applied Computing - SAC '17 ◽

10.1145/3019612.3019694 ◽

2017 ◽

Cited By ~ 1

Author(s):

Atsushi Keyaki ◽

Jun Miyazaki

Keyword(s):

Large Scale ◽

Web Search ◽

Search Queries ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging


Korean Part-of-speech Tagging Based on Morpheme Generation

ACM Transactions on Asian and Low-Resource Language Information Processing ◽

10.1145/3373608 ◽

2020 ◽

Vol 19 (3) ◽

pp. 1-10

Author(s):

Hyun-Je Song ◽

Seong-Bae Park

Keyword(s):

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging


Amazigh Part Of Speech Tagging using Gated recurrent units (GRU)

2021 7th International Conference on Optimization and Applications (ICOA) ◽

10.1109/icoa51614.2021.9442662 ◽

2021 ◽

Author(s):

MAAROUF Otman ◽

EL AYACHI Rachid ◽

BINIZ Mohamed

Keyword(s):

Part Of Speech Tagging ◽

Part Of Speech ◽

Gated Recurrent Units ◽

Speech Tagging


Part-Of-Speech Tagging in French: State-of-the-Art and Obstacles

2020 Seventh International Conference on Social Networks Analysis, Management and Security (SNAMS) ◽

10.1109/snams52053.2020.9336546 ◽

2020 ◽

Author(s):

Edouard Ngor Sarr ◽

Ousmane Sall ◽

Lamine Faty

Keyword(s):

State Of The Art ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging


Evaluating word embeddings and a revised corpus for part-of-speech tagging in Portuguese

Journal of the Brazilian Computer Society ◽

10.1186/s13173-014-0020-x ◽

2015 ◽

Vol 21 (1) ◽

Cited By ~ 16

Author(s):

Erick R Fonseca ◽

João Luís G Rosa ◽

Sandra Maria Aluísio

Keyword(s):

Word Embeddings ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging
