The Gender Gap Tracker: Using Natural Language Processing to measure gender bias in media

PLoS ONE ◽  
2021 ◽  
Vol 16 (1) ◽  
pp. e0245533
Author(s):  
Fatemeh Torabi Asr ◽  
Mohammad Mazraeh ◽  
Alexandre Lopes ◽  
Vasundhara Gautam ◽  
Junette Gonzales ◽  
...  

We examine gender bias in media by tallying the number of men and women quoted in news text, using the Gender Gap Tracker, a software system we developed specifically for this purpose. The Gender Gap Tracker downloads and analyzes the online daily publication of seven English-language Canadian news outlets and enhances the data with multiple layers of linguistic information. We describe the Natural Language Processing technology behind this system, the curation of off-the-shelf tools and resources that we used to build it, and the parts that we developed. We evaluate the system in each language processing task and report errors using real-world examples. Finally, by applying the Tracker to the data, we provide valuable insights about the proportion of people mentioned and quoted, by gender, news organization, and author gender. Data collected between October 1, 2018 and September 30, 2020 shows that, in general, men are quoted about three times as frequently as women. While this proportion varies across news outlets and time intervals, the general pattern is consistent. We believe that, in a world with about 50% women, this should not be the case. Although journalists naturally need to quote newsmakers who are men, they also have a certain amount of control over who they approach as sources. The Gender Gap Tracker relies on the same principles as fitness or goal-setting trackers: By quantifying and measuring regular progress, we hope to motivate news organizations to provide a more diverse set of voices in their reporting.
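The Tracker's core counting idea, attributing each quotation to a speaker and tallying speakers by inferred gender, can be sketched minimally. The regex-based attribution and the tiny name-to-gender lookup below are illustrative stand-ins only; the actual system uses syntactic quote extraction and name- and pronoun-based gender inference over much larger resources.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon standing in for the Tracker's gender
# inference (the real system uses name databases and pronoun cues).
GENDER_BY_NAME = {"Alice": "female", "Carol": "female",
                  "Bob": "male", "David": "male"}

def count_quoted_sources(text):
    """Count quoted speakers by gender using a naive 'NAME said' pattern."""
    counts = Counter()
    # Naive attribution: a capitalized name directly before 'said'.
    for name in re.findall(r'([A-Z][a-z]+) said', text):
        counts[GENDER_BY_NAME.get(name, "unknown")] += 1
    return counts

article = ('Bob said, "Markets fell." David said, "We expected it." '
           'Alice said, "Recovery will take time." Bob said, "Indeed."')
counts = count_quoted_sources(article)
print(counts["male"], counts["female"])  # 3 1
```

Aggregating such counts per outlet and time window yields the male-to-female quotation ratios the article reports.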

Author(s):  
Santosh Kumar Mishra ◽  
Rijul Dhir ◽  
Sriparna Saha ◽  
Pushpak Bhattacharyya

Image captioning is the process of generating a textual description of an image that aims to describe its salient parts. It is an important problem because it combines computer vision and natural language processing: computer vision is used for understanding images, and natural language processing is used for language modeling. Much work has been done on image captioning for the English language. In this article, we develop a model for image captioning in the Hindi language. Hindi is the official language of India and the fourth most spoken language in the world, spoken in India and South Asia. To the best of our knowledge, this is the first attempt to generate image captions in Hindi. A dataset is manually created by translating the well-known MSCOCO dataset from English to Hindi. Finally, different types of attention-based architectures are developed for image captioning in Hindi; these attention mechanisms have not previously been applied to the Hindi language. The results of the proposed model are compared with several baselines in terms of BLEU scores, and they show that our model performs better than the others. Manual evaluation of the obtained captions in terms of adequacy and fluency also confirms the effectiveness of our proposed approach. Availability of resources: The code for the article is available at https://github.com/santosh1821cs03/Image_Captioning_Hindi_Language ; the dataset will be made available at http://www.iitp.ac.in/∼ai-nlp-ml/resources.html .
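The attention step at the heart of such captioning models can be sketched in isolation. The snippet below shows Bahdanau-style additive attention over a grid of image-region features; the dimensions and random weights are illustrative assumptions, not the article's trained architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def additive_attention(features, hidden, W_f, W_h, v):
    """Bahdanau-style additive attention: score each image region
    against the decoder hidden state, then softmax into weights."""
    scores = np.tanh(features @ W_f + hidden @ W_h) @ v   # (regions,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                              # softmax
    context = weights @ features                          # weighted sum of regions
    return context, weights

regions, feat_dim, hid_dim, att_dim = 49, 64, 32, 16      # e.g. a 7x7 CNN grid
features = rng.normal(size=(regions, feat_dim))
hidden = rng.normal(size=hid_dim)
W_f = rng.normal(size=(feat_dim, att_dim))
W_h = rng.normal(size=(hid_dim, att_dim))
v = rng.normal(size=att_dim)

context, weights = additive_attention(features, hidden, W_f, W_h, v)
print(context.shape, round(float(weights.sum()), 6))
```

At each decoding step the context vector is fed to the word decoder, so the model attends to different regions while emitting successive Hindi words.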


2021 ◽  
Author(s):  
Abigail Matthews ◽  
Isabella Grasso ◽  
Christopher Mahoney ◽  
Yan Chen ◽  
Esma Wali ◽  
...  

The software development procedure begins with requirements analysis. This phase proceeds from analyzing the requirements to sketching the design of the program, which is critical work for programmers and software engineers. Moreover, many errors introduced during requirements analysis propagate to later stages, making the process far more costly than initially estimated. The root cause is that software requirement specifications are written in natural language. To minimize these errors, software requirements can be transferred to a computerized form as UML diagrams. To this end, a tool has been designed that provides semi-automated aid for designers, producing a UML class model from software specifications using Natural Language Processing techniques. The proposed technique outlines the class diagram in a standard configuration and also identifies the relationships between classes. In this research, we propose to enhance the procedure of producing UML diagrams from natural language, helping software developers analyze requirements with fewer errors and greater efficiency. The proposed approach uses a parser and a Part-of-Speech (POS) tagger to analyze user requirements entered in English, then extracts the verbs, phrases, and other elements from the text. The obtained results show that the proposed method outperforms other methods published in the literature, providing a better analysis of the given requirements and clearer diagram presentation, which can help software engineers. Key words: Part of Speech, UML
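The noun-to-class and verb-to-association mapping such a pipeline performs can be sketched with a toy tagger. The hand-rolled POS lexicon below is an illustrative stand-in for a real tagger such as NLTK's `pos_tag`; only the extraction heuristic is the point.

```python
import re

# Toy POS lexicon standing in for a real tagger; it only serves to
# illustrate the noun -> class, verb -> relationship mapping.
POS = {"customer": "NOUN", "places": "VERB", "order": "NOUN",
       "contains": "VERB", "product": "NOUN", "a": "DET", "an": "DET"}

def extract_class_model(requirement):
    """Extract candidate UML classes (nouns) and associations
    (NOUN VERB NOUN triples) from one requirement sentence."""
    tokens = re.findall(r"[a-z]+", requirement.lower())
    tagged = [(t, POS.get(t, "OTHER")) for t in tokens
              if POS.get(t) != "DET"]          # determiners carry no structure
    classes = [t for t, p in tagged if p == "NOUN"]
    relations = [(tagged[i][0], tagged[i + 1][0], tagged[i + 2][0])
                 for i in range(len(tagged) - 2)
                 if tagged[i][1] == "NOUN" and tagged[i + 1][1] == "VERB"
                 and tagged[i + 2][1] == "NOUN"]
    return classes, relations

classes, relations = extract_class_model("A Customer places an Order")
print(classes)    # candidate UML classes
print(relations)  # (source class, association, target class)
```

A real system would add parsing to handle attributes, multiplicities, and more complex sentence shapes, but the class/association skeleton comes from exactly this kind of POS-driven rule.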


Author(s):  
Kiran Raj R

Today, nearly everyone has a personal device for accessing the web, and every user tries to retrieve the information they require through the internet. Most of this information is stored in databases, and a user with limited knowledge of databases will have difficulty accessing it. Hence, there is a need for a system that allows such users to access the information in a database. The proposed method is to develop a system that takes natural language as input and produces an SQL query, which is then used to access the database and retrieve the information with ease. Tokenization, part-of-speech tagging, lemmatization, parsing, and mapping are the steps involved in the process. The proposed project demonstrates the use of Natural Language Processing (NLP) to map an English-language query, via regular expressions, to SQL.
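The final mapping step can be sketched as a table of regular-expression patterns paired with SQL templates. The patterns and table names below are hypothetical examples; the described system would first normalize the input with tokenization, POS tagging, and lemmatization before matching.

```python
import re

# Hypothetical pattern table: an English request shape -> SQL template.
PATTERNS = [
    (re.compile(r"show (\w+) where (\w+) is (\w+)", re.I),
     lambda m: f"SELECT * FROM {m.group(1)} WHERE {m.group(2)} = '{m.group(3)}';"),
    (re.compile(r"show (?:me )?all (\w+)", re.I),
     lambda m: f"SELECT * FROM {m.group(1)};"),
    (re.compile(r"count (?:the )?(\w+)", re.I),
     lambda m: f"SELECT COUNT(*) FROM {m.group(1)};"),
]

def to_sql(question):
    """Return the SQL for the first matching pattern, else None."""
    for pattern, build in PATTERNS:
        match = pattern.search(question)
        if match:
            return build(match)
    return None

print(to_sql("Show me all employees"))  # SELECT * FROM employees;
print(to_sql("Count the orders"))       # SELECT COUNT(*) FROM orders;
```

Ordering matters: the more specific WHERE pattern is tried before the generic "show all" pattern so that conditional requests are not swallowed by the broader rule.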


Author(s):  
Kaushika Pal ◽  
Biraj V. Patel

A large section of the World Wide Web is filled with documents and content: big data, formatted and unformatted, structured and unstructured, and we need an information infrastructure that is useful and easily accessible whenever required. This research work combines Natural Language Processing and Machine Learning for content-based classification of documents. Natural Language Processing divides the problem of understanding an entire document at once into smaller chunks, yielding only the useful tokens for feature extraction, the machine learning step that builds the feature set used to train a classifier to predict a label for a new document and place it at the appropriate location. Machine Learning, a subset of Artificial Intelligence, offers sophisticated algorithms such as Support Vector Machines, K-Nearest Neighbors, and Naïve Bayes, which work well for classifying content in many Indian languages as well as foreign languages. This model successfully classifies documents with more than 70% accuracy for major Indian languages and more than 80% accuracy for English.
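The token-counts-to-label pipeline named here can be sketched with a tiny hand-rolled multinomial Naïve Bayes classifier. The training documents and labels are invented for illustration; a production pipeline would use a library such as scikit-learn for vectorization and classification.

```python
from collections import Counter, defaultdict
import math

def train(docs):
    """docs: list of (text, label). Returns word counts per label,
    label counts, and the vocabulary."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def predict(model, text):
    """Pick the label maximizing log P(label) + sum log P(token|label),
    with Laplace smoothing for unseen tokens."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in text.lower().split():
            score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

model = train([("stock market shares rise", "finance"),
               ("team wins the final match", "sports"),
               ("market prices fall sharply", "finance"),
               ("player scores a goal", "sports")])
print(predict(model, "shares and market prices"))  # finance
```

The same scheme scales to multilingual corpora: only the tokenizer and training data change, which is why such bag-of-words classifiers transfer readily across Indian and foreign languages.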


2021 ◽  
Vol 37 (S1) ◽  
pp. 30-30
Author(s):  
Savitri Pandey ◽  
Christopher Marshall ◽  
Maria Pokora ◽  
Anne Oyewole ◽  
Dawn Craig

Introduction: Various strategies to suppress the coronavirus have been adopted by governments across the world; one such strategy is diagnostic testing. The anxiety that testing causes individuals is difficult to quantify. This analysis explores the use of soft intelligence from Twitter (USA, UK, and India) to help better understand this issue.
Methods: A total of 650,000 tweets were collected between September and October 2020 via the Twitter API, using hashtags such as '#oxymeter', '#oximeter', '#antibodytest', '#infraredthermometer', '#swabtest', '#rapidtest', and '#antigen'. We applied natural language processing (TextBlob) to assign sentiment and categorize the tweets by emotion and attitude. WordCloud was then used to identify the 500 most frequent words in the whole tweet dataset.
Results: Global analysis and pre-processing of the tweets indicate that 21 percent, seven percent, and four percent of tweets originated from the USA, UK, and India, respectively. The tweets from #antibody, #rapid, #antigen, and #swabtest carried positive sentiment, whereas #oxymeter and #infraredthermometer were mostly neutral. The underlying emotions of the tweets were approximately 2.5 times more positive than negative. The most used words in the tweets included 'hope', 'insurance', 'symptoms', 'love', 'painful', 'cough', 'fast test', 'wife', and 'kids'.
Conclusions: The findings suggest it may be reasonable to infer that people are generally concerned about their personal and social wellbeing, want to keep themselves safe, and perceive testing to deliver some component of that feeling of safety. This study has several limitations: it was restricted to only three countries and includes only English-language tweets with a limited number of hashtags.
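The sentiment-assignment step can be sketched with a minimal lexicon-based scorer. The word weights below are illustrative stand-ins, not TextBlob's actual polarity values, and the sample tweets are invented.

```python
# Minimal lexicon-based sentiment scorer standing in for TextBlob's
# polarity; the word weights here are illustrative assumptions.
POLARITY = {"hope": 0.6, "love": 0.8, "safe": 0.4,
            "painful": -0.7, "cough": -0.3, "anxious": -0.6}

def sentiment(tweet):
    """Average the polarity of known words; no known words -> neutral."""
    scores = [POLARITY[w] for w in tweet.lower().split() if w in POLARITY]
    if not scores:
        return "neutral"
    mean = sum(scores) / len(scores)
    return "positive" if mean > 0 else "negative" if mean < 0 else "neutral"

tweets = ["I hope the swab test keeps my kids safe",
          "This antigen test was painful",
          "Ordered an oximeter today"]
labels = [sentiment(t) for t in tweets]
print(labels)  # ['positive', 'negative', 'neutral']
```

Tallying these labels per hashtag reproduces the kind of positive/neutral breakdown reported for #swabtest versus #oxymeter.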

