• search hit 1 of 1
Back to Result List

Generating a non-sexist corpus through gamification for automatic sexism detection

  • Most social media platforms allow users to freely express their opinions, feelings, and beliefs. However, in recent years the growing propagation of hate speech, offensive language, racism and sexism on the social media outlets have drawn attention from individuals, companies, and researchers. Today, sexism both online and offline with different forms, including blatant, covert, and subtle lan- guage, is a common phenomenon in society. A notable amount of work has been done over identifying sexist content and computationally detecting sexism which exists online. Although previous efforts have mostly used peoples’ activities on social media platforms such as Twitter as a public and helpful source for collecting data, they neglect the fact that the method of gathering sexist tweets could be biased towards the initial search terms. Moreover, some forms of sexism could be missed since some tweets which contain offensive language could be misclassified as hate speech. Further, in existing hate speech corpora, sexist tweets mostly express hostile sexism, and to some degree, the other forms of sexism which also appear online was disregarded. Besides, the creation of labeled datasets with manual exertion, relying on users to report offensive comments with a tremendous effort by human annotators is not only a costly and time-consuming process, but it also raises the risk of involving discrimination under biased judgment. This thesis generates a novel sexist and non-sexist dataset which is constructed via "UnSexistifyIt", an online web-based game that incentivizes the players to make minimal modifications to a sexist statement with the goal of turning it into a non-sexist statement and convincing other players that the modified statement is non-sexist. The game applies the methodology of "Game With A Purpose" to generate data as a side-effect of playing the game and also employs the gamification and crowdsourcing techniques to enhance non-game contexts. When voluntary participants play the game, they help to produce non-sexist statements which can reduce the cost of generating new corpus. This work explores how diverse individual beliefs concerning sexism are. Further, the result of this work highlights the impact of various linguistic features and content attributes regarding sexist language detection. Finally, this thesis could help to expand our understanding regarding the syntactic and semantic structure of sexist and non-sexist content and also provides insights to build a probabilistic classifier for single sentences into sexist or non-sexist classes and lastly find a potential ground truth for such a classifier.

Download full text files

Export metadata

Additional Services

Share in Twitter Search Google Scholar
Author:Ali Aghelmaleki
Advisor:Claudia Wagner, Anupama Aggarwal
Document Type:Master's Thesis
Date of completion:2019/03/27
Date of publication:2019/03/28
Publishing institution:Universität Koblenz-Landau, Universitätsbibliothek
Granting institution:Universität Koblenz-Landau, Campus Koblenz, Fachbereich 4
Date of final exam:2019/02/14
Release Date:2019/03/28
Number of pages:xi, 39
Institutes:Fachbereich 4 / Institute for Web Science and Technologies
Licence (German):License LogoEs gilt das deutsche Urheberrecht: § 53 UrhG