
Inferring gender of Reddit users

  • The content aggregator platform Reddit has established itself as one of the most popular websites in the world. However, scientific research on Reddit is hindered because Reddit allows (and even encourages) user anonymity, i.e., user profiles do not contain personal information such as gender. Inferring the gender of users at large scale could enable the analysis of gender-specific areas of interest, reactions to events, and behavioral patterns. To this end, this thesis proposes a machine learning approach for estimating the gender of Reddit users. By exploiting specific conventions in parts of the website, we obtain a ground truth of more than 190 million comments from labeled users. This data is then used to train machine learning classifiers, which in turn provide insights into the gender balance of particular subreddits and of the platform in general. Comparing a variety of classification algorithms, we find that a character-level convolutional neural network performs best, achieving an F1 score of 82.3% on the task of predicting a user's gender from their comments. The score surpasses the 85% mark for frequent users with more than 50 comments. Furthermore, we find that female users are less active on Reddit: on average, they write fewer comments and post in fewer subreddits than male users.
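For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of how an F1 score for a binary classification task can be computed from true and predicted labels. It is an illustration only: the function name `f1_score`, the label strings, and the choice of a single positive class are assumptions, and the abstract does not state how the thesis averaged the score across classes.

```python
def f1_score(true_labels, predicted_labels, positive="female"):
    """Binary F1 score for one positive class (illustrative helper, not from the thesis)."""
    tp = fp = fn = 0
    for truth, prediction in zip(true_labels, predicted_labels):
        if prediction == positive and truth == positive:
            tp += 1  # correctly predicted positive
        elif prediction == positive and truth != positive:
            fp += 1  # predicted positive, actually negative
        elif prediction != positive and truth == positive:
            fn += 1  # missed a positive
    if tp == 0:
        # Precision and recall are both zero (or undefined), so report 0.0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels, invented purely for demonstration.
truth = ["female", "female", "male", "male", "female"]
guess = ["female", "male", "male", "female", "female"]
score = f1_score(truth, guess)
```

Because F1 is a harmonic mean, it stays low unless both precision and recall are high, which makes it more informative than plain accuracy when the classes are imbalanced.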

Author:Evgenii Vasilev
Referee:Claudia Wagner
Advisor:Claudia Wagner, Florian Lemmerich
Document Type:Master's Thesis
Date of completion:2018/03/28
Date of publication:2018/04/09
Publishing institution:Universität Koblenz, Universitätsbibliothek
Granting institution:Universität Koblenz, Fachbereich 4
Date of final exam:2018/03/04
Release Date:2018/04/09
Tag:Analysis of social platform; Natural Language Processing; Reddit; Text classification
Number of pages:78
Institutes:Fachbereich 4 / Institute for Web Science and Technologies
Dewey Decimal Classification:0 Computer science, information & general works / 00 Computer science, knowledge & systems / 004 Data processing & computer science
BKL-Classification:54 Computer science
Licence (German):German copyright law applies: § 53 UrhG