
Inferring gender of Reddit users

  • The content aggregator platform Reddit has established itself as one of the most popular websites in the world. However, scientific research on Reddit is hindered because Reddit allows (and even encourages) user anonymity, i.e., user profiles do not contain personal information such as gender. Inferring the gender of users at large scale could enable the analysis of gender-specific areas of interest, reactions to events, and behavioral patterns. To this end, this thesis proposes a machine learning approach to estimating the gender of Reddit users. By exploiting specific conventions in parts of the website, we obtain a ground truth of more than 190 million comments from labeled users. This data is then used to train machine learning classifiers, which in turn are used to gain insights into the gender balance of particular subreddits and of the platform in general. Comparing a variety of classification algorithms, we find that a character-level convolutional neural network achieves an F1 score of 82.3% on the task of predicting a user's gender from their comments. The score surpasses the 85% mark for frequent users with more than 50 comments. Furthermore, we find that female users are less active on Reddit: on average, they write fewer comments and post in fewer subreddits than male users.
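For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how an F1 score is computed for one class in a binary labeling task. The function name, the label strings, and the toy data are illustrative assumptions; they are not taken from the thesis, and the abstract does not state which averaging scheme (per-class, macro, or weighted) underlies the reported 82.3%.

```python
def f1_score(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: precision and recall are 0 or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels, invented purely for illustration.
y_true = ["f", "f", "f", "m", "m", "m", "m", "m"]
y_pred = ["f", "f", "m", "m", "m", "m", "m", "f"]

f1_f = f1_score(y_true, y_pred, positive="f")
f1_m = f1_score(y_true, y_pred, positive="m")
macro_f1 = (f1_f + f1_m) / 2  # unweighted mean over both classes
print(f1_f, f1_m, macro_f1)
```

Because F1 balances precision and recall, it is less easily inflated than plain accuracy when one class is much more frequent than the other.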


Metadata
Author: Evgenii Vasilev
URN: urn:nbn:de:kola-16196
Reviewer: Claudia Wagner
Advisors: Claudia Wagner, Florian Lemmerich
Document type: Master's thesis
Language: English
Date of completion: 28.03.2018
Date of publication: 09.04.2018
Publishing institution: Universität Koblenz, Universitätsbibliothek
Degree-granting institution: Universität Koblenz, Fachbereich 4
Date of final examination: 04.03.2018
Release date: 09.04.2018
Keywords / tags: Analysis of social platform; Natural Language Processing; Reddit; Text classification
Number of pages: 78
Institutes: Fachbereich 4 / Institute for Web Science and Technologies
DDC classification: 0 Computer science, information and general works / 00 Computer science, knowledge and systems / 004 Data processing; computer science
BKL classification: 54 Computer science
License (German): German copyright law applies: § 53 UrhG