Refine
Year of publication
Document Type
- Master's Thesis (19)
- Part of Periodical (15)
- Doctoral Thesis (11)
- Study Thesis (5)
- Bachelor Thesis (3)
- Diploma Thesis (3)
- Habilitation (1)
Keywords
- Semantic Web (3)
- ontology (3)
- Linked Open Data (2)
- Maschinelles Lernen (2)
- OWL (2)
- OWL <Informatik> (2)
- Ontology (2)
- RDF <Informatik> (2)
- SPARQL (2)
- mobile phone (2)
Institute
- Institute for Web Science and Technologies (57) (remove)
Wikipedia is the biggest, free online encyclopaedia that can be expanded by any-one. For the users, who create content on a specific Wikipedia language edition, a social network exists. In this social network users are categorised into different roles. These are normal users, administrators and functional bots. Within the networks, a user can post reviews, suggestions or send simple messages to the "talk page" of another user. Each language in the Wikipedia domain has this type of social network.
In this thesis characteristics of the three different roles are analysed in order to learn how they function in one language network of Wikipedia and apply them to another Wikipedia network to identify bots. Timestamps from created posts are analysed to reveal noticeable characteristics referring to continuous messages, message rates and irregular behaviour of a user are discovered. Through this process we show that there exist differences between the roles for the mentioned characteristics.
The purpose of this thesis is to explore the sentiment distributions of Wikipedia concepts.
We analyse the sentiment of the entire English Wikipedia corpus, which includes 5,669,867 articles and 1,906,375 talks, by using a lexicon-based method with four different lexicons.
Also, we explore the sentiment distributions from a time perspective using the sentiment scores obtained from our selected corpus. The results obtained have been compared not only between articles and talks but also among four lexicons: OL, MPQA, LIWC, and ANEW.
Our findings show that among the four lexicons, MPQA has the highest sensitivity and ANEW has the lowest sensitivity to emotional expressions. Wikipedia articles show more sentiments than talks according to OL, MPQA, and LIWC, whereas Wikipedia talks show more sentiments than articles according to ANEW. Besides, the sentiment has a trend regarding time series, and each lexicon has its own bias regarding text describing different things.
Moreover, our research provides three interactive widgets for visualising sentiment distributions for Wikipedia concepts regarding the time and geolocation attributes of concepts.
Social media platforms such as Twitter or Reddit allow users almost unrestricted access to publish their opinions on recent events or discuss trending topics. While the majority of users approach these platforms innocently, some groups have set their mind on spreading misinformation and influencing or manipulating public opinion. These groups disguise as native users from various countries to spread frequently manufactured articles, strong polarizing opinions in the political spectrum and possibly become providers of hate-speech or extremely political positions. This thesis aims to implement an AutoML pipeline for identifying second language speakers from English social media texts. We investigate style differences of text in different topics and across the platforms Reddit and Twitter, and analyse linguistic features. We employ feature-based models with datasets from Reddit, which include mostly English conversation from European users, and Twitter, which was newly created by collecting English tweets from selected trending topics in different countries. The pipeline classifies language family, native language and origin (Native or non-Native English speakers) of a given textual input. We evaluate the resulting classifications by comparing prediction accuracy, precision and F1 scores of our classification pipeline to traditional machine learning processes. Lastly, we compare the results from each dataset and find differences in language use for topics and platforms. We obtained high prediction accuracy for all categories on the Twitter dataset and observed high variance in features such as average text length especially for Balto-Slavic countries.
This Master Thesis is an exploratory research to determine whether it is feasible to construct a subjectivity lexicon using Wikipedia. The key hypothesis is that that all quotes in Wikipedia are subjective and all regular text are objective. The degree of subjectivity of a word, also known as ''Quote Score'' is determined based on the ratio of word frequency in quotations to its frequency outside quotations. The proportion of words in the English Wikipedia which are within quotations is found to be much smaller as compared to those which are not in quotes, resulting in a right-skewed distribution and low mean value of Quote Scores.
The methodology used to generate the subjectivity lexicon from text corpus in English Wikipedia is designed in such a way that it can be scaled and reused to produce similar subjectivity lexica of other languages. This is achieved by abstaining from domain and language-specific methods, apart from using only readily-available English dictionary packages to detect and exclude stopwords and non-English words in the Wikipedia text corpus.
The subjectivity lexicon generated from English Wikipedia is compared against other lexica; namely MPQA and SentiWordNet. It is found that words which are strongly subjective tend to have high Quote Scores in the subjectivity lexicon generated from English Wikipedia. There is a large observable difference between distribution of Quote Scores for words classified as strongly subjective versus distribution of Quote Scores for words classified as weakly subjective and objective. However, weakly subjective and objective words cannot be differentiated clearly based on Quote Score. In addition to that, a questionnaire is commissioned as an exploratory approach to investigate whether subjectivity lexicon generated from Wikipedia could be used to extend the coverage of words of existing lexica.
The availability of digital cameras and the possibility to take photos at no cost lead to an increasing amount of digital photos online and on private computers. The pure amount of data makes approaches that support users in the administration of the photo necessary. As the automatic understanding of photo content is still an unsolved task, metadata is needed for supporting administrative tasks like search or photo work such as the generation of photo books. Meta-information textually describes the depicted scene or consists of information on how good or interesting a photo is.
In this thesis, an approach for creating meta-information without additional effort for the user is investigated. Eye tracking data is used to measure the human visual attention. This attention is analyzed with the objective of information creation in the form of metadata. The gaze paths of users working with photos are recorded, for example, while they are searching for photos or while they are just viewing photo collections.
Eye tracking hardware is developing fast within the last years. Because of falling prices for sensor hardware such as cameras and more competition on the eye tracker market, the prices are falling, and the usability is increasing. It can be assumed that eye tracking technology can soon be used in everyday devices such as laptops or mobile phones. The exploitation of data, recorded in the background while the user is performing daily tasks with photos, has great potential to generate information without additional effort for the users.
The first part of this work deals with the labeling of image region by means of gaze data for describing the depicted scenes in detail. Labeling takes place by assigning object names to specific photo regions. In total, three experiments were conducted for investigating the quality of these assignments in different contexts. In the first experiment, users decided whether a given object can be seen on a photo by pressing a button. In the second study, participants searched for specific photos in an image search application. In the third experiment, gaze data was collected from users playing a game with the task to classify photos regarding given categories. The results of the experiments showed that gaze-based region labeling outperforms baseline approaches in various contexts. In the second part, most important photos in a collection of photos are identified by means of visual attention for the creation of individual photo selections. Users freely viewed photos of a collection without any specific instruction on what to fixate, while their gaze paths were recorded. By comparing gaze-based and baseline photo selections to manually created selections, the worth of eye tracking data in the identification of important photos is shown. In the analysis of the data, the characteristics of gaze data has to be considered, for example, inaccurate and ambiguous data. The aggregation of gaze data, collected from several users, is one suggested approach for dealing with this kind of data.
The results of the performed experiments show the value of gaze data as source of information. It allows to benefit from human abilities where algorithms still have problems to perform satisfyingly.
The content aggregator platform Reddit has established itself as one of the most popular websites in the world. However, scientific research on Reddit is hindered as Reddit allows (and even encourages) user anonymity, i.e., user profiles do not contain personal information such as the gender. Inferring the gender of users in large-scale could enable the analysis of gender-specific areas of interest, reactions to events, and behavioral patterns. In this direction, this thesis suggests a machine learning approach of estimating the gender of Reddit users. By exploiting specific conventions in parts of the website, we obtain a ground truth for more than 190 million comments of labeled users. This data is then used to train machine learning classifiers to use them to gain insights about the gender balance of particular subreddits and the platform in general. By comparing a variety of different approaches for classification algorithm, we find that character-level convolutional neural network achieves performance with an 82.3% F1 score on a task of predicting a gender of a user based on his/her comments. The score surpasses 85% mark for frequent users with more than 50 comments. Furthermore, we discover that female users are less active on Reddit platform, they write fewer comments and post in fewer subreddits on average, when compared to male users.
This habilitation thesis collects works addressing several challenges on handling uncertainty and inconsistency in knowledge representation. In particular, this thesis contains works which introduce quantitative uncertainty based on probability theory into abstract argumentation frameworks. The formal semantics of this extension is investigated and its application for strategic argumentation in agent dialogues is discussed. Moreover, both the computational as well as the meaningfulness of approaches to analyze inconsistencies, both in classical logics as well as logics for uncertain reasoning is investigated. Finally, this thesis addresses the implementation challenges for various kinds of knowledge representation formalisms employing any notion of inconsistency tolerance or uncertainty.
Current political issues are often reflected in social media discussions, gathering politicians and voters on common platforms. As these can affect the public perception of politics, the inner dynamics and backgrounds of such debates are of great scientific interest. This thesis takes user generated messages from an up-to-date dataset of considerable relevance as Time Series, and applies a topic-based analysis of inspiration and agenda setting to it. The Institute for Web Science and Technologies of the University Koblenz-Landau has collected Twitter data generated beforehand by candidates of the European Parliament Election 2019. This work processes and analyzes the dataset for various properties, while focusing on the influence of politicians and media on online debates. An algorithm to cluster tweets into topical threads is introduced. Subsequently, Sequential Association Rules are mined, yielding wide array of potential influence relations between both actors and topics. The elaborated methodology can be configured with different parameters and is extensible in functionality and scope of application.
The output of eye tracking Web usability studies can be visualized to the analysts as screenshots of the Web pages with their gaze data. However, the screenshot visualizations are found to be corrupted whenever there are recorded fixations on fixed Web page elements on different scroll positions. The gaze data are not gathered on their fixated fixed elements; rather they are scattered on their recorded scroll positions. This problem has raised our attention to find an approach to link gaze data to their intended fixed elements and gather them in one position on the screenshot. The approach builds upon the concept of creating the screenshot during the recording session, where images of the viewport are captured on visited scroll positions and lastly stitched into one Web page screenshot. Additionally, the fixed elements in the Web page are identified and linked to their fixations. For the evaluation, we compared the interpretation of our enhanced screenshot against the video visualization, which overcomes the problem. The results revealed that both visualizations equally deliver accurate interpretations. However, interpreting the visualizations of eye tracking Web usability studies using the enhanced screenshots outperforms the video visualizations in terms of speed and it requires less temporal demands from the interpreters.
This thesis focuses on approximate inference in assumption-based argumentation frameworks. Argumentation provides a significant idea in the computerization of theoretical and practical reasoning in AI. And it has a close connection with AI, engaging in arguments to perform scientific reasoning. The fundamental approach in this field is abstract argumentation frameworks developed by Dung. Assumption-based argumentation can be regarded as an instance of abstract argumentation with structured arguments. When facing a large scale of data, a challenge of reasoning in assumption-based argumentation is how to construct arguments and resolve attacks over a given claim with minimal cost of computation and acceptable accuracy at the same time. This thesis proposes and investigates approximate methods that randomly select and construct samples of frameworks based on graphical dispute derivations to solve this problem. The presented approach aims to improve reasoning performance and get an acceptable trade-off between computational time and accuracy. The evaluation shows that for reasoning in assumption-based argumentation, in general, the running time is reduced with the cost of slightly low accuracy by randomly sampling and constructing inference rules for potential arguments over a query.