Refine
Year of publication
- 2014 (5) (remove)
Document Type
- Doctoral Thesis (2)
- Part of Periodical (2)
- Bachelor Thesis (1)
Keywords
- Annotation (1)
- Augenbewegung (1)
- Auslese (1)
- Auswahl (1)
- Blickbewegung (1)
- Discussion Forums (1)
- Eyetracking (1)
- Fotoauswahl (1)
- Generative Model (1)
- Latent Negative (1)
Institute
- Institute for Web Science and Technologies (5) (remove)
The way information is presented to users in online community platforms has an influence on the way the users create new information. This is the case, for instance, in question-answering fora, crowdsourcing platforms or other social computation settings. To better understand the effects of presentation policies on user activity, we introduce a generative model of user behaviour in this paper. Running simulations based on this user behaviour we demonstrate the ability of the model to evoke macro phenomena comparable to the ones observed on real world data.
Modeling and publishing Linked Open Data (LOD) involves the choice of which vocabulary to use. This choice is far from trivial and poses a challenge to a Linked Data engineer. It covers the search for appropriate vocabulary terms, making decisions regarding the number of vocabularies to consider in the design process, as well as the way of selecting and combining vocabularies. Until today, there is no study that investigates the different strategies of reusing vocabularies for LOD modeling and publishing. In this paper, we present the results of a survey with 79 participants that examines the most preferred vocabulary reuse strategies of LOD modeling. Participants of our survey are LOD publishers and practitioners. Their task was to assess different vocabulary reuse strategies and explain their ranking decision. We found significant differences between the modeling strategies that range from reusing popular vocabularies, minimizing the number of vocabularies, and staying within one domain vocabulary. A very interesting insight is that the popularity in the meaning of how frequent a vocabulary is used in a data source is more important than how often individual classes and properties arernused in the LOD cloud. Overall, the results of this survey help in understanding the strategies how data engineers reuse vocabularies, and theyrnmay also be used to develop future vocabulary engineering tools.
Next word prediction is the task of suggesting the most probable word a user will type next. Current approaches are based on the empirical analysis of corpora (large text files) resulting in probability distributions over the different sequences that occur in the corpus. The resulting language models are then used for predicting the most likely next word. State-of-the-art language models are based on n-grams and use smoothing algorithms like modified Kneser-Ney smoothing in order to reduce the data sparsity by adjusting the probability distribution of unseen sequences. Previous research has shown that building word pairs with different distances by inserting wildcard words into the sequences can result in better predictions by further reducing data sparsity. The aim of this thesis is to formalize this novel approach and implement it by also including modified Kneser-Ney smoothing.
The availability of digital cameras and the possibility to take photos at no cost lead to an increasing amount of digital photos online and on private computers. The pure amount of data makes approaches that support users in the administration of the photo necessary. As the automatic understanding of photo content is still an unsolved task, metadata is needed for supporting administrative tasks like search or photo work such as the generation of photo books. Meta-information textually describes the depicted scene or consists of information on how good or interesting a photo is.
In this thesis, an approach for creating meta-information without additional effort for the user is investigated. Eye tracking data is used to measure the human visual attention. This attention is analyzed with the objective of information creation in the form of metadata. The gaze paths of users working with photos are recorded, for example, while they are searching for photos or while they are just viewing photo collections.
Eye tracking hardware is developing fast within the last years. Because of falling prices for sensor hardware such as cameras and more competition on the eye tracker market, the prices are falling, and the usability is increasing. It can be assumed that eye tracking technology can soon be used in everyday devices such as laptops or mobile phones. The exploitation of data, recorded in the background while the user is performing daily tasks with photos, has great potential to generate information without additional effort for the users.
The first part of this work deals with the labeling of image region by means of gaze data for describing the depicted scenes in detail. Labeling takes place by assigning object names to specific photo regions. In total, three experiments were conducted for investigating the quality of these assignments in different contexts. In the first experiment, users decided whether a given object can be seen on a photo by pressing a button. In the second study, participants searched for specific photos in an image search application. In the third experiment, gaze data was collected from users playing a game with the task to classify photos regarding given categories. The results of the experiments showed that gaze-based region labeling outperforms baseline approaches in various contexts. In the second part, most important photos in a collection of photos are identified by means of visual attention for the creation of individual photo selections. Users freely viewed photos of a collection without any specific instruction on what to fixate, while their gaze paths were recorded. By comparing gaze-based and baseline photo selections to manually created selections, the worth of eye tracking data in the identification of important photos is shown. In the analysis of the data, the characteristics of gaze data has to be considered, for example, inaccurate and ambiguous data. The aggregation of gaze data, collected from several users, is one suggested approach for dealing with this kind of data.
The results of the performed experiments show the value of gaze data as source of information. It allows to benefit from human abilities where algorithms still have problems to perform satisfyingly.
Through the increasing availability of access to the web, more and more interactions between people take place in online social networks, such as Twitter or Facebook, or sites where opinions can be exchanged. At the same time, knowledge is made openly available for many people, such as by the biggest collaborative encyclopedia Wikipedia and diverse information in Internet forums and on websites. These two kinds of networks - social networks and knowledge networks - are highly dynamic in the sense that the links that contain the important information about the relationships between people or the relations between knowledge items are frequently updated or changed. These changes follow particular structural patterns and characteristics that are far less random than expected.
The goal of this thesis is to predict three characteristic link patterns for the two network types of interest: the addition of new links, the removal of existing links and the presence of latent negative links. First, we show that the prediction of link removal is indeed a new and challenging problem. Even if the sociological literature suggests that reasons for the formation and resolution of ties are often complementary, we show that the two respective prediction problems are not. In particular, we show that the dynamics of new links and unlinks lead to the four link states of growth, decay, stability and instability. For knowledge networks we show that the prediction of link changes greatly benefits from the usage of temporal information; the timestamp of link creation and deletion events improves the prediction of future link changes. For that, we present and evaluate four temporal models that resemble different exploitation strategies. Focusing on directed social networks, we conceptualize and evaluate sociological constructs that explain the formation and dissolution of relationships between users. Measures based on information about past relationships are extremely valuable for predicting the dissolution of social ties. Hence, consistent for knowledge networks and social networks, temporal information in a network greatly improves the prediction quality. Turning again to social networks, we show that negative relationship information such as distrust or enmity can be predicted from positive known relationships in the network. This is particularly interesting in networks where users cannot label their relationships to other users as negative. For this scenario we show how latent negative relationships can be predicted.