Refine
Keywords
- Latent Negative (1)
- Link Prediction (1)
- Maschinelles Lernen (1)
- Unlink Prediction (1)
Institute
- Institute for Web Science and Technologies (2) (remove)
Through the increasing availability of access to the web, more and more interactions between people take place in online social networks, such as Twitter or Facebook, or sites where opinions can be exchanged. At the same time, knowledge is made openly available for many people, such as by the biggest collaborative encyclopedia Wikipedia and diverse information in Internet forums and on websites. These two kinds of networks - social networks and knowledge networks - are highly dynamic in the sense that the links that contain the important information about the relationships between people or the relations between knowledge items are frequently updated or changed. These changes follow particular structural patterns and characteristics that are far less random than expected.
The goal of this thesis is to predict three characteristic link patterns for the two network types of interest: the addition of new links, the removal of existing links and the presence of latent negative links. First, we show that the prediction of link removal is indeed a new and challenging problem. Even if the sociological literature suggests that reasons for the formation and resolution of ties are often complementary, we show that the two respective prediction problems are not. In particular, we show that the dynamics of new links and unlinks lead to the four link states of growth, decay, stability and instability. For knowledge networks we show that the prediction of link changes greatly benefits from the usage of temporal information; the timestamp of link creation and deletion events improves the prediction of future link changes. For that, we present and evaluate four temporal models that resemble different exploitation strategies. Focusing on directed social networks, we conceptualize and evaluate sociological constructs that explain the formation and dissolution of relationships between users. Measures based on information about past relationships are extremely valuable for predicting the dissolution of social ties. Hence, consistent for knowledge networks and social networks, temporal information in a network greatly improves the prediction quality. Turning again to social networks, we show that negative relationship information such as distrust or enmity can be predicted from positive known relationships in the network. This is particularly interesting in networks where users cannot label their relationships to other users as negative. For this scenario we show how latent negative relationships can be predicted.
This thesis presents novel approaches for integrating context information into probabilistic models. Data from social media is typically associated with metadata, which includes context information such as timestamps, geographical coordinates or links to user profiles. Previous studies showed the benefits of using such context information in probabilistic models, e.g.\ improved predictive performance. In practice, probabilistic models which account for context information still play a minor role in data analysis. There are multiple reasons for this. Existing probabilistic models often are complex, the implementation is difficult, implementations are not publicly available, or the parameter estimation is computationally too expensive for large datasets. Additionally, existing models are typically created for a specific type of content and context and lack the flexibility to be applied to other data.
This thesis addresses these problems by introducing a general approach for modelling multiple, arbitrary context variables in probabilistic models and by providing efficient inference schemes and implementations.
In the first half of this thesis, the importance of context and the potential of context information for probabilistic modelling is shown theoretically and in practical examples. In the second half, the example of topic models is employed for introducing a novel approach to context modelling based on document clusters and adjacency relations in the context space. They can cope with areas of sparse observations and These models allow for the first time the efficient, explicit modelling of arbitrary context variables including cyclic and spherical context (such as temporal cycles or geographical coordinates). Using the novel three-level hierarchical multi-Dirichlet process presented in this thesis, the adjacency of ontext clusters can be exploited and multiple contexts can be modelled and weighted at the same time. Efficient inference schemes are derived which yield interpretable model parameters that allow analyse the relation between observations and context.