Knowledge-based authentication methods are vulnerable to shoulder surfing.
The widespread use of these methods, combined with a failure to address their limitations, could result in users' information being compromised. A user authentication method ought to be effortless to use and efficient, yet secure.
The problem we face concerning the security of PIN (Personal Identification Number) or password entry is shoulder surfing, in which a direct or indirect malicious observer can identify the user's sensitive information. To tackle this issue, we present TouchGaze, which combines gaze signals and touch capabilities as an input method for entering a user's credentials. Gaze signals are primarily used to enhance targeting, and touch is used for selection. In this work, we designed three different PIN entry methods, all of which have similar interfaces. For the evaluation, these methods were compared based on efficiency, accuracy, and usability. The results showed that although gaze-based methods require extra time for the user to become familiar with them, they are considered more secure. With regard to efficiency, they have an error margin similar to that of traditional PIN entry methods.
The Web contains some extremely valuable information; however, poor-quality, inaccurate, irrelevant or fraudulent information can often be found as well. With the increasing amount of data available, it is becoming more and more difficult to distinguish truth from speculation on the Web. One of the most important criteria, if not the most important, used to evaluate data credibility is the information source, i.e., the data origin. Trust in the information source is a valuable currency users have to evaluate such data. Data popularity, recency (or the time of validity), reliability, or vagueness ascribed to the data may also help users to judge the validity and appropriateness of information sources. We call this knowledge derived from the data the provenance of the data. Provenance is an important aspect of the Web. It is essential in identifying the suitability, veracity, and reliability of information, and in deciding whether information is to be trusted, reused, or even integrated with other information sources. Therefore, models and frameworks for representing, managing, and using provenance in the realm of Semantic Web technologies and applications are critically required. This thesis highlights the benefits of the use of provenance in different Web applications and scenarios. In particular, it presents management frameworks for querying and reasoning in the Semantic Web with provenance, and presents a collection of Semantic Web tools that explore provenance information when ranking and updating caches of Web data. To begin, this thesis discusses a highly flexible and generic approach to the treatment of provenance when querying RDF datasets. The approach re-uses existing RDF modeling possibilities in order to represent provenance. It extends SPARQL query processing in such a way that given a SPARQL query for data, one may request provenance without modifying it.
The use of provenance within SPARQL queries helps users to understand how RDF facts are derived, i.e., it describes the data and the operations used to produce the derived facts. Turning to more expressive Semantic Web data models, an optimized algorithm for reasoning and debugging OWL ontologies with provenance is presented. Typical reasoning tasks over an expressive Description Logic (e.g., using tableau methods to perform consistency checking, instance checking, satisfiability checking, and so on) are in the worst case doubly exponential, and in practice are often likewise very expensive. With the algorithm described in this thesis, however, one can efficiently reason in OWL ontologies with provenance, i.e., provenance is efficiently combined and propagated within the reasoning process. Users can use the derived provenance information to judge the reliability of inferences and to find errors in the ontology. Next, this thesis tackles the problem of providing to Web users the right content at the right time. The challenge is to efficiently rank a stream of messages based on user preferences. Provenance is used to represent preferences, i.e., the user defines his preferences over the messages' popularity, recency, etc. This information is then aggregated to obtain a joint ranking. The aggregation problem is related to the problem of preference aggregation in Social Choice Theory. The traditional problem formulation of preference aggregation assumes a fixed set of preference orders and a fixed set of domain elements (e.g., messages). This work, however, investigates how an aggregated preference order has to be updated when the domain is dynamic, i.e., the aggregation approach ranks messages 'on the fly' as each message passes through the system. Consequently, this thesis presents computational approaches for online preference aggregation that handle the dynamic setting more efficiently than standard ones.
Lastly, this thesis addresses the scenario of caching data from the Linked Open Data (LOD) cloud. Data on the LOD cloud changes frequently and applications relying on that data - by pre-fetching data from the Web and storing local copies of it in a cache - need to continually update their caches. In order to make best use of the resources (e.g., network bandwidth for fetching data, and computation time) available, it is vital to choose a good strategy to know when to fetch data from which data source. A strategy to cope with data changes is to check for provenance. Provenance information delivered by LOD sources can denote when the resource on the Web was last changed. Linked Data applications can benefit from this piece of information since simply checking on it may help users decide which sources need to be updated. For this purpose, this work describes an investigation of the availability and reliability of provenance information in the Linked Data sources. Another strategy for capturing data changes is to exploit provenance in a time-dependent function. Such a function should measure the frequency of the changes of LOD sources. This work describes, therefore, an approach to the analysis of data dynamics, i.e., the analysis of the change behavior of Linked Data sources over time, followed by the investigation of different scheduling update strategies to keep local LOD caches up-to-date. This thesis aims to prove the importance and benefits of the use of provenance in different Web applications and scenarios. The flexibility of the approaches presented, combined with their high scalability, makes this thesis a possible building block for the Semantic Web proof layer cake - the layer of provenance knowledge.
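The idea of scheduling cache updates by how often a source changes can be illustrated with a minimal sketch. It is not the thesis's method: the function names, the snapshot-based change estimate, and the fixed fetch budget are all assumptions made for illustration.

```python
def change_rate(snapshots):
    """Fraction of consecutive snapshot pairs in which the source changed.

    `snapshots` is a hypothetical list of version identifiers (for example
    content hashes) observed for one source at regular intervals.
    """
    if len(snapshots) < 2:
        return 0.0
    changes = sum(1 for old, new in zip(snapshots, snapshots[1:]) if old != new)
    return changes / (len(snapshots) - 1)


def schedule_updates(history, budget):
    """Return the `budget` sources with the highest estimated change rate.

    Ties are broken by source name so the result is deterministic.
    """
    ranked = sorted(history, key=lambda src: (-change_rate(history[src]), src))
    return ranked[:budget]


# Hypothetical observation history for three sources.
history = {
    "source_a": ["v1", "v2", "v3"],  # changed at every observation
    "source_b": ["v1", "v1", "v1"],  # never changed
    "source_c": ["v1", "v1", "v2"],  # changed once
}
to_refresh = schedule_updates(history, budget=2)
```

With a budget of two fetches, the sketch refreshes the two sources that changed most often in the observed history and leaves the static source untouched.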
Ontologies are valuable tools for knowledge representation and important building blocks of the Semantic Web. They are not static and can change over time. Changing an ontology can be necessary for various reasons: the domain that is represented by an ontology can change, or an ontology is reused and must be adapted to the new context. In addition, modeling errors could have been introduced into the ontology which must be found and removed. The non-triviality of the change process has led to the emergence of ontology change as a field of research in its own right. The removal of knowledge from ontologies is an important aspect of this change process, because even the addition of new knowledge to an ontology potentially requires the removal of older, conflicting knowledge. Such a removal must be performed in a thought-out way. A naïve change of concepts within the ontology can easily remove other, unrelated knowledge or alter the semantics of concepts in an unintended way [2]. For these reasons, this thesis introduces a formal operator for the fine-grained retraction of knowledge from EL concepts which is partially based on the postulates for belief set contraction and belief base contraction [3, 4, 5] and the work of Suchanek et al. [6]. For this, a short introduction to ontologies and OWL 2 is given and the problem of ontology change is explained. It is then argued why a formal operator can support this process and why the Description Logic EL provides a good starting point for the development of such an operator. After this, a general introduction to Description Logic is given. This includes its history, an overview of its applications and common reasoning tasks in this logic. Following this, the logic EL is defined. In a next step, related work is examined and it is shown why the recovery postulate and the relevance postulate cannot be naïvely employed in the development of an operator that removes knowledge from EL concepts.
Following this, the requirements for the operator are formulated and properties are given which are mainly based on the postulates for belief set and belief base contraction. Additional properties are developed which make up for the non-applicability of the recovery and relevance postulates. After this, a formal definition of the operator is given and it is shown that the operator is applicable to the task of a fine-grained removal of knowledge from EL concepts. In a next step, it is proven that the operator fulfills all the previously defined properties. It is then demonstrated how the operator can be combined with laconic justifications [7] to assist a human ontology editor by automatically removing unwanted consequences from an ontology. Building on this, a plugin for the ontology editor Protégé is introduced that is based on algorithms that were derived from the formal definition of the operator. The content of this work is then summarized and a final conclusion is drawn. The thesis closes with an outlook on possible future work.
The distributed setting of RDF stores in the cloud poses many challenges. One such challenge is how the data placement on the compute nodes can be optimized to improve the query performance. To address this challenge, several evaluations in the literature have investigated the effects of existing data placement strategies on the query performance. A common drawback in these evaluations is that it is unclear whether the observed behaviors were caused by the data placement strategies (if different RDF stores were evaluated as a whole) or reflect the behavior in distributed RDF stores (if cloud processing frameworks like Hadoop MapReduce are used for the evaluation). To overcome these limitations, this thesis develops a novel benchmarking methodology for data placement strategies that uses a data-placement-strategy-independent distributed RDF store to analyze the effect of the data placement strategies on query performance.
With this evaluation methodology the frequently used data placement strategies have been evaluated. This evaluation challenged the commonly held belief that data placement strategies that emphasize local computation, such as minimal edge-cut cover, lead to faster query executions. The results indicate that queries with a high workload may be executed faster on hash-based data placement strategies than on, e.g., minimal edge-cut covers. The analysis of the additional measurements indicates that vertical parallelization (i.e., a well-distributed workload) may be more important than horizontal containment (i.e., minimal data transport) for efficient query processing.
Moreover, to find a data placement strategy with a high vertical parallelization, the thesis tests the hypothesis that collocating small connected triple sets on the same compute node while balancing the amount of triples stored on the different compute nodes leads to a high vertical parallelization. Specifically, the thesis proposes two such data placement strategies. The first strategy called overpartitioned minimal edge-cut cover was found in the literature and the second strategy is the newly developed molecule hash cover. The evaluation revealed a balanced query workload and a high horizontal containment, which lead to a high vertical parallelization. As a result these strategies showed a better query performance than the frequently used data placement strategies.
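To make the notion of a hash-based data placement strategy concrete, the following minimal sketch assigns each triple to a compute node by hashing its subject. It is a generic illustration, not the molecule hash cover developed in the thesis; the function names and the cut-edge count used as a rough proxy for data transport are assumptions.

```python
import hashlib


def node_for(term, num_nodes):
    """Map an RDF term to a compute node using a stable hash of its string form."""
    digest = hashlib.sha256(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def hash_cover(triples, num_nodes):
    """Place every (subject, predicate, object) triple on the node of its subject."""
    placement = {node: [] for node in range(num_nodes)}
    for subject, predicate, obj in triples:
        placement[node_for(subject, num_nodes)].append((subject, predicate, obj))
    return placement


def cut_edges(triples, num_nodes):
    """Count triples whose subject and object hash to different nodes.

    A high count hints at more data transport when joins follow these edges.
    """
    return sum(
        1
        for subject, _, obj in triples
        if node_for(subject, num_nodes) != node_for(obj, num_nodes)
    )


# Hypothetical toy dataset.
triples = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:worksAt", "ex:uni"),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:carol", "ex:worksAt", "ex:uni"),
]
placement = hash_cover(triples, num_nodes=3)
```

Hashing spreads triples across nodes without looking at graph structure, which tends to balance the workload but ignores locality. An edge-cut-based strategy would instead try to keep connected triples together.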
This thesis presents novel approaches for integrating context information into probabilistic models. Data from social media is typically associated with metadata, which includes context information such as timestamps, geographical coordinates or links to user profiles. Previous studies showed the benefits of using such context information in probabilistic models, e.g., improved predictive performance. In practice, probabilistic models which account for context information still play a minor role in data analysis. There are multiple reasons for this. Existing probabilistic models often are complex, the implementation is difficult, implementations are not publicly available, or the parameter estimation is computationally too expensive for large datasets. Additionally, existing models are typically created for a specific type of content and context and lack the flexibility to be applied to other data.
This thesis addresses these problems by introducing a general approach for modelling multiple, arbitrary context variables in probabilistic models and by providing efficient inference schemes and implementations.
In the first half of this thesis, the importance of context and the potential of context information for probabilistic modelling are shown theoretically and in practical examples. In the second half, the example of topic models is employed for introducing a novel approach to context modelling based on document clusters and adjacency relations in the context space. These models can cope with areas of sparse observations and allow, for the first time, the efficient, explicit modelling of arbitrary context variables including cyclic and spherical context (such as temporal cycles or geographical coordinates). Using the novel three-level hierarchical multi-Dirichlet process presented in this thesis, the adjacency of context clusters can be exploited and multiple contexts can be modelled and weighted at the same time. Efficient inference schemes are derived which yield interpretable model parameters that allow analysing the relation between observations and context.
The Web is an essential component of moving our society to the digital age. We use it for communication, shopping, and doing our work. Most user interaction in the Web happens with Web page interfaces. Thus, the usability and accessibility of Web page interfaces are relevant areas of research to make the Web more useful. Eye tracking is a tool that can be helpful in both areas, performing usability testing and improving accessibility. It can be used to understand users' attention on Web pages and to support usability experts in their decision-making process. Moreover, eye tracking can be used as an input method to control an interface. This is especially useful for people with motor impairment, who cannot use traditional input devices like mouse and keyboard. However, interfaces on Web pages become more and more complex due to dynamics, i.e., changing contents like animated menus and photo carousels. We need general approaches to comprehend dynamics on Web pages, allowing for efficient usability analysis and enjoyable interaction with eye tracking. In the first part of this thesis, we report our work on improving gaze-based analysis of dynamic Web pages. Eye tracking can be used to collect the gaze signals of users who browse a Web site and its pages. The gaze signals show a usability expert what parts of the Web page interface have been read, glanced at, or skipped. The aggregation of gaze signals gives a usability expert insight into the users' attention at a high level, before looking into individual behavior. For this, all gaze signals must be aligned to the interface as experienced by the users. However, the user experience is heavily influenced by changing contents, as these may cover a substantial portion of the screen. We delineate unique states in Web page interfaces including changing contents, such that gaze signals from multiple users can be aggregated correctly.
In the second part of this thesis, we report our work on improving the gaze-based interaction with dynamic Web pages. Eye tracking can be used to retrieve gaze signals while a user operates a computer. The gaze signals may be interpreted as input controlling an interface. Nowadays, eye tracking as an input method is mostly used to emulate mouse and keyboard functionality, hindering an enjoyable user experience. There exist a few Web browser prototypes that directly interpret gaze signals for control, but they do not work on dynamic Web pages. We have developed a method to extract interaction elements like hyperlinks and text inputs efficiently on Web pages, including changing contents. We adapt the interaction with those elements for eye tracking as the input method, such that a user can conveniently browse the Web hands-free. Both parts of this thesis conclude with user-centered evaluations of our methods, assessing the improvements in the user experience for usability experts and people with motor impairment, respectively.
Commonsense reasoning can be seen as a process of identifying dependencies amongst events and actions. Understanding the circumstances surrounding these events requires background knowledge with sufficient breadth to cover a wide variety of domains. In recent decades, there has been a lot of work on extracting commonsense knowledge; a number of these projects provide their collected data as semantic networks, such as ConceptNet and CausalNet. In this thesis, we attempt to undertake the Choice Of Plausible Alternatives (COPA) challenge, a problem set with 1000 questions written in multiple-choice format with a premise and two alternative choices for each question. Our approach differs from previous work by using shortest paths between concepts in a causal graph with the edge weight as the causality metric. We use CausalNet as the primary network and implement a few design choices to explore the strengths and drawbacks of this approach, and propose an extension using ConceptNet by leveraging its commonsense knowledge base.
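The shortest-path idea can be sketched on a toy causal graph. This is only an illustration under strong assumptions: concepts are single words rather than full COPA sentences, the graph and its weights are invented, and edge weights are treated as costs (a lower cost stands for a stronger causal link), which may differ from the causality metric used in the thesis.

```python
import heapq


def shortest_path_cost(graph, source, target):
    """Dijkstra's algorithm on a dict-of-dicts graph; inf if target is unreachable."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return float("inf")


def choose_alternative(graph, premise, alternatives, ask_for="effect"):
    """Pick the alternative connected to the premise by the cheapest causal path.

    For "effect" questions the path runs premise -> alternative,
    for "cause" questions it runs alternative -> premise.
    """
    def path_cost(alternative):
        if ask_for == "effect":
            return shortest_path_cost(graph, premise, alternative)
        return shortest_path_cost(graph, alternative, premise)

    return min(alternatives, key=path_cost)


# Hypothetical toy graph: edges point from cause to effect.
TOY_GRAPH = {
    "rain": {"wet": 0.2, "umbrella": 0.5},
    "wet": {"slip": 0.4},
    "sun": {"dry": 0.3},
}

effect = choose_alternative(TOY_GRAPH, "rain", ["slip", "dry"], ask_for="effect")
cause = choose_alternative(TOY_GRAPH, "slip", ["sun", "rain"], ask_for="cause")
```

In the toy graph, "slip" is reachable from "rain" through "wet", whereas "dry" is not reachable at all, so the sketch selects "slip" as the more plausible effect.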
Through the increasing availability of access to the web, more and more interactions between people take place in online social networks, such as Twitter or Facebook, or sites where opinions can be exchanged. At the same time, knowledge is made openly available for many people, such as by the biggest collaborative encyclopedia Wikipedia and diverse information in Internet forums and on websites. These two kinds of networks - social networks and knowledge networks - are highly dynamic in the sense that the links that contain the important information about the relationships between people or the relations between knowledge items are frequently updated or changed. These changes follow particular structural patterns and characteristics that are far less random than expected.
The goal of this thesis is to predict three characteristic link patterns for the two network types of interest: the addition of new links, the removal of existing links and the presence of latent negative links. First, we show that the prediction of link removal is indeed a new and challenging problem. Even if the sociological literature suggests that reasons for the formation and dissolution of ties are often complementary, we show that the two respective prediction problems are not. In particular, we show that the dynamics of new links and unlinks lead to the four link states of growth, decay, stability and instability. For knowledge networks we show that the prediction of link changes greatly benefits from the usage of temporal information; the timestamp of link creation and deletion events improves the prediction of future link changes. For that, we present and evaluate four temporal models that resemble different exploitation strategies. Focusing on directed social networks, we conceptualize and evaluate sociological constructs that explain the formation and dissolution of relationships between users. Measures based on information about past relationships are extremely valuable for predicting the dissolution of social ties. Hence, consistently for knowledge networks and social networks, temporal information in a network greatly improves the prediction quality. Turning again to social networks, we show that negative relationship information such as distrust or enmity can be predicted from positive known relationships in the network. This is particularly interesting in networks where users cannot label their relationships to other users as negative. For this scenario we show how latent negative relationships can be predicted.
“Did I say something wrong?” A word-level analysis of Wikipedia articles for deletion discussions
(2016)
This thesis focuses on gaining linguistic insights into textual discussions on a word level. It was of special interest to distinguish messages that constructively contribute to a discussion from those that are detrimental to it. Thereby, we wanted to determine whether “I”- and “You”-messages are indicators of either of the two discussion styles. These messages are nowadays often used in guidelines for successful communication. Although their effects have been successfully evaluated multiple times, a large-scale analysis has never been conducted. Thus, we used Wikipedia Articles for Deletion (short: AfD) discussions together with the records of blocked users and developed a fully automated process for creating an annotated data set. In this data set, messages were labelled either constructive or disruptive. We applied binary classifiers to the data to determine characteristic words for both discussion styles. Thereby, we also investigated whether function words like pronouns and conjunctions play an important role in distinguishing the two. We found that “You”-messages were a strong indicator of disruptive messages, which matches their attributed effects on communication. However, we found “I”-messages to be indicative of disruptive messages as well, which is contrary to their attributed effects. The importance of function words could neither be confirmed nor refuted. Other characteristic words for either communication style were not found. Yet, the results suggest that a different model might represent disruptive and constructive messages in textual discussions better.
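How a binary classifier can surface characteristic words may be sketched with a two-class naive Bayes model, whose per-word log-likelihood ratio acts as a word weight. The abstract does not state which classifiers were used, so naive Bayes is an assumption here, as are the whitespace tokenisation, the function name, and the invented example messages.

```python
import math
from collections import Counter


def word_weights(constructive, disruptive, smoothing=1.0):
    """Per-word log-likelihood ratio of a two-class naive Bayes model.

    Positive weights mark words more typical of disruptive messages,
    negative weights mark words more typical of constructive ones.
    """
    c_counts = Counter(w for msg in constructive for w in msg.lower().split())
    d_counts = Counter(w for msg in disruptive for w in msg.lower().split())
    vocab = set(c_counts) | set(d_counts)
    # Add-one style smoothing so unseen words do not produce log(0).
    c_total = sum(c_counts.values()) + smoothing * len(vocab)
    d_total = sum(d_counts.values()) + smoothing * len(vocab)
    return {
        word: math.log((d_counts[word] + smoothing) / d_total)
        - math.log((c_counts[word] + smoothing) / c_total)
        for word in vocab
    }


# Invented toy messages, not taken from the AfD data set.
constructive = [
    "the article cites reliable sources",
    "the sources look fine",
]
disruptive = [
    "you are wrong",
    "you never read the article",
]
weights = word_weights(constructive, disruptive)
most_disruptive = max(weights, key=weights.get)
```

Sorting the vocabulary by weight gives a ranked list of candidate characteristic words for each class. On real data, function words such as pronouns would appear in that ranking alongside content words.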