Refine
Document Type
- Doctoral Thesis (5) (remove)
Keywords
- Data Mining (1)
- Formale Ontologie (1)
- Information Retrieval (1)
- Modellgetriebene Entwicklung (1)
- Software Engineering (1)
- Wissensmanagement (1)
- World Wide Web 2.0 (1)
- classification (1)
- data mining (1)
- description logic (1)
- folksonomies (1)
- information retrieval (1)
- landmarks (1)
- reasoning (1)
- semantics (1)
- tag recommendation (1)
- web 2.0 (1)
Institute
- Institut für Informatik (5) (remove)
One of the main goals of the artificial intelligence community is to create machines able to reason with dynamically changing knowledge. To achieve this goal, a multitude of different problems have to be solved, of which many have been addressed in the various sub-disciplines of artificial intelligence, like automated reasoning and machine learning. The thesis at hand focuses on the automated reasoning aspects of these problems and address two of the problems which have to be overcome to reach the afore-mentioned goal, namely 1. the fact that reasoning in logical knowledge bases is intractable and 2. the fact that applying changes to formalized knowledge can easily introduce inconsistencies, which leads to unwanted results in most scenarios.
To ease the intractability of logical reasoning, I suggest to adapt a technique called knowledge compilation, known from propositional logic, to description logic knowledge bases. The basic idea of this technique is to compile the given knowledge base into a normal form which allows to answer queries efficiently. This compilation step is very expensive but has to be performed only once and as soon as the result of this step is used to answer many queries, the expensive compilation step gets worthwhile. In the thesis at hand, I develop a normal form, called linkless normal form, suitable for knowledge compilation for description logic knowledge bases. From a computational point of view, the linkless normal form has very nice properties which are introduced in this thesis.
For the second problem, I focus on changes occurring on the instance level of description logic knowledge bases. I introduce three change operators interesting for these knowledge bases, namely deletion and insertion of assertions as well as repair of inconsistent instance bases. These change operators are defined such that in all three cases, the resulting knowledge base is ensured to be consistent and changes performed to the knowledge base are minimal. This allows us to preserve as much of the original knowledge base as possible. Furthermore, I show how these changes can be applied by using a transformation of the knowledge base.
For both issues I suggest to adapt techniques successfully used in other logics to get promising methods for description logic knowledge bases.
The semantic web and model-driven engineering are changing the enterprise computing paradigm. By introducing technologies like ontologies, metadata and logic, the semantic web improves drastically how companies manage knowledge. In counterpart, model-driven engineering relies on the principle of using models to provide abstraction, enabling developers to concentrate on the system functionality rather than on technical platforms. The next enterprise computing era will rely on the synergy between both technologies. On the one side, ontology technologies organize system knowledge in conceptual domains according to its meaning. It addresses enterprise computing needs by identifying, abstracting and rationalizing commonalities, and checking for inconsistencies across system specifications. On the other side, model-driven engineering is closing the gap among business requirements, designs and executables by using domain-specific languages with custom-built syntax and semantics. In this scenario, the research question that arises is: What are the scientific and technical results around ontology technologies that can be used in model-driven engineering and vice versa? The objective is to analyze approaches available in the literature that involve both ontologies and model-driven engineering. Therefore, we conduct a literature review that resulted in a feature model for classifying state-of-the-art approaches. The results show that the usage of ontologies and model-driven engineering together have multiple purposes: validation, visual notation, expressiveness and interoperability. While approaches involving both paradigms exist, an integrated approach for UML class-based modeling and ontology modeling is lacking so far. Therefore, we investigate the techniques and languages for designing integrated models. The objective is to provide an approach to support the design of integrated solutions. Thus, we develop a conceptual framework involving the structure and the notations of a solution to represent and query software artifacts using a combination of ontologies and class-based modeling. As proof of concept, we have implemented our approach as a set of open source plug-ins -- the TwoUse Toolkit. The hypothesis is that a combination of both paradigms yields improvements in both fields, ontology engineering and model-driven engineering. For MDE, we investigate the impact of using features of the Web Ontology Language in software modeling. The results are patterns and guidelines for designing ontology-based information systems and for supporting software engineers in modeling software. The results include alternative ways of describing classes and objects and querying software models and metamodels. Applications show improvements on changeability and extensibility. In the ontology engineering domain, we investigate the application of techniques used in model-driven engineering to fill the abstraction gap between ontology specification languages and programming languages. The objective is to provide a model-driven platform for supporting activities in the ontology engineering life cycle. Therefore, we study the development of core ontologies in our department, namely the core ontology for multimedia (COMM) and the multimedia metadata ontology. The results are domain-specific languages that allow ontology engineers to abstract from implementation issues and concentrate on the ontology engineering task. It results in increasing productivity by filling the gap between domain models and source code.
In international business relationships, such as international railway operations, large amounts of data can be exchanged among the parties involved. For the exchange of such data, a limited risk of being cheated by another party, e.g., by being provided with fake data, as well as reasonable cost and a foreseeable benefit, is expected. As the exchanged data can be used to make critical business decisions, there is a high incentive for one party to manipulate the data in its favor. To prevent this type of manipulation, mechanisms exist to ensure the integrity and authenticity of the data. In combination with a fair exchange protocol, it can be ensured that the integrity and authenticity of this data is maintained even when it is exchanged with another party. At the same time, such a protocol ensures that the exchange of data only takes place in conjunction with the agreed compensation, such as a payment, and that the payment is only made if the integrity and authenticity of the data is ensured as previously agreed. However, in order to be able to guarantee fairness, a fair exchange protocol must involve a trusted third party. To avoid fraud by a single centralized party acting as a trusted third party, current research proposes decentralizing the trusted third party, e.g., by using a distributed ledger based fair exchange protocol. However, for assessing the fairness of such an exchange, state-of-the-art approaches neglect costs arising for the parties conducting the fair exchange. This can result in a violation of the outlined expectation of reasonable cost, especially when distributed ledgers are involved, which are typically associated with non-negligible costs. Furthermore, the performance of typical distributed ledger-based fair exchange protocols is limited, posing an obstacle to widespread adoption.
To overcome the challenges, in this thesis, we introduce the foundation for a data exchange platform allowing for a fully decentralized fair data exchange with reasonable cost and performance. As a theoretical foundation, we introduce the concept of cost fairness, which considers cost for the fairness assessment by requesting that a party following the fair exchange protocol never suffers any unilateral disadvantages. We prove that cost fairness cannot be achieved using typical public distributed ledgers but requires customized distributed ledger instances, which usually lack complete decentralization. However, we show that the highest unilateral cost are caused by a grieving attack.
To allow fair data exchanges to be conducted with reasonable cost and performance, we introduce FairSCE, a distributed ledger-based fair exchange protocol using distributed ledger state channels and incorporating a mechanism to protect against grieving attacks, reducing the possible unilateral cost that have to be covered to a minimum. Based on our evaluation of FairSCE, the worst-case cost for data exchange, even in the presence of malicious parties, is known, which allows an estimate of the possible benefit and, thus, the preliminary estimate of economic utility. Furthermore, to allow for an unambiguous assessment of the correct data being transferred while still allowing for sensitive parts of the data to be masked, we introduce an approach for the hashing of hierarchically structured data, which can be used to ensure integrity and authenticity of the data being transferred.
The publication of freely available and machine-readable information has increased significantly in the last years. Especially the Linked Data initiative has been receiving a lot of attention. Linked Data is based on the Resource Description Framework (RDF) and anybody can simply publish their data in RDF and link it to other datasets. The structure is similar to the World Wide Web where individual HTML documents are connected with links. Linked Data entities are identified by URIs which are dereferenceable to retrieve information describing the entity. Additionally, so called SPARQL endpoints can be used to access the data with an algebraic query language (SPARQL) similar to SQL. By integrating multiple SPARQL endpoints it is possible to create a federation of distributed RDF data sources which acts like one big data store.
In contrast to the federation of classical relational database systems there are some differences for federated RDF data. RDF stores are accessed either via SPARQL endpoints or by resolving URIs. There is no coordination between RDF data sources and machine-readable meta data about a source- data is commonly limited or not available at all. Moreover, there is no common directory which can be used to discover RDF data sources or ask for sources which offer specific data. The federation of distributed and linked RDF data sources has to deal with various challenges. In order to distribute queries automatically, suitable data sources have to be selected based on query details and information that is available about the data sources. Furthermore, the minimization of query execution time requires optimization techniques that take into account the execution cost for query operators and the network communication overhead for contacting individual data sources. In this thesis, solutions for these problems are discussed. Moreover, SPLENDID is presented, a new federation infrastructure for distributed RDF data sources which uses optimization techniques based on statistical information.
Folksonomies are Web 2.0 platforms where users share resources with each other. Furthermore, they can assign keywords (called tags) to the resources for categorizing and organizing the resources. Numerous types of resources like websites (Delicious), images (Flickr), and videos (YouTube) are supported by different folksonomies. The folksonomies are easy to use and thus attract the attention of millions of users. Together with the ease they offer, there are also some problems. This thesis addresses different problems of folksonomies and proposes solutions for these problems. The first problem occurs when users search for relevant resources in folksonomies. Often, the users are not able to find all relevant resources because they don't know which tags are relevant. The second problem is assigning tags to resources. Although many folksonomies (like Delicious) recommend tags for the resources, other folksonomies (like Flickr) do not recommend any tags. Tag recommendation helps the users to easily tag their resources. The third problem is that tags and resources are lacking semantics. This leads for example to ambiguous tags. The tags are lacking semantics because they are freely chosen keywords. The automatic identification of the semantics of tags and resources helps in reducing problems that arise from this freedom of the users in choosing the tags. This thesis proposes methods which exploit semantics to address the problems of search, tag recommendation, and the identification of tag semantics. The semantics are discovered from a variety of sources. In this thesis, we exploit web search engines, online social communities and the co-occurrences of tags as sources of semantics. Using different sources for discovering semantics reduces the efforts to build systems which solve the problems mentioned earlier. This thesis evaluates the proposed methods on a large scale data set. The evaluation results suggest that it is possible to exploit the semantics for improving search, recommendation of tags, and automatic identification of the semantics of tags and resources.