OPUS 4 | Search

Discovering and exploiting semantics in folksonomies (2011)

Abbasi, Rabeeh

Folksonomies are Web 2.0 platforms where users share resources with each other. Furthermore, they can assign keywords (called tags) to the resources for categorizing and organizing the resources. Numerous types of resources like websites (Delicious), images (Flickr), and videos (YouTube) are supported by different folksonomies. The folksonomies are easy to use and thus attract the attention of millions of users. Together with the ease they offer, there are also some problems. This thesis addresses different problems of folksonomies and proposes solutions for these problems. The first problem occurs when users search for relevant resources in folksonomies. Often, the users are not able to find all relevant resources because they don't know which tags are relevant. The second problem is assigning tags to resources. Although many folksonomies (like Delicious) recommend tags for the resources, other folksonomies (like Flickr) do not recommend any tags. Tag recommendation helps the users to easily tag their resources. The third problem is that tags and resources are lacking semantics. This leads for example to ambiguous tags. The tags are lacking semantics because they are freely chosen keywords. The automatic identification of the semantics of tags and resources helps in reducing problems that arise from this freedom of the users in choosing the tags. This thesis proposes methods which exploit semantics to address the problems of search, tag recommendation, and the identification of tag semantics. The semantics are discovered from a variety of sources. In this thesis, we exploit web search engines, online social communities and the co-occurrences of tags as sources of semantics. Using different sources for discovering semantics reduces the efforts to build systems which solve the problems mentioned earlier. This thesis evaluates the proposed methods on a large scale data set. 
The evaluation results suggest that it is possible to exploit the semantics for improving search, recommendation of tags, and automatic identification of the semantics of tags and resources.
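
The abstract names tag co-occurrence as one source of semantics. The following is a minimal sketch of that general idea, not the method of the thesis: it counts how often two tags are assigned to the same resource and suggests the most frequently co-occurring tags. The sample data and the names `build_cooccurrence` and `related_tags` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(tagged_resources):
    """Count, for every tag, how often each other tag shares a resource with it."""
    cooc = defaultdict(Counter)
    for tags in tagged_resources:
        # Use a set so a tag repeated on one resource is counted once.
        for a, b in combinations(sorted(set(tags)), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    return cooc


def related_tags(cooc, tag, k=3):
    """Return up to k tags that co-occur most often with `tag`."""
    counts = cooc.get(tag, Counter())
    # Sort by descending count, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]


# Hypothetical tag assignments, one list per resource.
resources = [
    ["koblenz", "germany", "rhine"],
    ["koblenz", "germany", "castle"],
    ["rhine", "river", "germany"],
    ["koblenz", "rhine"],
]
cooc = build_cooccurrence(resources)
suggestions = related_tags(cooc, "koblenz", k=2)
```

Suggestions of this kind could support both search (expanding a query with related tags) and tag recommendation; raw counts favor popular tags, so a real system would likely normalize them.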

Emotion and Sentiment Detection in Unstructured Social Data (2022)

AlGhalibi, Maha

Social media provides a powerful way for people to share opinions and sentiments about a specific topic, allowing others to benefit from these thoughts and feelings. This process generates a huge amount of unstructured data, such as texts, images, and references, which grows constantly through daily comments on related discussions. However, the vast amount of unstructured data presents risks to the information-extraction process, and so decision making becomes highly challenging. This is because data overload may cause the loss of useful data due to its inappropriate presentation and its accumulation. To this end, this thesis contributes to the field of analyzing and detecting feelings in images and texts by extracting the feelings and opinions hidden in a huge collection of image data and texts on social networks. These feelings are then classified as positive, negative, or neutral, according to the features of the classified data. The process of extracting these feelings greatly helps in decision-making processes on various topics, as explained in the first chapter of the thesis. A system has been built that can classify the feelings inherent in the images and texts on social media sites, such as people’s opinions about products and companies, personal posts, and general messages. This thesis begins by introducing a new method of reducing the dimension of text data based on data-mining approaches and then examines the sentiment based on neural and deep neural network classification algorithms. Subsequently, in contrast to sentiment analysis research in text datasets, we examine sentiment expression and polarity classification within and across image datasets by building deep neural networks based on the attention mechanism.

A methodology for secure interactive systems (2008)

Beuster, Gerd

This dissertation introduces a methodology for the formal specification and verification of user interfaces with respect to security aspects. The methodology allows formal methods to be used pervasively in the specification and verification of human-computer interaction. This work consists of three parts. In the first part, a formal methodology for the description of human-computer interaction is developed. In the second part, existing definitions of computer security are adapted for human-computer interaction and formalized. A generic formal model of human-computer interaction is developed. In the third part, the methodology is applied to the specification and verification of a secure email client.

Proactive Content Placement in Information-Centric Connected Vehicle Environments (2021)

Grewe, Dennis

Connected vehicles will have a tremendous impact on tomorrow’s mobility solutions. Such systems will rely heavily on timely information delivery to ensure functional reliability, security and safety. However, the host-centric communication model of today’s networks calls into question efficient data dissemination at scale, especially in networks characterized by a high degree of mobility. The Information-Centric Networking (ICN) paradigm has evolved as a promising candidate for the next generation of network architectures. Based on a loosely coupled communication model, the in-network processing and caching capabilities of ICNs are promising to solve the challenges set by connected vehicular systems. In such networks, a special class of caching strategies that actively place a consumer’s anticipated content at the right network nodes in time promises to reduce the data delivery time. This thesis contributes to the research on active placement strategies in information-centric and computation-centric vehicle networks for providing dynamic access to content and computation results. By analyzing different vehicular applications and their requirements, novel caching strategies are developed in order to reduce the time of content retrieval. The caching strategies are compared and evaluated against the state of the art in both extensive simulations and real-world deployments. The results show performance improvements in content retrieval (availability of specific data increased up to 35% compared to state-of-the-art caching strategies) and in delivery times (roughly double the number of data retrievals from neighboring nodes). However, storing content actively in connected vehicle networks raises questions regarding security and privacy. In the second part of the thesis, an access control framework for information-centric connected vehicles is presented. 
Finally, open security issues and research directions in executing computations at the edge of connected vehicle networks are presented.

Distributed Query Processing for Federated RDF Data Management (2015)

Görlitz, Olaf

The publication of freely available and machine-readable information has increased significantly in recent years. The Linked Data initiative in particular has been receiving a lot of attention. Linked Data is based on the Resource Description Framework (RDF), and anybody can simply publish their data in RDF and link it to other datasets. The structure is similar to the World Wide Web, where individual HTML documents are connected with links. Linked Data entities are identified by URIs which are dereferenceable to retrieve information describing the entity. Additionally, so-called SPARQL endpoints can be used to access the data with an algebraic query language (SPARQL) similar to SQL. By integrating multiple SPARQL endpoints it is possible to create a federation of distributed RDF data sources which acts like one big data store. The federation of RDF data differs in several ways from the federation of classical relational database systems. RDF stores are accessed either via SPARQL endpoints or by resolving URIs. There is no coordination between RDF data sources, and machine-readable metadata about a source's data is commonly limited or not available at all. Moreover, there is no common directory which can be used to discover RDF data sources or ask for sources which offer specific data. The federation of distributed and linked RDF data sources has to deal with various challenges. In order to distribute queries automatically, suitable data sources have to be selected based on query details and information that is available about the data sources. Furthermore, the minimization of query execution time requires optimization techniques that take into account the execution cost for query operators and the network communication overhead for contacting individual data sources. In this thesis, solutions for these problems are discussed. 
Moreover, SPLENDID is presented, a new federation infrastructure for distributed RDF data sources which uses optimization techniques based on statistical information.
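
The source selection step described in this abstract can be illustrated with a small sketch. It is not SPLENDID's actual algorithm: it assumes a hypothetical, precomputed index from predicate URIs to the endpoints known to hold triples with that predicate, and the endpoint URLs and the name `select_sources` are made up for illustration.

```python
def select_sources(triple_patterns, predicate_index, all_sources):
    """Map each triple pattern to the endpoints that may contribute results.

    A pattern with a bound predicate is sent only to endpoints listed for that
    predicate; a pattern with a variable predicate (starting with '?') must be
    sent to every endpoint, because nothing can be ruled out.
    """
    selection = {}
    for pattern in triple_patterns:
        _, predicate, _ = pattern
        if predicate.startswith("?"):
            selection[pattern] = sorted(all_sources)
        else:
            selection[pattern] = sorted(predicate_index.get(predicate, ()))
    return selection


# Hypothetical endpoints and statistics (assumptions, not real services).
ENDPOINT_A = "http://example.org/a/sparql"
ENDPOINT_B = "http://example.org/b/sparql"
index = {
    "foaf:name": {ENDPOINT_A, ENDPOINT_B},
    "dbo:birthPlace": {ENDPOINT_B},
}
query = [
    ("?person", "foaf:name", "?name"),
    ("?person", "dbo:birthPlace", "?city"),
    ("?city", "?p", "?o"),
]
plan = select_sources(query, index, {ENDPOINT_A, ENDPOINT_B})
```

Fewer selected endpoints per pattern means fewer remote requests, which is why statistics about the sources matter for the cost-based optimization the abstract mentions.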

Knowledge engineering for software languages and software technologies (2022)

Heinz, Marcel

For software engineers, conceptually understanding the tools they are using in the context of their projects is a daily challenge and a prerequisite for complex tasks. Textual explanations and code examples serve as knowledge resources for understanding software languages and software technologies. This thesis describes research on integrating and interconnecting existing knowledge resources, which can then be used to assist with understanding and comparing software languages and software technologies on a conceptual level. We consider the following broad research questions that we later refine: What knowledge resources can be systematically reused for recovering structured knowledge and how? What vocabulary already exists in literature that is used to express conceptual knowledge? How can we reuse the online encyclopedia Wikipedia? How can we detect and report on instances of technology usage? How can we assure reproducibility as the central quality factor of any construction process for knowledge artifacts? As qualitative research, we describe methodologies to recover knowledge resources by i.) systematically studying literature, ii.) mining Wikipedia, and iii.) mining available textual explanations and code examples of technology usage. The theoretical findings are backed by case studies. As research contributions, we have recovered i.) a reference semantics of vocabulary for describing software technology usage with an emphasis on software languages, ii.) an annotated corpus of Wikipedia articles on software languages, iii.) insights into technology usage on GitHub with regard to a catalog of patterns, and iv.) megamodels of technology usage that are interconnected with existing textual explanations and code examples.

Technical and Methodological Improvements to Mining Software Repositories (2024)

Härtel, Johannes

Empirical studies in software engineering use software repositories as data sources to understand software development. Repository data is either used to answer questions that guide decision-making in software development, or to provide tools that help with practical aspects of developers’ everyday work. Studies are classified into the field of Empirical Software Engineering (ESE), and more specifically into Mining Software Repositories (MSR). Studies working with repository data often focus on their results. Results are statements or tools, derived from the data, that help with practical aspects of software development. This thesis focuses on the methods and high-order methods used to produce such results. In particular, we focus on incremental methods to scale the processing of repositories, declarative methods to compose a heterogeneous analysis, and high-order methods used to reason about threats to methods operating on repositories. We summarize this as technical and methodological improvements. We contribute the improvements to methods and high-order methods in the context of MSR/ESE to produce future empirical results more effectively. We contribute the following improvements. We propose a method to improve the scalability of functions that abstract over repositories with high revision count in a theoretically founded way. We use insights on abstract algebra and program incrementalization to define a core interface of high-order functions that compute scalable static abstractions of a repository with many revisions. We evaluate the scalability of our method by benchmarks, comparing a prototype with available competitors in MSR/ESE. We propose a method to improve the definition of functions that abstract over a repository with a heterogeneous technology stack, by using concepts from declarative logic programming and combining them with ideas on megamodeling and linguistic architecture. 
We reproduce existing ideas on declarative logic programming with languages close to Datalog, coming from architecture recovery, source code querying, and static program analysis, and transfer them from the analysis of a homogeneous to a heterogeneous technology stack. We provide a proof of concept of such a method in a case study. We propose a high-order method to improve the disambiguation of threats to methods used in MSR/ESE. We focus on a better disambiguation of threats, operationalizing reasoning about them, and making the implications for a valid data analysis methodology explicit, by using simulations. We encourage researchers to accomplish their work by implementing ‘fake’ simulations of their MSR/ESE scenarios, to operationalize relevant insights about alternative plausible results, negative results, potential threats and the used data analysis methodologies. We prove that this kind of simulation-based testing contributes to the disambiguation of threats in published MSR/ESE research.

Markov random field terrain classification for autonomous robots in unstructured terrain (2015)

Häselich, Marcel

This thesis addresses the problem of terrain classification in unstructured outdoor environments. Terrain classification includes the detection of obstacles and passable areas as well as the analysis of ground surfaces. A 3D laser range finder is used as the primary sensor for perceiving the surroundings of the robot. First of all, a grid structure is introduced for data reduction. The chosen data representation allows for multi-sensor integration, e.g., cameras for color and texture information or further laser range finders for improved data density. Subsequently, features are computed for each terrain cell within the grid. Classification is performed with a Markov random field for context-sensitivity and to compensate for sensor noise and varying data density within the grid. A Gibbs sampler is used for optimization and is parallelized on the CPU and GPU in order to achieve real-time performance. Dynamic obstacles are detected and tracked using different state-of-the-art approaches. The resulting information - where other traffic participants move and are going to move to - is used to perform inference in regions where the terrain surface is partially or completely invisible for the sensors. Algorithms are tested and validated on different autonomous robot platforms and the evaluation is carried out with human-annotated ground truth maps of millions of measurements. The terrain classification approach of this thesis proved reliable in all real-time scenarios and domains and yielded new insights. Furthermore, if combined with a path planning algorithm, it enables full autonomy for all kinds of wheeled outdoor robots in natural outdoor environments.

Practices, Networks and Success in Creative Careers: Study of Inequalities using Large-scale Digital Behavioural Data (2023)

Jadidi, Mohsen

In the last decade, policy-makers around the world have turned their attention toward the creative industry as an economic engine and a significant driver of employment. Yet, the literature suggests that creative workers are one of the most vulnerable workforces of today’s economy. Because of the highly deregulated and highly individuated environment, failure or success are believed to be the byproduct of individual ability and commitment, rather than a structural or collective issue. This thesis taps into the temporal, spatial, and social resolution of digital behavioural data to show that there are indeed structural and historical issues that impact individuals’ and groups’ careers. To this end, this thesis offers a computational social science research framework that brings together the decades-long theoretical and empirical knowledge of inequality studies, and computational methods that deal with the complexity and scale of digital data. By taking the music industry and science as use cases, this thesis starts off by proposing a novel gender detection method that exploits image search and face-detection methods. By analysing the collaboration patterns and citation networks of male and female computer scientists, it sheds light on some of the historical biases and disadvantages that women face in their scientific careers. In particular, the relation of scientific success and gender-specific collaboration patterns is assessed. To elaborate further on the temporal aspect of inequalities in scientific careers, this thesis compares the degree of vertical and horizontal inequalities among the cohorts of scientists that started their careers at different points in time. Furthermore, the structural inequality in the music industry is assessed by analyzing the social and cultural relations that emerge from live performances and music releases. The findings hint toward the importance of community belonging at different stages of artists’ careers. 
This thesis also quantifies some of the underlying mechanisms and processes of inequality, such as the Matthew Effect and the Hipster Paradox, in creative careers. Finally, this thesis argues that online platforms such as Wikipedia could reflect and amplify the existing biases.

Secure semantic web data management (2016)

Kasten, Andreas

Confidentiality, integrity, and availability are often listed as the three major requirements for achieving data security and are collectively referred to as the C-I-A triad. Confidentiality of data restricts the data access to authorized parties only, integrity means that the data can only be modified by authorized parties, and availability states that the data must always be accessible when requested. Although these requirements are relevant for any computer system, they are especially important in open and distributed networks. Such networks are able to store large amounts of data without having a single entity in control of ensuring the data's security. These characteristics also apply to the Semantic Web, as it aims at creating a global and decentralized network of machine-readable data. Ensuring the confidentiality, integrity, and availability of this data is therefore also important and must be achieved by corresponding security mechanisms. However, the current reference architecture of the Semantic Web does not yet define any particular security mechanism which implements these requirements. Instead, it only contains a rather abstract representation of security. This thesis fills this gap by introducing three different security mechanisms, one for each of the identified security requirements of confidentiality, integrity, and availability of Semantic Web data. The mechanisms are not restricted to the very basics of implementing each of the requirements and provide additional features as well. Confidentiality is usually achieved with data encryption. This thesis not only provides an approach for encrypting Semantic Web data, it also allows searching the resulting ciphertext data without decrypting it first. Integrity of data is typically implemented with digital signatures. 
Instead of defining a single signature algorithm, this thesis defines a formal framework for signing arbitrary Semantic Web graphs which can be configured with various algorithms to achieve different features. Availability is generally supported by redundant data storage. This thesis expands the classical definition of availability to compliant availability which means that data must only be available as long as the access request complies with a set of predefined policies. This requirement is implemented with a modular and extensible policy language for regulating information flow control. This thesis presents each of these three security mechanisms in detail, evaluates them against a set of requirements, and compares them with the state of the art and related work.

Extending the reach and power of deductive program verification (2009)

Klebanov, Vladimir

Software is vital for modern society. The efficient development of correct and reliable software is of ever-growing importance. An important technique to achieve this goal is deductive program verification: the construction of logical proofs that programs are correct. In this thesis, we address three important challenges for deductive verification on its way to a wider deployment in the industry: 1. verification of thread-based concurrent programs 2. correctness management of verification systems 3. change management in the verification process. These are consistently brought up by practitioners when applying otherwise mature verification systems. The three challenges correspond to the three parts of this thesis (not counting the introductory first part, providing technical background on the KeY verification approach). In the first part, we define a novel program logic for specifying correctness properties of object-oriented programs with unbounded thread-based concurrency. We also present a calculus for the above logic, which allows verifying actual Java programs. The calculus is based on symbolic execution resulting in its good understandability for the user. We describe the implementation of the calculus in the KeY verification system and present a case study. In the second part, we provide a first systematic survey and appraisal of factors involved in reliability of formal reasoning. We elucidate the potential and limitations of self-application of formal methods in this area and give recommendations based on our experience in design and operation of verification systems. In the third part, we show how the technique of similarity-based proof reuse can be applied to the problems of industrial verification life cycle. We address issues (e.g., coping with changes in the proof system) that are important in verification practice, but have been neglected by research so far.

Latency Reduction for Real-Time Rendering and its Application to VR Training Scenarios (2021)

Lochmann, Gerrit

Virtual reality is a growing field of interest as it provides a particularly intuitive way of user interaction. However, there are still open technical issues regarding latency (the delay between interaction and display reaction) and the trade-off between visual quality and frame rate of real-time graphics, especially when taking visual effects like specular and semi-transparent surfaces and volumes into account. One solution, a distributed rendering setup, is presented in this thesis, in which the image synthesis is divided into an accurate but costly physically based rendering thread with a low refresh rate and a fast reprojection thread that maintains responsive interactivity at a high frame rate. Two novel reprojection techniques are proposed that cover reflections and refractions produced by surface ray-tracing as well as volumetric light transport generated by volume ray-marching. The introduced setup can enhance the VR experience within several domains. In this thesis, three innovative training applications have been realized to investigate the added value of virtual reality to the three learning stages of observation, interaction and collaboration. For each stage an interdisciplinary curriculum, currently taught with traditional media, was transferred to a VR setting in order to investigate how virtual reality is capable of providing a natural, flexible and efficient learning environment.

Decentralized Fair Data Exchange with Minimal Mutual Trust using Distributed Ledgers (2024)

Lohr, Matthias

In international business relationships, such as international railway operations, large amounts of data can be exchanged among the parties involved. For the exchange of such data, a limited risk of being cheated by another party, e.g., by being provided with fake data, as well as reasonable cost and a foreseeable benefit, is expected. As the exchanged data can be used to make critical business decisions, there is a high incentive for one party to manipulate the data in its favor. To prevent this type of manipulation, mechanisms exist to ensure the integrity and authenticity of the data. In combination with a fair exchange protocol, it can be ensured that the integrity and authenticity of this data is maintained even when it is exchanged with another party. At the same time, such a protocol ensures that the exchange of data only takes place in conjunction with the agreed compensation, such as a payment, and that the payment is only made if the integrity and authenticity of the data is ensured as previously agreed. However, in order to be able to guarantee fairness, a fair exchange protocol must involve a trusted third party. To avoid fraud by a single centralized party acting as a trusted third party, current research proposes decentralizing the trusted third party, e.g., by using a distributed ledger based fair exchange protocol. However, for assessing the fairness of such an exchange, state-of-the-art approaches neglect costs arising for the parties conducting the fair exchange. This can result in a violation of the outlined expectation of reasonable cost, especially when distributed ledgers are involved, which are typically associated with non-negligible costs. Furthermore, the performance of typical distributed ledger-based fair exchange protocols is limited, posing an obstacle to widespread adoption. 
To overcome these challenges, in this thesis, we introduce the foundation for a data exchange platform allowing for a fully decentralized fair data exchange with reasonable cost and performance. As a theoretical foundation, we introduce the concept of cost fairness, which considers cost for the fairness assessment by requiring that a party following the fair exchange protocol never suffers any unilateral disadvantages. We prove that cost fairness cannot be achieved using typical public distributed ledgers but requires customized distributed ledger instances, which usually lack complete decentralization. However, we show that the highest unilateral costs are caused by a grieving attack. To allow fair data exchanges to be conducted with reasonable cost and performance, we introduce FairSCE, a distributed ledger-based fair exchange protocol using distributed ledger state channels and incorporating a mechanism to protect against grieving attacks, reducing the possible unilateral costs that have to be covered to a minimum. Based on our evaluation of FairSCE, the worst-case cost for data exchange, even in the presence of malicious parties, is known, which allows an estimate of the possible benefit and, thus, a preliminary estimate of economic utility. Furthermore, to allow for an unambiguous assessment of the correct data being transferred while still allowing for sensitive parts of the data to be masked, we introduce an approach for the hashing of hierarchically structured data, which can be used to ensure integrity and authenticity of the data being transferred.
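
The idea of hashing hierarchically structured data so that parts can be masked without changing the overall digest can be sketched with a Merkle-style tree hash. This is an assumption about the general technique, not the scheme defined in the thesis: the names `tree_hash`, `mask` and `Masked` and the sample record are hypothetical, and the sketch omits salting, so low-entropy masked values could still be guessed by brute force.

```python
import hashlib
import json


def _h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class Masked:
    """Placeholder that carries only the digest of a hidden subtree."""

    def __init__(self, digest: str):
        self.digest = digest


def tree_hash(node) -> str:
    """Hash a nested dict/list/scalar structure bottom-up."""
    if isinstance(node, Masked):
        return node.digest  # reuse the digest of the hidden content
    if isinstance(node, dict):
        # Hash each key and append the child digest; both have fixed length.
        parts = [_h(k.encode()) + tree_hash(v) for k, v in sorted(node.items())]
        return _h(("dict|" + "".join(parts)).encode())
    if isinstance(node, list):
        return _h(("list|" + "".join(tree_hash(v) for v in node)).encode())
    return _h(("leaf|" + json.dumps(node)).encode())


def mask(node: dict, path: list):
    """Return a copy in which the subtree at `path` (dict keys) is masked."""
    if not path:
        return Masked(tree_hash(node))
    copy = dict(node)
    copy[path[0]] = mask(node[path[0]], path[1:])
    return copy


# Hypothetical record exchanged between two parties.
record = {"train": "ICE 123", "route": {"from": "Koblenz", "to": "Basel"}, "price": 4200}
root = tree_hash(record)
redacted = mask(mask(record, ["price"]), ["route", "to"])
```

Because a masked subtree contributes the same digest as the original content, a receiver can check the redacted record against an agreed root hash, while any change to a visible field alters the root.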

Wireless communication on the factory floor supporting agile production (2023)

Lyczkowski, Eike

The trends of Industry 4.0 and the further enhancements toward an ever-changing factory lead to more mobility and flexibility on the factory floor. With that higher need for mobility and flexibility, the requirements on wireless communication rise. A key requirement in that setting is the demand for wireless Ultra-Reliable Low-Latency Communication (URLLC). Example use cases are cooperative Automated Guided Vehicles (AGVs) and mobile robotics in general. Working within that setting, this thesis provides insights regarding the whole network stack. Throughout, the focus is on industrial applications. Starting on the physical layer, extensive measurements from 2 GHz to 6 GHz on the factory floor are performed. The raw data is published and analyzed. Based on that data an improved Saleh-Valenzuela (SV) model is provided. As ad-hoc networks are highly dependent on node mobility, the mobility of AGVs is modeled. Additionally, Nodal Encounter Patterns (NEPs) are recorded and analyzed. A method to record NEPs is illustrated. Latency and reliability are key performance parameters from an application perspective. Thus, measurements of those two parameters in factory environments are performed using Wireless Local Area Network (WLAN) (IEEE 802.11n), private Long Term Evolution (pLTE) and 5G. These measurements showed auto-correlated latency values. Hence, a method to construct confidence intervals based on auto-correlated data containing rare events is developed. Subsequently, four performance improvements for wireless networks on the factory floor are proposed. Of those optimizations, three cover ad-hoc networks, two deal with safety-relevant communication, one orchestrates the usage of two orthogonal networks, and one optimizes the usage of information within cellular networks. Finally, this thesis is concluded by an outlook toward open research questions. 
This includes open questions remaining in the context of industry 4.0 and further the ones around 6G. Along the research topics of 6G the two most relevant topics concern the ideas of a network of networks and overcoming best-effort IP.

Hybrid multi-agent systems: modeling, specification and verification (2010)

Mohammed, Ammar

Specifying behaviors of multi-agent systems (MASs) is a demanding task, especially when applied in safety-critical systems. In such systems, the specification of behaviors has to be carried out carefully in order to avoid side effects that might cause unwanted or even disastrous behaviors. Thus, formal methods based on mathematical models of the system under design are helpful. They not only allow us to formally specify the system at different levels of abstraction, but also to verify the consistency of the specified systems before implementing them. The formal specification aims at a precise and unambiguous description of the behavior of MASs, whereas the verification aims at proving the satisfaction of specified requirements. A behavior of an agent can be described as discrete changes of its states with respect to external or internal actions. Whenever an action occurs, the agent moves from one state to another one. Therefore, an efficient way to model this type of discrete behavior is to use a kind of state transition diagram such as finite automata. One remarkable advantage of such transition diagrams is that they lend themselves to formal analysis techniques using model checking. The latter is an automatic verification technique which determines whether given properties are satisfied within a model underlying a particular system. In realistic physical environments, however, it is necessary to consider continuous behaviors in addition to discrete behaviors of MASs. Examples of those types of behaviors include the movement of a soccer agent to kick off or to go to the ball, the process of putting out the fire by a fire brigade agent in a rescue scenario, or any other behaviors that depend on any timed physical law. The traditional state transition diagrams are not sufficient to combine these types of behaviors. Hybrid automata offer an elegant method to capture such types of behaviors. 
Hybrid automata extend regular state transition diagrams with methods that deal with those continuous actions such that the state transition diagrams are used to model the discrete changes of behaviors, while differential equations are used to model the continuous changes. The semantics of hybrid automata make them accessible to formal verification by means of model checking. The main goal of this thesis is to apply hybrid automata to specifying and verifying behaviors of MASs. However, specifying and verifying behaviors of MASs by means of hybrid automata raises several issues that should be considered. These issues include the complexity, modularity, and expressiveness of MASs' models. This thesis addresses these issues and provides possible solutions to tackle them.

Mining Social Media: Methods and Approaches for Content Analysis (2014)

Naveed, Nasir

Web 2.0 provides technologies for online collaboration of users as well as the creation, publication and sharing of user-generated content in an interactive way. Twitter, CNET, CiteSeerX, etc. are examples of Web 2.0 platforms which facilitate users in these activities and are viewed as rich sources of information. In the platforms mentioned as examples, users can participate in discussions, comment on others, provide feedback on various issues, publish articles and write blogs, thereby producing a high volume of unstructured data which at the same time leads to an information overload. Satisfying the various types of human information needs arising from the purpose and nature of the platforms requires methods for appropriate aggregation and automatic analysis of this unstructured data. In this thesis, we propose methods which attempt to overcome the problem of information overload and help in satisfying user information needs in three scenarios. To this end, we first look at two main challenges in Twitter, sparsity and content quality, and how these challenges can influence standard retrieval models. We analyze and identify Twitter content features that reflect high quality information. Based on this analysis we introduce the concept of "interestingness" as a static quality measure. We empirically show that our proposed measure helps in retrieving and filtering high quality information in Twitter. Our second contribution relates to the content diversification problem in a collaborative social environment, where the motive of the end user is to gain a comprehensive overview of the pros and cons of a discussion track which results from the social collaboration of people. For this purpose, we develop the FREuD approach which aims at solving the content diversification problem by combining latent semantic analysis with sentiment estimation approaches.
Our evaluation results show that the FREuD approach provides a representative overview of sub-topics and aspects of discussions, characteristic user sentiments under different aspects, and reasons expressed by different opponents. Our third contribution presents a novel probabilistic Author-Topic-Time model, which aims at mining topical trends and user interests from social media. Our approach solves this problem by means of Bayesian modeling of relations between authors, latent topics and temporal information. We present results of applying the model to the scientific publication datasets from CiteSeerX, showing improved detection of semantically cohesive topics and capturing shifts in authors' interests in relation to topic evolution.

Silence is golden: reactive local topology control and geographic routing in wireless ad hoc and sensor networks (2016)

Neumann, Florentin

Reactive local algorithms are distributed algorithms which suit the needs of battery-powered, large-scale wireless ad hoc and sensor networks particularly well. By avoiding both unnecessary wireless transmissions and proactive maintenance of neighborhood tables (i.e., beaconing), such algorithms minimize communication load and overhead, and scale well with increasing network size. This way, resources such as bandwidth and energy are saved, and the probability of message collisions is reduced, which leads to an increase in the packet reception ratio and a decrease of latencies. Currently, the two main application areas of this algorithm type are geographic routing and topology control, in particular the construction of a node's adjacency in a connected, planar representation of the network graph. Geographic routing enables wireless multi-hop communication in the absence of any network infrastructure, based on geographic node positions. The construction of planar topologies is a requirement for efficient, local solutions for a variety of algorithmic problems. This thesis contributes to reactive algorithm research in two ways, on an abstract level, as well as by the introduction of novel algorithms: For the very first time, reactive algorithms are considered as a whole and as an individual research area. A comprehensive survey of the literature is given which lists and classifies known algorithms, techniques, and application domains. Moreover, the mathematical concept of O- and Omega-reactive local topology control is introduced. This concept unambiguously distinguishes reactive from conventional, beacon-based, topology control algorithms, serves as a taxonomy for existing and prospective algorithms of this kind, and facilitates in-depth investigations of the principal power of the reactive approach, beyond analysis of concrete algorithms. 
Novel reactive local topology control and geographic routing algorithms are introduced under both the unit disk and quasi unit disk graph models. These algorithms compute a node's local view on connected, planar, constant stretch Euclidean and topological spanners of the underlying network graph and route messages reactively on these spanners while guaranteeing the messages' delivery. All previously known algorithms are either not reactive, or do not provide constant Euclidean and topological stretch properties. A particularly important partial result of this work is that the partial Delaunay triangulation (PDT) is a constant stretch Euclidean spanner for the unit disk graph. To conclude, this thesis provides a basis for structured and substantial research in this field and shows the reactive approach to be a powerful tool for algorithm design in wireless ad hoc and sensor networking.

Marrying model-driven engineering and ontology technologies: the TwoUse approach (2011)

Parreiras, Fernando Silva

The semantic web and model-driven engineering are changing the enterprise computing paradigm. By introducing technologies like ontologies, metadata and logic, the semantic web drastically improves how companies manage knowledge. Model-driven engineering, for its part, relies on the principle of using models to provide abstraction, enabling developers to concentrate on the system functionality rather than on technical platforms. The next enterprise computing era will rely on the synergy between both technologies. On the one side, ontology technologies organize system knowledge in conceptual domains according to its meaning. They address enterprise computing needs by identifying, abstracting and rationalizing commonalities, and checking for inconsistencies across system specifications. On the other side, model-driven engineering is closing the gap among business requirements, designs and executables by using domain-specific languages with custom-built syntax and semantics. In this scenario, the research question that arises is: What are the scientific and technical results around ontology technologies that can be used in model-driven engineering and vice versa? The objective is to analyze approaches available in the literature that involve both ontologies and model-driven engineering. Therefore, we conduct a literature review that results in a feature model for classifying state-of-the-art approaches. The results show that the combined usage of ontologies and model-driven engineering has multiple purposes: validation, visual notation, expressiveness and interoperability. While approaches involving both paradigms exist, an integrated approach for UML class-based modeling and ontology modeling is lacking so far. Therefore, we investigate the techniques and languages for designing integrated models. The objective is to provide an approach to support the design of integrated solutions.
Thus, we develop a conceptual framework involving the structure and the notations of a solution to represent and query software artifacts using a combination of ontologies and class-based modeling. As proof of concept, we have implemented our approach as a set of open source plug-ins -- the TwoUse Toolkit. The hypothesis is that a combination of both paradigms yields improvements in both fields, ontology engineering and model-driven engineering. For MDE, we investigate the impact of using features of the Web Ontology Language in software modeling. The results are patterns and guidelines for designing ontology-based information systems and for supporting software engineers in modeling software. The results include alternative ways of describing classes and objects and querying software models and metamodels. Applications show improvements in changeability and extensibility. In the ontology engineering domain, we investigate the application of techniques used in model-driven engineering to fill the abstraction gap between ontology specification languages and programming languages. The objective is to provide a model-driven platform for supporting activities in the ontology engineering life cycle. Therefore, we study the development of core ontologies in our department, namely the core ontology for multimedia (COMM) and the multimedia metadata ontology. The results are domain-specific languages that allow ontology engineers to abstract from implementation issues and concentrate on the ontology engineering task. This increases productivity by filling the gap between domain models and source code.

Corpus-based empirical research in software engineering (2014)

Pek, Ekaterina

In recent years, software engineering research has shown a rise of interest in empirical studies. Such studies are often based on empirical evidence derived from corpora, that is, collections of software artifacts. While there are established forms of carrying out empirical research (experiments, case studies, surveys, etc.), the common task of preparing the underlying collection of software artifacts is typically addressed in an ad hoc manner. In this thesis, by means of a literature survey, we show how frequently software engineering research employs software corpora, and, using a classification scheme we developed, we discuss their characteristics. Addressing the lack of methodology, we suggest a method of corpus (re-)engineering and apply it to an existing collection of Java projects. We report two extensive empirical studies, where we perform a broad and diverse range of analyses on the language for privacy preferences (P3P) and on object-oriented application programming interfaces (APIs). In both cases, we are driven by the data at hand, by the corpus itself, discovering the actual usage of the languages.

Automated Reasoning Embedded in Question Answering (2013)

Pelzer, Björn

This dissertation investigates the usage of theorem provers in automated question answering (QA). QA systems attempt to compute correct answers for questions phrased in a natural language. Commonly they utilize a multitude of methods from computational linguistics and knowledge representation to process the questions and to obtain the answers from extensive knowledge bases. These methods are often syntax-based, and they cannot derive implicit knowledge. Automated theorem provers (ATP), on the other hand, can compute logical derivations with millions of inference steps. By integrating a prover into a QA system this reasoning strength could be harnessed to deduce new knowledge from the facts in the knowledge base and thereby improve the QA capabilities. This involves challenges in that the contrary approaches of QA and automated reasoning must be combined: QA methods normally aim for speed and robustness to obtain useful results even from incomplete or faulty data, whereas ATP systems employ logical calculi to derive unambiguous and rigorous proofs. The latter approach is difficult to reconcile with the quantity and the quality of the knowledge bases in QA. The dissertation describes modifications to ATP systems in order to overcome these obstacles. The central example is the theorem prover E-KRHyper which was developed by the author at the Universität Koblenz-Landau. As part of the research work for this dissertation E-KRHyper was embedded into a framework of components for natural language processing, information retrieval and knowledge representation, together forming the QA system LogAnswer. Also presented are additional extensions to the prover implementation and the underlying calculi which go beyond enhancing the reasoning strength of QA systems by giving access to external knowledge sources like web services.
These allow the prover to fill gaps in the knowledge during the derivation, or to use external ontologies in other ways, for example for abductive reasoning. While the modifications and extensions detailed in the dissertation are a direct result of adapting an ATP system to QA, some of them can be useful for automated reasoning in general. Evaluation results from experiments and competition participations demonstrate the effectiveness of the methods under discussion.

29 search hits