OPUS 4 | Search

Discovering and exploiting semantics in folksonomies (2011)

Abbasi, Rabeeh

Folksonomies are Web 2.0 platforms where users share resources with each other. Users can also assign keywords (called tags) to the resources to categorize and organize them. Different folksonomies support numerous types of resources, such as websites (Delicious), images (Flickr), and videos (YouTube). Folksonomies are easy to use and thus attract millions of users. This ease of use, however, comes with some problems. This thesis addresses several of these problems and proposes solutions for them. The first problem occurs when users search for relevant resources in folksonomies. Often, users are not able to find all relevant resources because they do not know which tags are relevant. The second problem is assigning tags to resources. Although many folksonomies (like Delicious) recommend tags for resources, others (like Flickr) do not. Tag recommendation helps users tag their resources easily. The third problem is that tags and resources lack semantics, which leads, for example, to ambiguous tags. Tags lack semantics because they are freely chosen keywords. Automatically identifying the semantics of tags and resources helps reduce the problems that arise from users' freedom in choosing tags. This thesis proposes methods that exploit semantics to address the problems of search, tag recommendation, and the identification of tag semantics. The semantics are discovered from a variety of sources: web search engines, online social communities, and tag co-occurrences. Using different sources for discovering semantics reduces the effort needed to build systems that solve the problems mentioned earlier. This thesis evaluates the proposed methods on a large-scale data set. The evaluation results suggest that it is possible to exploit the semantics to improve search, tag recommendation, and the automatic identification of the semantics of tags and resources.

Emotion and Sentiment Detection in Unstructured Social Data (2022)

AlGhalibi, Maha

Social media provides a powerful way for people to share opinions and sentiments about a specific topic, allowing others to benefit from these thoughts and feelings. This process generates a huge amount of unstructured data, such as texts, images, and references, which grows constantly through daily comments on related discussions. However, the vast amount of unstructured data presents risks to the information-extraction process, and so decision making becomes highly challenging: data overload may cause useful data to be lost through inappropriate presentation and accumulation. To this end, this thesis contributes to the field of analyzing and detecting feelings in images and texts by extracting the feelings and opinions hidden in huge collections of images and texts on social networks. These feelings are then classified as positive, negative, or neutral, according to the features of the classified data. Extracting these feelings greatly helps decision-making processes on various topics, as explained in the first chapter of the thesis. A system has been built that can classify the feelings inherent in images and texts on social media sites, such as people’s opinions about products and companies, personal posts, and general messages. This thesis begins by introducing a new method for reducing the dimensionality of text data based on data-mining approaches and then examines sentiment using neural and deep neural network classification algorithms. Subsequently, in contrast to sentiment analysis research on text datasets, we examine sentiment expression and polarity classification within and across image datasets by building deep neural networks based on the attention mechanism.

A methodology for secure interactive systems (2008)

Beuster, Gerd

This dissertation introduces a methodology for the formal specification and verification of user interfaces with regard to security aspects. The methodology allows formal methods to be used pervasively in the specification and verification of human-computer interaction. This work consists of three parts. In the first part, a formal methodology for the description of human-computer interaction is developed. In the second part, existing definitions of computer security are adapted for human-computer interaction and formalized, and a generic formal model of human-computer interaction is developed. In the third part, the methodology is applied to the specification and verification of a secure email client.

Proactive Content Placement in Information-Centric Connected Vehicle Environments (2021)

Grewe, Dennis

Connected vehicles will have a tremendous impact on tomorrow’s mobility solutions. Such systems will rely heavily on timely information delivery to ensure functional reliability, security, and safety. However, the host-centric communication model of today’s networks calls into question efficient data dissemination at scale, especially in networks characterized by a high degree of mobility. The Information-Centric Networking (ICN) paradigm has evolved as a promising candidate for the next generation of network architectures. Based on a loosely coupled communication model, the in-network processing and caching capabilities of ICNs promise to solve the challenges posed by connected vehicular systems. In such networks, a special class of caching strategies, which actively place a consumer’s anticipated content at the right network nodes in time, promises to reduce data delivery time. This thesis contributes to research on active placement strategies in information-centric and computation-centric vehicle networks for providing dynamic access to content and computation results. By analyzing different vehicular applications and their requirements, novel caching strategies are developed to reduce content retrieval time. The caching strategies are compared and evaluated against the state of the art in both extensive simulations and real-world deployments. The results show performance improvements through increased content retrieval (availability of specific data increased up to 35% compared to state-of-the-art caching strategies) and reduced delivery times (roughly double the number of data retrievals from neighboring nodes). However, storing content actively in connected vehicle networks raises questions regarding security and privacy. In the second part of the thesis, an access control framework for information-centric connected vehicles is presented. Finally, open security issues and research directions in executing computations at the edge of connected vehicle networks are presented.

Distributed Query Processing for Federated RDF Data Management (2015)

Görlitz, Olaf

The publication of freely available and machine-readable information has increased significantly in recent years. The Linked Data initiative in particular has received a lot of attention. Linked Data is based on the Resource Description Framework (RDF), and anybody can publish their data in RDF and link it to other datasets. The structure is similar to the World Wide Web, where individual HTML documents are connected with links. Linked Data entities are identified by URIs, which can be dereferenced to retrieve information describing the entity. Additionally, so-called SPARQL endpoints can be used to access the data with an algebraic query language (SPARQL) similar to SQL. By integrating multiple SPARQL endpoints, it is possible to create a federation of distributed RDF data sources which acts like one big data store. The federation of RDF data differs in several respects from the federation of classical relational database systems. RDF stores are accessed either via SPARQL endpoints or by resolving URIs. There is no coordination between RDF data sources, and machine-readable metadata about source data is commonly limited or not available at all. Moreover, there is no common directory which can be used to discover RDF data sources or ask for sources which offer specific data. The federation of distributed and linked RDF data sources has to deal with various challenges. In order to distribute queries automatically, suitable data sources have to be selected based on query details and the information that is available about the data sources. Furthermore, minimizing query execution time requires optimization techniques that take into account the execution cost of query operators and the network communication overhead of contacting individual data sources. In this thesis, solutions for these problems are discussed. Moreover, SPLENDID is presented, a new federation infrastructure for distributed RDF data sources which uses optimization techniques based on statistical information.

Knowledge engineering for software languages and software technologies (2022)

Heinz, Marcel

For software engineers, conceptually understanding the tools they are using in the context of their projects is a daily challenge and a prerequisite for complex tasks. Textual explanations and code examples serve as knowledge resources for understanding software languages and software technologies. This thesis describes research on integrating and interconnecting existing knowledge resources, which can then be used to assist with understanding and comparing software languages and software technologies on a conceptual level. We consider the following broad research questions, which we later refine: What knowledge resources can be systematically reused for recovering structured knowledge, and how? What vocabulary already exists in the literature that is used to express conceptual knowledge? How can we reuse the online encyclopedia Wikipedia? How can we detect and report on instances of technology usage? How can we assure reproducibility as the central quality factor of any construction process for knowledge artifacts? As qualitative research, we describe methodologies to recover knowledge resources by (i) systematically studying literature, (ii) mining Wikipedia, and (iii) mining available textual explanations and code examples of technology usage. The theoretical findings are backed by case studies. As research contributions, we have recovered (i) a reference semantics of vocabulary for describing software technology usage with an emphasis on software languages, (ii) an annotated corpus of Wikipedia articles on software languages, (iii) insights into technology usage on GitHub with regard to a catalog of patterns, and (iv) megamodels of technology usage that are interconnected with existing textual explanations and code examples.

Technical and Methodological Improvements to Mining Software Repositories (2024)

Härtel, Johannes

Empirical studies in software engineering use software repositories as data sources to understand software development. Repository data is used either to answer questions that guide decision-making in software development or to provide tools that help with practical aspects of developers’ everyday work. Such studies belong to the field of Empirical Software Engineering (ESE), and more specifically to Mining Software Repositories (MSR). Studies working with repository data often focus on their results. Results are statements or tools, derived from the data, that help with practical aspects of software development. This thesis focuses on the methods and high-order methods used to produce such results. In particular, we focus on incremental methods to scale the processing of repositories, declarative methods to compose a heterogeneous analysis, and high-order methods used to reason about threats to methods operating on repositories. We summarize this as technical and methodological improvements. We contribute these improvements to methods and high-order methods in the context of MSR/ESE so that future empirical results can be produced more effectively. We contribute the following improvements. We propose a method to improve the scalability of functions that abstract over repositories with a high revision count in a theoretically founded way. We use insights from abstract algebra and program incrementalization to define a core interface of high-order functions that compute scalable static abstractions of a repository with many revisions. We evaluate the scalability of our method with benchmarks, comparing a prototype with available competitors in MSR/ESE. We propose a method to improve the definition of functions that abstract over a repository with a heterogeneous technology stack, by using concepts from declarative logic programming and combining them with ideas on megamodeling and linguistic architecture. We reproduce existing ideas on declarative logic programming with languages close to Datalog, coming from architecture recovery, source code querying, and static program analysis, and transfer them from the analysis of a homogeneous to a heterogeneous technology stack. We provide a proof of concept of this method in a case study. We propose a high-order method to improve the disambiguation of threats to methods used in MSR/ESE. We focus on better disambiguation of threats, operationalizing reasoning about them, and making the implications for a valid data analysis methodology explicit, by using simulations. We encourage researchers to accomplish their work by implementing ‘fake’ simulations of their MSR/ESE scenarios, to operationalize relevant insights about alternative plausible results, negative results, potential threats, and the data analysis methodologies used. We prove that this kind of simulation-based testing contributes to the disambiguation of threats in published MSR/ESE research.

Markov random field terrain classification for autonomous robots in unstructured terrain (2015)

Häselich, Marcel

This thesis addresses the problem of terrain classification in unstructured outdoor environments. Terrain classification includes the detection of obstacles and passable areas as well as the analysis of ground surfaces. A 3D laser range finder is used as the primary sensor for perceiving the surroundings of the robot. First, a grid structure is introduced for data reduction. The chosen data representation allows for multi-sensor integration, e.g., cameras for color and texture information or further laser range finders for improved data density. Subsequently, features are computed for each terrain cell within the grid. Classification is performed with a Markov random field for context-sensitivity and to compensate for sensor noise and varying data density within the grid. A Gibbs sampler is used for optimization and is parallelized on the CPU and GPU in order to achieve real-time performance. Dynamic obstacles are detected and tracked using different state-of-the-art approaches. The resulting information - where other traffic participants move and are going to move to - is used to perform inference in regions where the terrain surface is partially or completely invisible to the sensors. Algorithms are tested and validated on different autonomous robot platforms, and the evaluation is carried out with human-annotated ground truth maps of millions of measurements. The terrain classification approach of this thesis proved reliable in all real-time scenarios and domains and yielded new insights. Furthermore, if combined with a path planning algorithm, it enables full autonomy for all kinds of wheeled outdoor robots in natural outdoor environments.

Practices, Networks and Success in Creative Careers: Study of Inequalities using Large-scale Digital Behavioural Data (2023)

Jadidi, Mohsen

In the last decade, policy-makers around the world have turned their attention toward the creative industry as an economic engine and a significant driver of employment. Yet, the literature suggests that creative workers are one of the most vulnerable workforces of today’s economy. Because of the highly deregulated and highly individuated environment, failure or success are believed to be the byproduct of individual ability and commitment, rather than a structural or collective issue. This thesis taps into the temporal, spatial, and social resolution of digital behavioural data to show that there are indeed structural and historical issues that impact individuals’ and groups’ careers. To this end, this thesis offers a computational social science research framework that brings together the decades-long theoretical and empirical knowledge of inequality studies and computational methods that deal with the complexity and scale of digital data. Taking the music industry and science as use cases, this thesis starts off by proposing a novel gender detection method that exploits image search and face-detection methods. By analysing the collaboration patterns and citation networks of male and female computer scientists, it sheds light on some of the historical biases and disadvantages that women face in their scientific careers. In particular, the relation between scientific success and gender-specific collaboration patterns is assessed. To elaborate further on the temporal aspect of inequalities in scientific careers, this thesis compares the degree of vertical and horizontal inequalities among cohorts of scientists that started their careers at different points in time. Furthermore, the structural inequality in the music industry is assessed by analysing the social and cultural relations that arise from live performances and music releases. The findings hint at the importance of community belonging at different stages of artists’ careers. This thesis also quantifies some of the underlying mechanisms and processes of inequality, such as the Matthew Effect and the Hipster Paradox, in creative careers. Finally, this thesis argues that online platforms such as Wikipedia could reflect and amplify the existing biases.

Secure semantic web data management (2016)

Kasten, Andreas

Confidentiality, integrity, and availability are often listed as the three major requirements for achieving data security and are collectively referred to as the C-I-A triad. Confidentiality of data restricts data access to authorized parties only, integrity means that the data can only be modified by authorized parties, and availability states that the data must always be accessible when requested. Although these requirements are relevant for any computer system, they are especially important in open and distributed networks. Such networks are able to store large amounts of data without having a single entity in control of ensuring the data's security. These characteristics also apply to the Semantic Web, as it aims at creating a global and decentralized network of machine-readable data. Ensuring the confidentiality, integrity, and availability of this data is therefore also important and must be achieved by corresponding security mechanisms. However, the current reference architecture of the Semantic Web does not yet define any particular security mechanism which implements these requirements. Instead, it only contains a rather abstract representation of security. This thesis fills this gap by introducing three different security mechanisms, one for each of the identified security requirements: confidentiality, integrity, and availability of Semantic Web data. The mechanisms are not restricted to the very basics of implementing each of the requirements and provide additional features as well. Confidentiality is usually achieved with data encryption. This thesis not only provides an approach for encrypting Semantic Web data, it also allows searching the resulting ciphertext data without decrypting it first. Integrity of data is typically implemented with digital signatures. Instead of defining a single signature algorithm, this thesis defines a formal framework for signing arbitrary Semantic Web graphs which can be configured with various algorithms to achieve different features. Availability is generally supported by redundant data storage. This thesis expands the classical definition of availability to compliant availability, which means that data must only be available as long as the access request complies with a set of predefined policies. This requirement is implemented with a modular and extensible policy language for regulating information flow control. This thesis presents each of these three security mechanisms in detail, evaluates them against a set of requirements, and compares them with the state of the art and related work.

27 search hits