54 Informatik
Refine
Document Type
- Master's Thesis (14)
- Bachelor Thesis (11)
- Doctoral Thesis (7)
Keywords
- virtual reality (2)
- Action Recognition (1)
- Action Segmentation (1)
- Analysis of social platform (1)
- Artificial Intelligence (1)
- Astrophysik (1)
- Augmented Reality (1)
- CCRDMT (1)
- Computergrafik (1)
- Computervisualistik (1)
On the recognition of human activities and the evaluation of its imitation by robotic systems
(2023)
This thesis addresses the problem of action recognition through the analysis of human motion and the benchmarking of its imitation by robotic systems.
For our action recognition related approaches, we focus on presenting approaches that generalize well across different sensor modalities. We transform multivariate signal streams from various sensors to a common image representation. The action recognition problem on sequential multivariate signal streams can then be reduced to an image classification task for which we utilize recent advances in machine learning. We demonstrate the broad applicability of our approaches formulated as a supervised classification task for action recognition, a semi-supervised classification task for one-shot action recognition, modality fusion and temporal action segmentation.
For action classification, we use an EfficientNet Convolutional Neural Network (CNN) model to classify the image representations of various data modalities. Further, we present approaches for filtering and the fusion of various modalities on a representation level. We extend the approach to be applicable for semi-supervised classification and train a metric-learning model that encodes action similarity. During training, the encoder optimizes the distances in embedding space for self-, positive- and negative-pair similarities. The resulting encoder allows estimating action similarity by calculating distances in embedding space. At training time, no action classes from the test set are used.
Graph Convolutional Network (GCN) generalized the concept of CNNs to non-Euclidean data structures and showed great success for action recognition directly operating on spatio-temporal sequences like skeleton sequences. GCNs have recently shown state-of-the-art performance for skeleton-based action recognition but are currently widely neglected as the foundation for the fusion of various sensor modalities. We propose incorporating additional modalities, like inertial measurements or RGB features, into a skeleton-graph, by proposing fusion on two different dimensionality levels. On a channel dimension, modalities are fused by introducing additional node attributes. On a spatial dimension, additional nodes are incorporated into the skeleton-graph.
Transformer models showed excellent performance in the analysis of sequential data. We formulate the temporal action segmentation task as an object detection task and use a detection transformer model on our proposed motion image representations. Experiments for our action recognition related approaches are executed on large-scale publicly available datasets. Our approaches for action recognition for various modalities, action recognition by fusion of various modalities, and one-shot action recognition demonstrate state-of-the-art results on some datasets.
Finally, we present a hybrid imitation learning benchmark. The benchmark consists of a dataset, metrics, and a simulator integration. The dataset contains RGB-D image sequences of humans performing movements and executing manipulation tasks, as well as the corresponding ground truth. The RGB-D camera is calibrated against a motion-capturing system, and the resulting sequences serve as input for imitation learning approaches. The resulting policy is then executed in the simulated environment on different robots. We propose two metrics to assess the quality of the imitation. The trajectory metric gives insights into how close the execution was to the demonstration. The effect metric describes how close the final state was reached according to the demonstration. The Simitate benchmark can improve the comparability of imitation learning approaches.
This thesis explores a 3D object detection and pose estimation approach based on the point pair features method presented by Drost et. al. [Dro+10]. While pose estimation methods have shown good improvements, they still remain a crucial problem on the computer vision field. In this work, we implemented a program that takes point cloud scenes as input and returns the detected object with their estimated pose. The program fully covers an object detection pipeline by processing 3D models during an offline phase, extracting their point pair features and creating a global descriptor out of them. During an online phase, the same features are extracted from a point cloud scene and are matched to the model features. After the voting scheme, potential poses of the object are retrieved. The poses end being clustered together and post-processed to finally deliver a result. The program was tested using simulated and real data. We evaluate these tests and present the final results, by discussing the achieved accuracy of the detections and the estimated poses.
The industry standard Decision Model and Notation (DMN) has enabled a new way for the formalization of business rules since 2015. Here, rules are modeled in so-called decision tables, which are defined by input columns and output columns. Furthermore, decisions are arranged in a graph-like structure (DRD level), which creates dependencies between them. With a given input, the decisions now can be requested by appropriate systems. Thereby, activated rules produce output for future use. However, modeling mistakes produces erroneous models, which can occur in the decision tables as well as at the DRD level. According to the Design Science Research Methodology, this thesis introduces an implementation of a verification prototype for the detection and resolution of these errors while the modeling phase. Therefore, presented basics provide the needed theoretical foundation for the development of the tool. This thesis further presents the architecture of the tool and the implemented verification capabilities. Finally, the created prototype is evaluated.
On-screen interactive presentations have got immense popularity in the domain of attentive interfaces recently. These attentive screens adapt their behavior according to the user's visual attention. This thesis aims to introduce an application that would enable these attentive interfaces to change their behavior not just according to the gaze data but also facial features and expressions. The modern era requires new ways of communications and publications for advertisement. These ads need to be more specific according to people's interests, age, and gender. When advertising, it's important to get a reaction from the user but not every user is interested in providing feedback. In such a context more, advance techniques are required that would collect user's feedback effortlessly. The main problem this thesis intends to resolve is, to apply advanced techniques of gaze and face recognition to collect data about user's reactions towards different ads being played on interactive screens. We aim to create an application that enables attentive screens to detect a person's facial features, expressions, and eye gaze. With eye gaze data we can determine the interests and with facial features, age and gender can be specified. All this information will help in optimizing the advertisements.
The distributed setting of RDF stores in the cloud poses many challenges. One such challenge is how the data placement on the compute nodes can be optimized to improve the query performance. To address this challenge, several evaluations in the literature have investigated the effects of existing data placement strategies on the query performance. A common drawback in theses evaluations is that it is unclear whether the observed behaviors were caused by the data placement strategies (if different RDF stores were evaluated as a whole) or reflect the behavior in distributed RDF stores (if cloud processing frameworks like Hadoop MapReduce are used for the evaluation). To overcome these limitations, this thesis develops a novel benchmarking methodology for data placement strategies that uses a data-placement-strategy-independent distributed RDF store to analyze the effect of the data placement strategies on query performance.
With this evaluation methodology the frequently used data placement strategies have been evaluated. This evaluation challenged the commonly held belief that data placement strategies that emphasize local computation, such as minimal edge-cut cover, lead to faster query executions. The results indicate that queries with a high workload may be executed faster on hash-based data placement strategies than on, e.g., minimal edge-cut covers. The analysis of the additional measurements indicates that vertical parallelization (i.e., a well-distributed workload) may be more important than horizontal containment (i.e., minimal data transport) for efficient query processing.
Moreover, to find a data placement strategy with a high vertical parallelization, the thesis tests the hypothesis that collocating small connected triple sets on the same compute node while balancing the amount of triples stored on the different compute nodes leads to a high vertical parallelization. Specifically, the thesis proposes two such data placement strategies. The first strategy called overpartitioned minimal edge-cut cover was found in the literature and the second strategy is the newly developed molecule hash cover. The evaluation revealed a balanced query workload and a high horizontal containment, which lead to a high vertical parallelization. As a result these strategies showed a better query performance than the frequently used data placement strategies.
Absicherung der analytischen Interpretation von Geolokalisierungsdaten in der Mobilfunkforensik
(2019)
Abstract
Location based services maybe are within one of the most outstanding features of modern mobile devices. Despite the fact, that cached geolocation data could be used to reconstruct motion profiles, the amount of devices capable to provide these information in the field of criminal investigations is growing.
Motivation
The aim of this work is to generate in-depth knowledge to questions concerning geolocation in the field of mobile forensics, making especially somehow cached geolocation data forensically valuable. On top, tools meeting the specific requirements of law enforcement personnel shall be developed.
Problems
Geolocation processes within smartphones are quite complex. For the device to locate its position, different reference systems like GPS, cell towers or WiFi hot\-spots are used in a variety of ways. The whole mobile geolocation mechanism is proprietary to the device manufacturer and not build with forensic needs in mind. One major problem regarding forensic investigations is, that mainly reference points are being extracted and processed instead of real life device location data. In addition, these geolocation information only consist of bits and bytes or numeric values that have to be securely assigned to their intended meaning. The location data recovered are full of gaps providing only a part of the process or device usage. This possible loss of data has to be determined deriving a reliable measurement for the completeness, integrity and accuracy of data. Last but not least, as for every evidence within a criminal investigation, it has to be assured, that manipulations of the data or errors in position estimation have no disadvantageous effect on the analysis.
Research Questions
In the context of localisation services in modern smartphones, it always comes back to similar questions during forensic everyday life:
* Can locations be determined at any time?
* How accurate is the location of a smartphone?
* Can location data from smartphones endure in court?
Approach
For a better understanding of geolocation processes in modern smartphones and to evaluate the quality and reliability of the geolocation artefacts, information from different platforms shall be theoretically analysed as well as observed in-place during the geolocation process. The connection between data points and localisation context will be examined in predefined live experiments as well as desktop- and native applications on smartphones.
Results
Within the scope of this thesis self developed tools have been used for forensic investigations as well as analytical interpretation of geodata from modern smartphones. Hereby a generic model for assessing the quality of location data has emerged, which can be generally applied to geodata from mobile devices.
Commonsense reasoning can be seen as a process of identifying dependencies amongst events and actions. Understanding the circumstances surrounding these events requires background knowledge with sufficient breadth to cover a wide variety of domains. In the recent decades, there has been a lot of work in extracting commonsense knowledge, a number of these projects provide their collected data as semantic networks such as ConceptNet and CausalNet. In this thesis, we attempt to undertake the Choice Of Plausible Alternatives (COPA) challenge, a problem set with 1000 questions written in multiple-choice format with a premise and two alternative choices for each question. Our approach differs from previous work by using shortest paths between concepts in a causal graph with the edge weight as causality metric. We use CausalNet as primary network and implement a few design choices to explore the strengths and drawbacks of this approach, and propose an extension using ConceptNet by leveraging its commonsense knowledge base.
This thesis is about the design and the implementation of a virtual reality experience. The goal is to answer two questions: Is it possible to create an immersive virtual reality experience which is mainly using impulses and triggers to scare and frighten users? Secondly, is this immersion strong enough to create an illusion in which the user can't separate the real world from the virtual world? To realise this project the design program Unity3D as well as Visual Studios 2017 were used. Furthermore, in order to verify that the experience is indeed immersive for the user, an experiment with a sample size of seven people was created. Afterwards the candidates were interviewed via a questionnaire how they felt during the virtual reality application. As a result the study showed that the application has tendencies to be immersive but the users were still aware of the situation. It can be concluded that the immersion was not strong enough to fool users regarding the separation of virtual and real world.
With the appearance of modern virtual reality (VR) headsets on the consumer market, there has been the biggest boom in the history of VR technology. Naturally, this was accompanied by an increasing focus on the problems of current VR hardware. Especially the control in VR has always been a complex topic.
One possible solution is the Leap Motion, a hand tracking device that was initially developed for desktop use, but with the last major software update it can be attached to standard VR headsets. This device allows very precise tracking of the user’s hands and fingers and their replication in the virtual world.
The aim of this work is to design virtual user interfaces that can be operated with the Leap Motion to provide a natural method of interaction between the user and the VR environment. After that, subject tests are performed to evaluate their performance and compare them to traditional VR controllers.