## Doctoral Thesis

This thesis addresses the automated identification and localization of a time-varying number of objects in a stream of sensor data. The problem is challenging due to its combinatorial nature: if the number of objects is unknown, the number of possible object trajectories grows exponentially with the number of observations. Random finite sets are a relatively new theory that has been developed to arrive at principled and efficient approximations. The theory is built around set-valued random variables that contain an unknown number of elements which appear in arbitrary order and are themselves random. While extensively studied in theory, random finite sets have not yet become a leading paradigm in practical computer vision and robotics applications. This thesis explores random finite sets in visual tracking applications. The first method developed in this thesis combines set-valued recursive filtering with global optimization. The problem is approached in a min-cost flow network formulation, which has become a standard inference framework for multiple object tracking due to its efficiency and optimality. A main limitation of this formulation is its restriction to unary and pairwise cost terms, which makes the integration of higher-order motion models challenging. The method developed in this thesis addresses this limitation by applying a Probability Hypothesis Density filter. The Probability Hypothesis Density filter was the first practically implemented state estimator based on random finite sets. It circumvents the combinatorial nature of data association by propagating an object density measure that can be computed efficiently, without maintaining explicit trajectory hypotheses. In this work, the filter recursion is used to augment measurements with an additional hidden kinematic state, which is then used to construct more informed flow network cost terms, e.g., based on linear motion models.
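The difference between a static pairwise cost and a motion-informed one can be illustrated with a minimal sketch. It assumes each detection already carries a velocity estimate; in the thesis that hidden kinematic state comes from the filter recursion, whereas here it is simply given. The dictionary fields `pos` and `vel` and both function names are hypothetical and chosen for illustration only.

```python
import math

def static_cost(det_a, det_b):
    """Pairwise cost from static features only: distance between detections."""
    return math.dist(det_a["pos"], det_b["pos"])

def motion_cost(det_a, det_b, dt=1.0):
    """Pairwise cost under a linear (constant-velocity) motion model.

    Assumption: det_a["vel"] is a velocity estimate attached to the
    detection beforehand (hypothetical field, not the thesis's data format).
    """
    predicted = tuple(p + v * dt for p, v in zip(det_a["pos"], det_a["vel"]))
    return math.dist(predicted, det_b["pos"])

# A detection moving quickly to the right, and two candidates in the next frame.
current = {"pos": (0.0, 0.0), "vel": (5.0, 0.0)}
near_but_wrong = {"pos": (1.0, 0.0)}
far_but_consistent = {"pos": (5.0, 0.0)}

# The static cost favors the nearby candidate; the motion cost favors
# the candidate that agrees with the predicted position.
static_choice = min((near_but_wrong, far_but_consistent),
                    key=lambda d: static_cost(current, d))
motion_choice = min((near_but_wrong, far_but_consistent),
                    key=lambda d: motion_cost(current, d))
```

Both costs remain pairwise, so either could serve as an edge cost in a flow network; only the information available at each detection changes.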
The method is evaluated on public benchmarks, where a considerable improvement is achieved compared to network flow formulations that are based on static features alone, such as distance between detections and appearance similarity. A second part of this thesis focuses on the related task of detecting and tracking a single robot operator in crowded environments. Unlike in the conventional multiple object tracking scenario, the tracked individual can leave the scene and reappear after a longer period of absence. Therefore, a re-identification component is required that picks up the track on re-entry. Based on random finite sets, the Bernoulli filter is an optimal Bayes filter that provides a natural representation for this type of problem. In this work, it is shown how the Bernoulli filter can be combined with a Probability Hypothesis Density filter to track the operator and non-operators simultaneously. The method is evaluated on a publicly available multiple object tracking dataset as well as on custom sequences that are specific to the targeted application. Experiments show reliable tracking in crowded scenes and robust re-identification after long-term occlusion. Finally, a third part of this thesis focuses on appearance modeling as an essential aspect of any method that is applied to visual object tracking scenarios. To this end, a feature representation that is robust to pose variations and changing lighting conditions is learned offline, before the actual tracking application. This thesis proposes a joint classification and metric learning objective where a deep convolutional neural network is trained to identify the individuals in the training set. At test time, the final classification layer can be stripped from the network and appearance similarity can be queried using cosine distance in representation space.
This framework represents an alternative to direct metric learning objectives that have required sophisticated pair or triplet sampling strategies in the past. The method is evaluated on two large-scale person re-identification datasets, where competitive results are achieved overall. In particular, the proposed method generalizes better to the test set than a network trained with the well-established triplet loss.
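The test-time similarity query mentioned above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the feature vectors are assumed to be the network's output after the classification layer has been removed, and the names `cosine_distance`, `rank_gallery`, and the toy identities are hypothetical.

```python
import math

def cosine_distance(a, b):
    """Cosine distance 1 - cos(a, b) between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)

def rank_gallery(query, gallery):
    """Return gallery identities sorted from most to least similar to the query."""
    return sorted(gallery, key=lambda name: cosine_distance(query, gallery[name]))

# Toy feature vectors standing in for learned appearance representations.
gallery = {
    "person_a": [0.9, 0.1, 0.0],
    "person_b": [0.0, 1.0, 0.0],
    "person_c": [-1.0, 0.0, 0.2],
}
query = [1.0, 0.0, 0.0]
ranking = rank_gallery(query, gallery)
```

Because cosine distance ignores vector length, only the direction of a representation matters for the ranking.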

This thesis focuses on the utilization of modern graphics hardware (GPU) for visualization and computation purposes, especially of volumetric data from medical imaging. The considerable increase in raw computing power in recent years has turned commodity systems into high-performance workstations. In combination with the direct rendering capabilities of graphics hardware, "visual computing" and "computational steering" approaches on large data sets have become feasible. In this regard, several example applications and concepts such as "ray textures" have been developed and are discussed in detail. As the amount of data to be processed and visualized is steadily increasing, memory and bandwidth limitations require compact representations of the data. While the compression of image data has been investigated extensively in the past, the thesis addresses possibilities of performing computations directly on the compressed data. To this end, different categories of algorithms are identified and represented in the wavelet domain. By using special variants of the compressed format, efficient implementations of essential image processing algorithms are possible and demonstrate the potential of the approach. From the technical perspective, the GPU-based framework "Cascada" has been developed in the course of this thesis. The introduction of object-oriented concepts to shader programming, as well as a hierarchical representation of computation and/or visualization procedures, led to a simplified utilization of graphics hardware while maintaining competitive performance. This is shown with different implementations throughout the contributions, as well as two clinical projects in the field of diagnosis assistance. On the one hand, the semi-automatic segmentation of low-resolution MRI data sets of the human liver is evaluated. On the other hand, different possibilities for assessing abdominal aortic aneurysms are discussed; both projects make use of graphics hardware.
In addition, "Cascada" provides extensions towards recent general-purpose programming architectures and a modular design for future developments.
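The idea of computing directly on wavelet coefficients can be illustrated with a small sketch. It assumes an unnormalized 1D Haar transform on signals whose length is a power of two, which is a stand-in chosen for brevity; the abstract does not specify the compressed format or the algorithms used in the thesis, and all function names here are hypothetical. Because the transform is linear, a pointwise blend of two signals can be carried out on the coefficients, and the signal mean can be read off the first coefficient without decoding.

```python
def haar_forward(signal):
    """Unnormalized Haar transform (averages and half-differences).

    Assumes len(signal) is a power of two.
    """
    coeffs = list(signal)
    n = len(coeffs)
    while n > 1:
        half = n // 2
        avg = [(coeffs[2 * i] + coeffs[2 * i + 1]) / 2 for i in range(half)]
        diff = [(coeffs[2 * i] - coeffs[2 * i + 1]) / 2 for i in range(half)]
        coeffs[:n] = avg + diff
        n = half
    return coeffs

def haar_inverse(coeffs):
    """Inverse of haar_forward."""
    out = list(coeffs)
    n = 1
    while n < len(out):
        rec = []
        for a, d in zip(out[:n], out[n:2 * n]):
            rec += [a + d, a - d]
        out[:2 * n] = rec
        n *= 2
    return out

def blend_coeffs(ca, cb, alpha):
    """Blend two signals in the wavelet domain; valid because Haar is linear."""
    return [alpha * x + (1 - alpha) * y for x, y in zip(ca, cb)]

a = [4.0, 2.0, 6.0, 8.0]
b = [0.0, 8.0, 4.0, 4.0]
ca, cb = haar_forward(a), haar_forward(b)

# Blending the coefficients and decoding matches blending the raw signals.
blended = haar_inverse(blend_coeffs(ca, cb, 0.25))
mean_of_a = ca[0]  # the coarsest average equals the signal mean
```

Nonlinear operations do not carry over this directly, which is one reason a classification of algorithms by their behavior in the wavelet domain is useful.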