Methods Based on Random Finite Sets for Object Tracking in Computer Vision and Robotics

  • This thesis addresses the automated identification and localization of a time-varying number of objects in a stream of sensor data. The problem is challenging due to its combinatorial nature: If the number of objects is unknown, the number of possible object trajectories grows exponentially with the number of observations. Random finite sets are a relatively new theory that has been developed to derive at principled and efficient approximations. It is based around set-valued random variables that contain an unknown number of elements which appear in arbitrary order and are themselves random. While extensively studied in theory, random finite sets have not yet become a leading paradigm in practical computer vision and robotics applications. This thesis explores random finite sets in visual tracking applications. The first method developed in this thesis combines set-valued recursive filtering with global optimization. The problem is approached in a min-cost flow network formulation, which has become a standard inference framework for multiple object tracking due to its efficiency and optimality. A main limitation of this formulation is a restriction to unary and pairwise cost terms. This circumstance makes integration of higher-order motion models challenging. The method developed in this thesis approaches this limitation by application of a Probability Hypothesis Density filter. The Probability Hypothesis Density filter was the first practically implemented state estimator based on random finite sets. It circumvents the combinatorial nature of data association itself by propagation of an object density measure that can be computed efficiently, without maintaining explicit trajectory hypotheses. In this work, the filter recursion is used to augment measurements with an additional hidden kinematic state to be used for construction of more informed flow network cost terms, e.g., based on linear motion models. The method is evaluated on public benchmarks where a considerate improvement is achieved compared to network flow formulations that are based on static features alone, such as distance between detections and appearance similarity. A second part of this thesis focuses on the related task of detecting and tracking a single robot operator in crowded environments. Different from the conventional multiple object tracking scenario, the tracked individual can leave the scene and later reappear after a longer period of absence. Therefore, a re-identification component is required that picks up the track on reentrance. Based on random finite sets, the Bernoulli filter is an optimal Bayes filter that provides a natural representation for this type of problem. In this work, it is shown how the Bernoulli filter can be combined with a Probability Hypothesis Density filter to track operator and non-operators simultaneously. The method is evaluated on a publicly available multiple object tracking dataset as well as on custom sequences that are specific to the targeted application. Experiments show reliable tracking in crowded scenes and robust re-identification after long term occlusion. Finally, a third part of this thesis focuses on appearance modeling as an essential aspect of any method that is applied to visual object tracking scenarios. Therefore, a feature representation that is robust to pose variations and changing lighting conditions is learned offline, before the actual tracking application. This thesis proposes a joint classification and metric learning objective where a deep convolutional neural network is trained to identify the individuals in the training set. At test time, the final classification layer can be stripped from the network and appearance similarity can be queried using cosine distance in representation space. This framework represents an alternative to direct metric learning objectives that have required sophisticated pair or triplet sampling strategies in the past. The method is evaluated on two large scale person re-identification datasets where competitive results are achieved overall. In particular, the proposed method better generalizes to the test set compared to a network trained with the well-established triplet loss.

Download full text files

Export metadata

Additional Services

Share in Twitter Search Google Scholar
Metadaten
Author:Nicolai Wojke
URN:urn:nbn:de:kola-17087
Referee:Dietrich Paulus
Document Type:Doctoral Thesis
Language:English
Date of completion:2018/09/14
Date of publication:2018/09/24
Publishing institution:Universität Koblenz-Landau, Universitätsbibliothek
Granting institution:Universität Koblenz-Landau, Campus Koblenz, Fachbereich 4
Date of final exam:2018/07/18
Release Date:2018/09/24
Tag:Computer Vision; Deep Metric Learning; Multiple Object Tracking; Random Finite Sets
Number of pages:x, 168
Institutes:Fachbereich 4 / Institut für Computervisualistik
Dewey Decimal Classification:0 Informatik, Informationswissenschaft, allgemeine Werke / 00 Informatik, Wissen, Systeme / 004 Datenverarbeitung; Informatik
Licence (German):License LogoEs gilt das deutsche Urheberrecht: § 53 UrhG