Object tracking for interactive television

Trichet, Rémi

This thesis was carried out within the European project porTiVity, which aims to develop interactive television services by building an end-to-end platform. The system requires a video annotation tool that lets the video producer select and automatically track video objects, which can then be enriched with additional content for later access by the end user. The tool must handle all types of video and therefore calls for generic object tracking algorithms able to cope with all possible difficulties. Keypoints turned out to be the features best suited to this task. A tracking system is usually organized in four parts:
1. Extraction of reliable features describing the object.
2. Matching of these features with those of the previous frame.
3. Object motion estimation according to the matched features.
4. Model update.
Our tracking system is no exception, and the contribution of this thesis can be presented as an advance on the state of the art in these four parts through the development of the object model, the matching algorithm, and the motion estimation techniques.
        The object is modeled as a cloud of Harris keypoints. Each keypoint is described by a set of 18 mathematical moments computed over its neighborhood. Such features, originally developed for static images, are commonly adapted to video. This thesis highlights the weakness of that approach and the growing need for descriptors dedicated to video. Indeed, keypoints turned out to be temporally unstable and sensitive to image contrast. We have therefore developed a preprocessing step that makes better use of the color channels. Moreover, our model keeps keypoints for n frames, which limits their instability over time, and introduces a utilization delay in order to guard, to some extent, against occlusions. We also use a keypoint labeling algorithm that distinguishes object keypoints from background keypoints in order to limit the influence of the latter on the tracking. Finally, the "Fast Harris" system speeds up keypoint extraction by using a structure similar to Haar wavelets.
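The retention over n frames and the utilization delay can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the thesis: keypoints carry a stable identifier supplied by the matching step, a keypoint is dropped once it has gone undetected for more than `max_age` frames, and a new keypoint becomes usable only after `use_delay` frames in the model. The class names, field names, and default values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TrackedKeypoint:
    position: tuple      # (x, y) location of the keypoint
    first_seen: int      # frame index at which it entered the model
    last_seen: int       # frame index of its most recent detection


@dataclass
class KeypointModel:
    max_age: int = 5     # assumed meaning of "n": frames kept without re-detection
    use_delay: int = 2   # assumed utilization delay, in frames
    points: dict = field(default_factory=dict)  # keypoint id -> TrackedKeypoint

    def update(self, frame, detections):
        """Merge the detections of one frame (dict: id -> (x, y)) into the model."""
        for kp_id, pos in detections.items():
            if kp_id in self.points:
                self.points[kp_id].position = pos
                self.points[kp_id].last_seen = frame
            else:
                self.points[kp_id] = TrackedKeypoint(pos, frame, frame)
        # Forget keypoints that have not been re-detected for too long.
        self.points = {
            k: p for k, p in self.points.items()
            if frame - p.last_seen <= self.max_age
        }

    def usable(self, frame):
        """Ids of keypoints old enough to take part in motion estimation."""
        return sorted(
            k for k, p in self.points.items()
            if frame - p.first_seen >= self.use_delay
        )
```

Under these assumptions, a keypoint that appears suddenly (for instance on an occluding object) does not influence the estimate during its first `use_delay` frames, while a keypoint that briefly disappears is not lost immediately.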
        We have also developed a probabilistic keypoint matching algorithm. It jointly uses descriptors and spatial relationships, modeled with a Delaunay triangulation, to increase matching efficiency.
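As an illustration of how descriptor similarity and spatial relationships can be combined, the sketch below scores each candidate match with a mix of an appearance term and a neighborhood-support term. It is not the algorithm of the thesis. The Gaussian appearance score, the `weight` and `rounds` parameters, and the function names are assumptions, and the neighbor sets are taken as given inputs (they could be derived from the edges of a Delaunay triangulation computed separately).

```python
import math


def descriptor_score(desc_a, desc_b, sigma=1.0):
    """Gaussian-shaped similarity in (0, 1] from the squared descriptor distance."""
    d2 = sum((a - b) ** 2 for a, b in zip(desc_a, desc_b))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def match_keypoints(prev_desc, curr_desc, prev_nbrs, curr_nbrs,
                    weight=0.5, rounds=2):
    """Return, for each previous keypoint, the index of its current match.

    prev_nbrs[i] and curr_nbrs[j] are sets of neighbor indices, assumed to
    come from a triangulation of each keypoint cloud.
    """
    n_curr = len(curr_desc)
    appearance = [[descriptor_score(p, c) for c in curr_desc] for p in prev_desc]

    # Initial guess: appearance only.
    match = [max(range(n_curr), key=lambda j, row=row: row[j]) for row in appearance]

    for _ in range(rounds):
        refined = []
        for i, row in enumerate(appearance):
            nbrs = prev_nbrs[i]

            def combined(j):
                if not nbrs:
                    return row[j]
                # Fraction of i's neighbors whose match is a neighbor of j.
                support = sum(match[k] in curr_nbrs[j] for k in nbrs) / len(nbrs)
                return (1.0 - weight) * row[j] + weight * support

            refined.append(max(range(n_curr), key=combined))
        match = refined
    return match
```

With `rounds=0` the function falls back to appearance-only matching, which makes it easy to compare the two behaviors on the same data.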
        The object motion is represented by six parameters determined by the least squares method: two for rotation, two for translation, and two for scale changes. A bounding box repositioning algorithm that uses the keypoint labels, together with an adaptation of the motion model to the background clutter, further refines this first estimate of the object's presumed position.
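A six-parameter motion estimate obtained by least squares can be sketched as follows. The sketch assumes that the six parameters are the entries of a 2x3 affine matrix fitted to matched keypoint positions; how these entries relate to the rotation, translation, and scale parameters named above is not specified here. The function name `estimate_affine` is hypothetical.

```python
import numpy as np


def estimate_affine(src, dst):
    """Least-squares fit of a 2x3 affine matrix M such that dst ~ M @ [x, y, 1].

    src, dst: arrays of shape (N, 2) with matched keypoint positions in the
    previous and current frame; at least 3 non-collinear points are needed.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    # Design matrix with one row [x, y, 1] per matched keypoint.
    design = np.hstack([src, np.ones((len(src), 1))])
    # Solve design @ params ~ dst for params of shape (3, 2).
    params, *_ = np.linalg.lstsq(design, dst, rcond=None)
    return params.T  # rows: [a11, a12, tx] and [a21, a22, ty]


if __name__ == "__main__":
    previous = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
    current = 2.0 * previous + np.array([3.0, -1.0])  # scale by 2, then shift
    print(estimate_affine(previous, current))
```

Because every matched keypoint contributes equally to the fit, mismatches and background keypoints can bias the result, which is one reason a later refinement step is useful.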
        A comparison of tracking accuracy between our approach and reference algorithms on a varied set of videos shows that our approach clearly outperforms them.

Data Science
© TELECOM ParisTech. Personal use of this material is permitted. The definitive version of this paper was published in Thesis and is available at :

PERMALINK : https://www.eurecom.fr/publication/2708