Zero-shot classification of events for character-centric video summarization

Reboud, Alison; Harrando, Ismail; Lisena, Pasquale; Troncy, Raphaël
TRECVID 2021, International Workshop on Video Retrieval Evaluation, December 7-10, 2021 (Virtual Conference)

This paper describes an event classification and character-centered approach proposed by the D2KLab team at EURECOM for the TRECVID 2021 Video Summarization Task [Awad et al. 2020]. Our approach relies on defining a list of typical important events in a soap opera and using these event names as candidate labels for a zero-shot text classification method. This additional data source is used together with the provided videos, scripts and master shot boundaries. We also use images of BBC EastEnders characters crawled from the Google search engine to train a face recognition system. All our runs use the same general method, but with different constraints on the number of shots and the maximum duration of the summary. The submitted runs are as follows:
• EURECOM1: the 5 shots with the highest similarity scores, with a total summary duration < 150 sec;
• EURECOM2: the 10 shots with the highest similarity scores, with a total summary duration < 300 sec;
• EURECOM3: the 15 shots with the highest similarity scores, with a total summary duration < 450 sec;
• EURECOM4: the 20 shots with the highest similarity scores, with a total summary duration < 600 sec.
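The abstract does not spell out how the shot-count limit and the duration limit interact in each run. The following is a minimal sketch, assuming shots are ranked greedily by their similarity score, a shot is skipped when adding it would bring the total to or past the duration cap, and the chosen shots are then replayed in story order. The `Shot` fields, the `select_shots` function and the `RUNS` table are hypothetical names introduced for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    # Hypothetical shot record: ID, start time and duration in seconds,
    # and the similarity score used for ranking.
    shot_id: str
    start: float
    duration: float
    score: float


# Run constraints from the abstract: (maximum number of shots, duration cap in seconds).
RUNS = {
    "EURECOM1": (5, 150),
    "EURECOM2": (10, 300),
    "EURECOM3": (15, 450),
    "EURECOM4": (20, 600),
}


def select_shots(shots, max_shots, max_duration):
    """Greedy selection (assumed): take shots in descending score order,
    skipping any shot that would make the total duration reach or exceed
    max_duration, until max_shots shots have been chosen."""
    selected = []
    total = 0.0
    for shot in sorted(shots, key=lambda s: s.score, reverse=True):
        if len(selected) == max_shots:
            break
        if total + shot.duration < max_duration:  # strict "<" as in the abstract
            selected.append(shot)
            total += shot.duration
    # Return the summary in chronological (story) order.
    return sorted(selected, key=lambda s: s.start)


if __name__ == "__main__":
    shots = [
        Shot("a", 0, 40, 0.9),
        Shot("b", 50, 60, 0.8),
        Shot("c", 120, 70, 0.7),
        Shot("d", 200, 30, 0.6),
    ]
    max_shots, cap = RUNS["EURECOM1"]
    print([s.shot_id for s in select_shots(shots, max_shots, cap)])
```

In this toy example, shot "c" is skipped because it would push the summary past 150 seconds, so the greedy pass falls through to the lower-scoring but shorter shot "d". A different reading of the run definitions (for example, truncating the top-N list instead of skipping shots) would give a different summary.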


Type: Conference
City: New York
Date: 2021-12-07
Department: Communication Systems
Eurecom Ref: 6837
Copyright:
© NIST. Personal use of this material is permitted. The definitive version of this paper was published in TRECVID 2021, International Workshop on Video Retrieval Evaluation, December 7-10, 2021 (Virtual Conference) and is available at:

Permalink: https://www.eurecom.fr/publication/6837