Multimodal video copy detection

Dr. Xavier Anguera Miro - Telefonica Research, Spain
Multimedia Communications

Date: Thu, 05/03/2012 - 09:30 - Thu, 05/03/2012 - 10:30
Location: Eurecom

Finding copies of video segments within a given video corpus is a problem that has been studied extensively in the last few years, both in research and commercially. In research, the NIST TRECVID evaluations drove its development from 2007 to 2011, lately turning it into a multimodal task by asking participants to use both audio and visual information. In the commercial world we see many examples, some monomodal (e.g. Shazam, IntoNow, Gracenote) and some supposedly multimodal (e.g. YouTube). At Telefonica Research we have developed a flexible video-copy-detection system that works on audio, video, or multimodal data. To this end we built strong monomodal detectors and combined them using a late fusion strategy that has proven effective in reducing errors when combining any set of available system outputs. In this talk I will describe the monomodal systems and how we perform the fusion step, and I will also discuss the scalability issues involved in going from a research system to a commercial implementation.
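
As a rough illustration of what late fusion means in this setting, the sketch below combines the outputs of two independent detectors at the score level. It is not the system presented in the talk: the weighted-sum rule, the function name late_fusion, the (query, reference) keys, and the example scores are all assumptions made for illustration only.

```python
from collections import defaultdict


def late_fusion(detections_by_modality, weights):
    """Combine per-modality detection scores with a normalized weighted sum.

    detections_by_modality: {modality: {(query_id, ref_id): score in [0, 1]}}
    weights: {modality: non-negative weight}
    Both the data layout and the weighted-sum rule are illustrative assumptions.
    """
    total = sum(weights[m] for m in detections_by_modality)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    fused = defaultdict(float)
    for modality, detections in detections_by_modality.items():
        for pair, score in detections.items():
            # A pair missing from one modality simply contributes 0 for it.
            fused[pair] += weights[modality] * score / total

    # Highest fused score first.
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical outputs of an audio-only and a video-only detector.
audio = {("q1", "refA"): 0.9, ("q1", "refB"): 0.4}
video = {("q1", "refA"): 0.7, ("q1", "refC"): 0.6}

ranked = late_fusion(
    {"audio": audio, "video": video},
    {"audio": 0.5, "video": 0.5},
)
for pair, score in ranked:
    print(pair, round(score, 3))
```

Because each detector runs on its own and only the final scores are merged, a candidate supported by both modalities (refA here) is ranked above candidates seen by only one, and either detector can be replaced without touching the other.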