Video summarization has become an important tool for multimedia
information processing, but the automatic evaluation of a video
summarization system remains a challenge. A major issue is that an
ideal "best" summary does not exist, although people can easily
distinguish "good" from "bad" summaries. A similar situation arises
in machine translation and text summarization, where specific
automatic procedures, respectively BLEU and ROUGE, evaluate the
quality of a candidate by comparing its local similarities with several
human-generated references. These procedures are now routinely
used in various benchmarks. In this paper, we extend this idea to the
video domain and propose the VERT (Video Evaluation by Relevant
Threshold) algorithm to automatically evaluate the quality of video
summaries. VERT follows the approach of BLEU and ROUGE, and
counts the weighted number of overlapping selected units between
the computer-generated video summary and several human-made
references. Several variants of VERT are suggested and compared,
and the best variant is selected through experimentation.
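
The counting idea described above can be illustrated with a minimal sketch. This is not the VERT formula itself, which the abstract does not give; it is a generic ROUGE-style weighted recall, assuming that a summary is a set of selected unit identifiers (for example keyframe IDs) and that each unit has an optional weight. The function name `weighted_overlap_score`, the `weights` mapping, and the default weight of 1.0 are all hypothetical choices made for illustration.

```python
from typing import Dict, Hashable, Iterable, Optional, Set


def weighted_overlap_score(
    candidate: Iterable[Hashable],
    references: Iterable[Iterable[Hashable]],
    weights: Optional[Dict[Hashable, float]] = None,
) -> float:
    """ROUGE-style weighted recall of a candidate against several references.

    Illustrative assumption: units missing from `weights` count as 1.0.
    """
    candidate_units: Set[Hashable] = set(candidate)
    matched = 0.0  # weight of reference units also selected by the candidate
    total = 0.0    # weight of all reference units

    for reference in references:
        # Each unit counts once per reference, however often it is listed.
        for unit in set(reference):
            w = 1.0 if weights is None else weights.get(unit, 1.0)
            total += w
            if unit in candidate_units:
                matched += w

    # With no reference units at all, the score is defined as 0 here.
    return matched / total if total > 0 else 0.0


# Hypothetical data: one machine summary and two human references.
candidate = ["kf1", "kf3", "kf7"]
references = [["kf1", "kf2", "kf3"], ["kf3", "kf4"]]

uniform_score = weighted_overlap_score(candidate, references)
weighted_score = weighted_overlap_score(candidate, references, {"kf3": 2.0})
```

In this sketch a unit chosen by several human annotators contributes once per reference, so agreement among references raises its influence on the score. Other normalizations (for example by candidate length, as in a precision-style measure) are equally plausible under the same description.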