Compact video description with precise temporal alignment
European Conference on Computer Vision, September 2010
This paper introduces a very compact yet discriminative video
description, which allows example-based search in a large
number of frames corresponding to thousands of hours of video. Our
description extracts one descriptor per indexed video frame by
aggregating a set of local descriptors. These frame descriptors
are encoded using a time-aware hierarchical indexing
structure. A modified temporal Hough voting scheme is used to rank the
retrieved database videos and estimate segments in them that match the query. If we use a dense temporal
description of the videos, matched video segments are localized with
excellent precision.
Experimental results on the TRECVID 2008 copy detection task and a
set of 38,000 videos from YouTube show that our method offers an
excellent trade-off between search accuracy, efficiency and memory
usage.
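
To illustrate the ranking step mentioned above, the following is a minimal sketch of plain temporal Hough voting, not the modified scheme of the paper. It assumes that frame-level matches are already available as (query time, video id, database time, score) tuples; the function name `temporal_hough_vote`, the tuple layout, and the `bin_width` parameter are hypothetical choices made for this example. Each match votes for the time offset between the database frame and the query frame, and each database video is ranked by its strongest offset bin.

```python
from collections import defaultdict


def temporal_hough_vote(matches, bin_width=1.0):
    """Rank database videos by their best-supported time offset.

    matches: iterable of (query_time, video_id, db_time, score) tuples.
             This layout is an assumption made for the example.
    bin_width: width of an offset histogram bin, in the same time unit.
    Returns a list of (video_id, offset, total_score), best first.
    """
    # Accumulate match scores in a histogram indexed by (video, offset bin).
    votes = defaultdict(float)
    for query_time, video_id, db_time, score in matches:
        offset_bin = round((db_time - query_time) / bin_width)
        votes[(video_id, offset_bin)] += score

    # Keep, for each video, the offset bin that collected the most votes.
    best = {}
    for (video_id, offset_bin), total in votes.items():
        if video_id not in best or total > best[video_id][1]:
            best[video_id] = (offset_bin * bin_width, total)

    # Sort videos by the score of their best bin, highest first.
    ranked = [(vid, off, total) for vid, (off, total) in best.items()]
    ranked.sort(key=lambda item: -item[2])
    return ranked
```

Matches that agree on a consistent offset reinforce one another, whereas spurious matches scatter over many bins. The winning offset gives a coarse alignment between the query and the database video; estimating the boundaries of the matching segment would need an additional step that this sketch does not cover.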
See also
This video search method has lower recall than that of our
Transactions on Multimedia article, but requires fewer resources (memory and CPU). Moreover, to our knowledge, it outperforms all existing methods in terms of temporal localization.
The core image representation used in this paper was introduced in our
CVPR 2010 contribution.
BibTeX reference
@InProceedings{DJSP10,
author = "Matthijs Douze and Herv\'e J\'egou and Cordelia Schmid and Patrick P\'erez",
title = "Compact video description with precise temporal alignment",
booktitle = "European Conference on Computer Vision",
pages = "522--535",
month = "sep",
year = "2010",
url = "http://lear.inrialpes.fr/pubs/2010/DJSP10"
}