Compact video description with precise temporal alignment

European Conference on Computer Vision - September 2010
Download the publication: douze_eccv10.pdf [1.6 MB], poster_douze_eccv10.pdf [18 MB]
This paper introduces a very compact yet discriminative video description, which allows example-based search in a large number of frames corresponding to thousands of hours of video. Our description extracts one descriptor per indexed video frame by aggregating a set of local descriptors. These frame descriptors are encoded using a time-aware hierarchical indexing structure. A modified temporal Hough voting scheme is used to rank the retrieved database videos and to estimate the segments in them that match the query. If we use a dense temporal description of the videos, matched video segments are localized with excellent precision. Experimental results on the TRECVID 2008 copy detection task and on a set of 38,000 videos from YouTube show that our method offers an excellent trade-off between search accuracy, efficiency and memory usage.
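
To illustrate the voting step mentioned in the abstract, here is a minimal sketch of plain temporal Hough voting. It is not the modified scheme of the paper. It assumes that a frame-level search has already produced matches of the form (query time, video id, database time, score); the function name `temporal_hough_vote`, the `bin_width` parameter and the toy data are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def temporal_hough_vote(matches, bin_width=1.0):
    """Rank (video, time offset) hypotheses from frame-level matches.

    matches: iterable of (query_time, video_id, db_time, score) tuples.
    Returns a list of (video_id, offset, total_score), best hypothesis first.
    This is a simplified illustration, not the scheme used in the paper.
    """
    histogram = defaultdict(float)
    for query_time, video_id, db_time, score in matches:
        # A genuine copy keeps a constant shift between database time and
        # query time, so its matches accumulate in the same offset bin.
        offset_bin = round((db_time - query_time) / bin_width)
        histogram[(video_id, offset_bin)] += score
    ranked = sorted(histogram.items(), key=lambda item: item[1], reverse=True)
    return [(video_id, offset_bin * bin_width, total)
            for (video_id, offset_bin), total in ranked]


# Toy data (assumed): video "A" matches consistently at offset 10 s,
# while video "B" only produces two unrelated matches.
matches = [
    (0.0, "A", 10.0, 1.0),
    (1.0, "A", 11.0, 1.0),
    (2.0, "A", 12.0, 1.0),
    (1.0, "B", 40.0, 1.0),
    (2.0, "B", 3.0, 1.0),
]
best = temporal_hough_vote(matches)[0]
print(best)
```

The highest-scoring bin gives both the retrieved video and the time shift that aligns it with the query. Extracting the matching segment boundaries would require a further step that this sketch does not cover.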

See also

This video search method has lower recall than the one in our Transactions on Multimedia article, but requires fewer resources (memory and CPU). Moreover, to our knowledge, it outperforms all existing methods in terms of temporal localization. The core image representation used in this paper was introduced in our CVPR 2010 contribution.

BibTeX reference

@InProceedings{DJSP10,
  author       = "Matthijs Douze and Herv\'e J\'egou and Cordelia Schmid and Patrick P\'erez",
  title        = "Compact video description with precise temporal alignment",
  booktitle    = "European Conference on Computer Vision",
  pages        = "522--535",
  month        = "sep",
  year         = "2010",
  url          = "http://lear.inrialpes.fr/pubs/2010/DJSP10"
}
