Recognizing Activities with Cluster-Trees of Tracklets

Adrien Gaidon, Zaid Harchaoui, Cordelia Schmid, BMVC 2012

Recognizing complex activities


Supervised activity classification in videos

Activity: a complex action characterized by spatio-temporal relations between a variable number of parts

Goal: automatically identify motion components and exploit both their contents and their relations to improve recognition

Proposed approach

  1. Describe motion content using dense tracklets: short point trajectories of fixed duration [Wang CVPR'11]
  2. Hierarchically decompose the set of tracklets of a video using divisive spectral clustering
  3. Classify the resulting tree-structured activity models with an SVM, using a hierarchical kernel on nested histograms

Extracting motion information

Video data

Dense Tracklets
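As a rough illustration of what a tracklet descriptor can look like, the sketch below cuts a tracked point's positions into non-overlapping, fixed-length tracklets and computes the trajectory shape descriptor of Wang et al. (CVPR'11): frame-to-frame displacements normalized by the sum of their magnitudes. Point tracking itself is omitted. The function names and the tracklet length passed in are hypothetical, not values taken from this work.

```python
import numpy as np

def split_into_tracklets(track, length):
    """Cut a point track of shape (T, 2) into non-overlapping tracklets of
    `length` consecutive positions; a trailing remainder shorter than
    `length` is dropped (hypothetical helper)."""
    track = np.asarray(track, dtype=float)
    n = track.shape[0] // length
    return [track[i * length:(i + 1) * length] for i in range(n)]

def shape_descriptor(tracklet, eps=1e-8):
    """Trajectory shape descriptor: frame-to-frame displacements,
    normalized by the sum of their magnitudes, flattened to one vector."""
    disp = np.diff(tracklet, axis=0)              # (length - 1, 2) displacements
    total = np.linalg.norm(disp, axis=1).sum()    # total path length
    return (disp / (total + eps)).ravel()         # eps guards static points
```

For a point moving 1 pixel per frame to the right over 30 frames, a length of 15 gives two tracklets, each with a 28-dimensional descriptor whose x components all equal 1/14.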

Camera Motion Compensation
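One common way to compensate for camera motion, assumed here rather than taken from this work, is to estimate a dominant 2D affine motion between consecutive frames from tracked point correspondences and subtract it, so that only the residual (actor) motion remains. The least-squares fit below is a minimal sketch; `fit_affine` and `compensate` are hypothetical helpers.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine transform mapping src (N, 2) to dst (N, 2).
    Returns a (3, 2) matrix A such that dst is approximately [src, 1] @ A."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    X = np.hstack([src, np.ones((src.shape[0], 1))])   # homogeneous coordinates
    A, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return A

def compensate(src, dst):
    """Residual displacement of each point once the dominant
    (assumed camera) affine motion has been removed."""
    A = fit_affine(src, dst)
    X = np.hstack([np.asarray(src, dtype=float), np.ones((len(src), 1))])
    return np.asarray(dst, dtype=float) - X @ A
```

Plain least squares is biased by points lying on moving actors; a robust estimator such as RANSAC is the usual remedy when foreground motion covers a large part of the frame.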

Tracklets on stabilized video

Structuring motion information

Hierarchical motion decomposition

Greedy top-down bi-partitioning of the set of tracklets
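The greedy top-down bi-partitioning can be sketched as a recursion in which each node is split once, by a spectral (normalized-cut style) bipartition, and is never revisited. This is a minimal sketch under stated assumptions: tracklets are represented by generic feature vectors `X`, the Gaussian affinity and its bandwidth `sigma` are hypothetical choices (the affinity actually used in this work is not reproduced here), and recursion stops at `max_depth` or when a node holds fewer than `min_size` tracklets.

```python
import numpy as np

def spectral_bipartition(X, sigma=1.0):
    """Split the rows of X in two with the sign of the second eigenvector
    of the symmetric normalized Laplacian of a Gaussian affinity graph
    (a relaxation of the normalized cut)."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))         # hypothetical affinity
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)                    # ascending eigenvalues
    return vecs[:, 1] * d_inv_sqrt >= 0                # boolean mask, one side

def divisive_tree(X, idx=None, depth=0, max_depth=3, min_size=4, sigma=1.0):
    """Greedy divisive clustering: each node is split once, top-down.
    Returns nested dicts {'idx': tracklet indices, 'children': [...]}."""
    if idx is None:
        idx = np.arange(len(X))
    node = {"idx": idx, "children": []}
    if depth >= max_depth or len(idx) < min_size:
        return node
    mask = spectral_bipartition(X[idx], sigma)
    if mask.all() or not mask.any():                   # degenerate split: leaf
        return node
    for part in (idx[mask], idx[~mask]):
        node["children"].append(
            divisive_tree(X, part, depth + 1, max_depth, min_size, sigma))
    return node
```

Because each split is decided locally and never undone, the cost stays low, but an early bad cut propagates to the whole sub-tree below it.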

Example leaf labels: the leaves alone are not enough, as they oversegment the activity

Hierarchical spectral divisive clustering

Hierarchically cluster tracklets using:

Tree-structured activity models

BOF-Tree activity model

Nested bag-of-features histograms of motion features
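A BOF-Tree can be sketched by attaching a bag-of-features histogram to every node of a cluster tree: each tracklet is assumed to be already quantized to a visual-word id (the codebook and its size `k` are hypothetical here), and each node counts the words of the tracklets it contains. Because a child's tracklets are a subset of its parent's, the histograms are nested. The per-node L1 normalization is an illustrative choice, not necessarily the one used in this work.

```python
import numpy as np

def bof_tree(node, words, k):
    """Attach a bag-of-features histogram to every node of a cluster tree.
    node:  dict {'idx': array of tracklet indices, 'children': [...]}
    words: visual-word id of each tracklet (ints in [0, k))
    Returns a new tree with an added, L1-normalized 'hist' field."""
    counts = np.bincount(words[node["idx"]], minlength=k).astype(float)
    hist = counts / max(counts.sum(), 1.0)             # avoid division by zero
    return {"idx": node["idx"], "hist": hist,
            "children": [bof_tree(c, words, k) for c in node["children"]]}
```

With six tracklets whose words are `[0, 0, 1, 2, 2, 2]`, split into the first three and the last three, the root histogram is `[2/6, 1/6, 3/6]` and the two leaves get `[2/3, 1/3, 0]` and `[0, 0, 1]`.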

Kernel on BOF-Trees
Approximates the comparison of all pairs of sub-trees, using both the tree structure and the node contents

Datasets

High Five [Patron-Perez BMVC'10] (4 human interaction categories, 300 TV-show videos)

Olympic Sports [Niebles ECCV'10] (16 sport activity categories, 783 YouTube videos)


On the High Five dataset

On the Olympic Sports dataset

© 2008-2012   Adrien GAIDON