Dense Trajectories Video Description

We have updated the dense trajectories code to OpenCV-2.4.2 and ffmpeg-0.11.1. It is much easier to compile now! You can download the latest version and compile it under Linux.

Visualization of trajectories and descriptors

Compare different trajectories: download

Compare different descriptors: download

Notes

Before using the code, make sure that OpenCV and ffmpeg are installed correctly on your system. At the time of writing, these are the latest versions. In case they become outdated, they are also available here: OpenCV-2.4.2 and ffmpeg-0.11.1.

Once OpenCV and ffmpeg are installed properly, you should be able to compile the code immediately. For more instructions, please refer to the README file. Please note also that our code is intended only for scientific or personal use. If you have problems running the code, please check the FAQ below before sending me an e-mail.

History

September 2013: Third version release. Much cleaner code, twice as fast as before, and three more output elements (x_pos, y_pos, t_pos) for spatio-temporal pyramids.

March 2013: The journal version of our paper is published online; it provides an extensive evaluation on nine action datasets.

September 2012: Second version release, updated to the latest libraries and with bugs fixed.

May 2011: First version release.

An Example

Output the help information with -h:

Usage: DenseTrack video_file [options]
Options:
  -h                        Display this message and exit
  -S [start frame]          The start frame to compute feature (default: S=0 frame)
  -E [end frame]            The end frame for feature computing (default: E=last frame)
  -L [trajectory length]    The length of the trajectory (default: L=15 frames)
  -W [sampling stride]      The stride for dense sampling feature points (default: W=5 pixels)
  -N [neighborhood size]    The neighborhood size for computing the descriptor (default: N=32 pixels)
  -s [spatial cells]        The number of cells in the nxy axis (default: nxy=2 cells)
  -t [temporal cells]       The number of cells in the nt axis (default: nt=3 cells)

Compute the features for a video file

DenseTrack myvideo.vob [options] | gzip > myfeatures.gz

If no options are given, the features are computed using the default parameters.

The format of the computed features

The features are computed one by one, each written on a single line in the following format:

frameNum mean_x mean_y var_x var_y length scale x_pos y_pos t_pos Trajectory HOG HOF MBHx MBHy

The first 10 elements are information about the trajectory:

frameNum:     The frame on which the trajectory ends
mean_x:       The mean value of the x coordinates of the trajectory
mean_y:       The mean value of the y coordinates of the trajectory
var_x:        The variance of the x coordinates of the trajectory
var_y:        The variance of the y coordinates of the trajectory
length:       The length of the trajectory
scale:        The scale at which the trajectory is computed
x_pos:        The normalized x position w.r.t. the video (0~0.999), for spatio-temporal pyramid 
y_pos:        The normalized y position w.r.t. the video (0~0.999), for spatio-temporal pyramid 
t_pos:        The normalized t position w.r.t. the video (0~0.999), for spatio-temporal pyramid

The following elements are the five descriptors, concatenated one after another:

Trajectory:    2x[trajectory length] (default 30 dimensions) 
HOG:           8x[spatial cells]x[spatial cells]x[temporal cells] (default 96 dimensions)
HOF:           9x[spatial cells]x[spatial cells]x[temporal cells] (default 108 dimensions)
MBHx:          8x[spatial cells]x[spatial cells]x[temporal cells] (default 96 dimensions)
MBHy:          8x[spatial cells]x[spatial cells]x[temporal cells] (default 96 dimensions)
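
The five descriptors together make 30 + 96 + 108 + 96 + 96 = 426 values, so with the 10 trajectory elements each line holds 436 values under the default parameters. As an illustration, here is a minimal C++ sketch (not part of the released code) that splits a feature line into its parts; the offsets assume the defaults (L=15, nxy=2, nt=3):

#include <iostream>
#include <sstream>
#include <string>
#include <vector>

int main() {
    std::string line;
    while (std::getline(std::cin, line)) {
        std::istringstream iss(line);
        std::vector<float> v;
        float x;
        while (iss >> x)
            v.push_back(x);
        if (v.size() != 436) // 10 info + 30 + 96 + 108 + 96 + 96
            continue;
        // v[0..9]    : frameNum mean_x mean_y var_x var_y length scale x_pos y_pos t_pos
        // v[10..39]  : Trajectory
        // v[40..135] : HOG
        // v[136..243]: HOF
        // v[244..339]: MBHx
        // v[340..435]: MBHy
        std::cout << "trajectory ends at frame " << v[0]
                  << " on scale " << v[6] << std::endl;
    }
    return 0;
}

For example: zcat myfeatures.gz | ./parse_features (where parse_features is the compiled sketch above).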

FAQ

The code doesn't compile. I get errors when compiling it.

We have updated the code to new versions of OpenCV and ffmpeg, and it is now much easier to compile. Please check the README file for compilation instructions.

How can I visualize the dense trajectories?

There is an option in the DenseTrack.cpp file:

int show_track = 0; // set show_track = 1, if you want to visualize the trajectories

You need to use a larger "sampling stride", e.g., 10 pixels or more; otherwise the trajectories are too dense to see anything.

The length of the trajectory (i.e., the 6th value of the feature output) is not an integer (e.g., not 15 frames).

The length of the trajectory in the output file is its total displacement in the image plane, not its temporal length.

How can I recover the original trajectory coordinates?

In the DenseTrack.cpp file:

std::vector<Point2f> trajectory(trackInfo.length+1);
for(int i = 0; i <= trackInfo.length; ++i)
    trajectory[i] = iTrack->point[i]*fscales[iScale];

The coordinates of a trajectory are saved in "std::vector<Point2f> trajectory". Here all the coordinates are mapped back to the original resolution. You can output the trajectory coordinates right after that.
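
For example, you could print them right after the loop (a minimal sketch, assuming <cstdio> is available in the file; it writes to stderr so the regular feature output on stdout is not disturbed):

// Sketch: print the rescaled coordinates of the finished trajectory.
for (int i = 0; i <= trackInfo.length; ++i)
    fprintf(stderr, "%f %f\t", trajectory[i].x, trajectory[i].y);
fprintf(stderr, "\n");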

I can't compile the code. Is there a binary available?

Sorry, we don't have it at the moment.

Is there a windows version?

No. But since we provide the C++ source code, it should be easy to get it running on Windows.

I am using a different version of ffmpeg. I get some warnings when running the code.

Try to visualize the video and check whether it is decoded properly. Alternatively, you can convert it to raw video first.

Your code generates lots of features, and I don't have enough disk space to save them.

Actually, you don't need to save all the features to disk. You can pipe the raw features directly into your quantizer and only save the bag-of-features histograms. For training the codebook, you only need to save a small proportion of the features from the training set.
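
For illustration, a quantizer along these lines could look as follows. This is a hedged sketch, not part of the release: the codebook file format is hypothetical (one 96-dimensional centroid per line), and the offsets assume the default parameters, with the HOG part at values 40..135 of each line:

#include <cstddef>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical loader: one centroid (96 numbers) per line of a text file.
std::vector<std::vector<float> > loadCodebook(const char* path) {
    std::vector<std::vector<float> > cb;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream iss(line);
        std::vector<float> c;
        float x;
        while (iss >> x)
            c.push_back(x);
        if (c.size() == 96)
            cb.push_back(c);
    }
    return cb;
}

// Usage (hypothetical): DenseTrack myvideo.vob | ./quantize_hog codebook.txt
int main(int argc, char** argv) {
    if (argc < 2)
        return 1;
    const std::size_t HOG_BEGIN = 40, HOG_DIM = 96;
    std::vector<std::vector<float> > codebook = loadCodebook(argv[1]);
    if (codebook.empty())
        return 1;
    std::vector<long> histogram(codebook.size(), 0);

    std::string line;
    while (std::getline(std::cin, line)) {
        std::istringstream iss(line);
        std::vector<float> v;
        float x;
        while (iss >> x)
            v.push_back(x);
        if (v.size() < HOG_BEGIN + HOG_DIM)
            continue;
        // Assign the HOG part to the nearest centroid (squared Euclidean).
        std::size_t best = 0;
        float bestDist = 1e30f;
        for (std::size_t k = 0; k < codebook.size(); ++k) {
            float d = 0.f;
            for (std::size_t j = 0; j < HOG_DIM; ++j) {
                float diff = v[HOG_BEGIN + j] - codebook[k][j];
                d += diff * diff;
            }
            if (d < bestDist) { bestDist = d; best = k; }
        }
        ++histogram[best];
    }
    // Only the bag-of-features histogram is written out.
    for (std::size_t k = 0; k < histogram.size(); ++k)
        std::cout << histogram[k] << " ";
    std::cout << std::endl;
    return 0;
}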

Can you send me your features on XXX dataset?

I don't save the features to disk, as they take a lot of space, and sending large amounts of data takes a lot of time. Please compute the features yourself; the code is easy to compile and fast to run.

Why does your code generate a lot of features at frame 15, and far fewer in the following frames?

That's due to our dense sampling strategy. In the first frame, we sample feature points at all grid positions, so many trajectories end at frame 15 (the default trajectory length). In the following frames, we only sample new feature points to replace the missing ones.
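
In simplified form, the sampling step looks like this (a sketch of the idea, not the actual implementation):

#include <cstddef>
#include <vector>

struct Point { int x, y; };

// On a grid with cell size "stride", add a new point only in cells where
// no trajectory is currently tracked: frame 0 fills the whole grid, later
// frames merely top up the gaps left by finished or lost trajectories.
void sampleNewPoints(int width, int height, int stride,
                     const std::vector<Point>& tracked,
                     std::vector<Point>& fresh) {
    const int gridW = width / stride, gridH = height / stride;
    std::vector<bool> occupied(gridW * gridH, false);
    for (std::size_t i = 0; i < tracked.size(); ++i) {
        int gx = tracked[i].x / stride, gy = tracked[i].y / stride;
        if (gx >= 0 && gx < gridW && gy >= 0 && gy < gridH)
            occupied[gy * gridW + gx] = true;
    }
    for (int gy = 0; gy < gridH; ++gy)
        for (int gx = 0; gx < gridW; ++gx)
            if (!occupied[gy * gridW + gx]) {
                Point p = { gx * stride + stride / 2,
                            gy * stride + stride / 2 };
                fresh.push_back(p);
            }
}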

The UCF sports dataset available online is different from the one described in the paper.

We are using an early version of the dataset. It is available here.

Citation

Please cite our paper if you use the code.

@inproceedings{wang:2011:inria-00583818:1,
  AUTHOR = {Heng Wang and Alexander Kl{\"a}ser and Cordelia Schmid and Cheng-Lin Liu},
  TITLE = {{Action Recognition by Dense Trajectories}},
  BOOKTITLE = {IEEE Conference on Computer Vision \& Pattern Recognition},
  YEAR = {2011},
  MONTH = Jun,
  PAGES = {3169-3176},
  ADDRESS = {Colorado Springs, United States},
  URL = {http://hal.inria.fr/inria-00583818/en}
}