We update the dense trajectories code with OpenCV-2.4.2 and ffmpeg-0.11.1. It is much easier to compile now! You can download the latest version and compile it under Linux.
[Figure: comparison of different trajectories (download)]
[Figure: comparison of different descriptors (download)]
Before using the code, make sure that OpenCV and ffmpeg are installed correctly on your system. Currently, these are the latest versions. In case they become outdated, they are also available here: OpenCV-2.4.2 and ffmpeg-0.11.1.
Once OpenCV and ffmpeg are installed properly, you should be able to compile the code immediately. For more instructions, please refer to the README file. Please note also that our code is intended only for scientific or personal use. If you have problems running the code, please check the FAQ below before sending me an e-mail.
September 2013: Third version release. Much cleaner code, improved speed (2x faster than before), and three more dimensions added for the spatio-temporal pyramid.
March 2013: The journal version of our paper is published on-line, where we provide an extensive evaluation on nine action datasets.
September 2012: Second version release. Updated to the latest libraries and fixed bugs.
May 2011: First version release.
Output the help information with -h:
Usage: DenseTrack video_file [options]
Options:
  -h                        Display this message and exit
  -S [start frame]          The start frame for computing features (default: S=0)
  -E [end frame]            The end frame for computing features (default: E=last frame)
  -L [trajectory length]    The length of the trajectory (default: L=15 frames)
  -W [sampling stride]      The stride for densely sampling feature points (default: W=5 pixels)
  -N [neighborhood size]    The neighborhood size for computing the descriptor (default: N=32 pixels)
  -s [spatial cells]        The number of cells along the nxy axis (default: nxy=2 cells)
  -t [temporal cells]       The number of cells along the nt axis (default: nt=3 cells)
DenseTrack myvideo.vob [options] | gzip > myfeatures.gz
If no options are given, the features are computed using the default parameters.
The features are computed one by one, each written on a single line in the following format:
frameNum mean_x mean_y var_x var_y length scale x_pos y_pos t_pos Trajectory HOG HOF MBHx MBHy
The first 10 elements are information about the trajectory:
frameNum: The frame on which the trajectory ends
mean_x: The mean of the x coordinates of the trajectory
mean_y: The mean of the y coordinates of the trajectory
var_x: The variance of the x coordinates of the trajectory
var_y: The variance of the y coordinates of the trajectory
length: The length of the trajectory
scale: The scale at which the trajectory is computed
x_pos: The normalized x position w.r.t. the video (0~0.999), for the spatio-temporal pyramid
y_pos: The normalized y position w.r.t. the video (0~0.999), for the spatio-temporal pyramid
t_pos: The normalized t position w.r.t. the video (0~0.999), for the spatio-temporal pyramid
The remaining elements are the five descriptors, concatenated one after another:
Trajectory: 2x[trajectory length]                           (default 30 dimensions)
HOG:        8x[spatial cells]x[spatial cells]x[temporal cells]  (default 96 dimensions)
HOF:        9x[spatial cells]x[spatial cells]x[temporal cells]  (default 108 dimensions)
MBHx:       8x[spatial cells]x[spatial cells]x[temporal cells]  (default 96 dimensions)
MBHy:       8x[spatial cells]x[spatial cells]x[temporal cells]  (default 96 dimensions)
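With the default parameters, each line therefore contains 10 + 30 + 96 + 108 + 96 + 96 = 436 numbers. A minimal sketch of a parser for such a line might look like this (the struct and function names are our own, not part of the released code):

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One parsed DenseTrack feature line, split into the 10-element trajectory
// information header and the five concatenated descriptors (default sizes).
struct Feature {
    std::vector<float> info;        // frameNum ... t_pos  (10 values)
    std::vector<float> trajectory;  // 2 x L               (default 30)
    std::vector<float> hog;         // 8 x 2 x 2 x 3       (default 96)
    std::vector<float> hof;         // 9 x 2 x 2 x 3       (default 108)
    std::vector<float> mbhx;        // 8 x 2 x 2 x 3       (default 96)
    std::vector<float> mbhy;        // 8 x 2 x 2 x 3       (default 96)
};

Feature parseLine(const std::string& line) {
    std::istringstream iss(line);
    std::vector<float> v;
    float x;
    while (iss >> x) v.push_back(x);
    assert(v.size() == 10 + 30 + 96 + 108 + 96 + 96);  // default dimensions
    Feature f;
    size_t p = 0;
    // Copy the next n values out of the flat buffer.
    auto take = [&](size_t n) {
        std::vector<float> out(v.begin() + p, v.begin() + p + n);
        p += n;
        return out;
    };
    f.info = take(10);
    f.trajectory = take(30);
    f.hog = take(96);
    f.hof = take(108);
    f.mbhx = take(96);
    f.mbhy = take(96);
    return f;
}
```

If you change -L, -s, or -t, the descriptor sizes must be adjusted accordingly using the formulas above.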
We update the code with a new version of opencv and ffmpeg. It's now much easier to compile. Please check the README file for instructions about how to compile.
There is an option in the DenseTrack.cpp file:
int show_track = 0; // set show_track = 1, if you want to visualize the trajectories
You need to use a larger "sampling stride", e.g., 10 pixels or more; otherwise the trajectories are too dense to see anything.
The length of the trajectory in the output file is the total displacement in the image plane, not the temporal length.
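In other words, the length is the sum of the magnitudes of the frame-to-frame displacement vectors. A minimal sketch of that definition (Point2f here is a stand-in for OpenCV's cv::Point2f):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Minimal stand-in for cv::Point2f, for illustration only.
struct Point2f { float x, y; };

// The "length" field sums the Euclidean norms of the frame-to-frame
// displacements in the image plane; it is not the number of frames.
float trajectoryLength(const std::vector<Point2f>& pts) {
    float len = 0.f;
    for (size_t i = 1; i < pts.size(); ++i) {
        float dx = pts[i].x - pts[i - 1].x;
        float dy = pts[i].y - pts[i - 1].y;
        len += std::sqrt(dx * dx + dy * dy);
    }
    return len;
}
```

So a point that moves 3 pixels right and 4 pixels down in one frame and then stays still contributes a length of 5, regardless of how many frames the trajectory spans.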
In the DenseTrack.cpp file:
std::vector<Point2f> trajectory(trackInfo.length+1);
for(int i = 0; i <= trackInfo.length; ++i)
    trajectory[i] = iTrack->point[i]*fscales[iScale];
The coordinates of a trajectory are saved in "std::vector<Point2f> trajectory". Here, all coordinates are mapped back to the original resolution. You can output the trajectory coordinates right after this point.
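The rescaling step can be sketched in isolation as follows. This is an illustration, not the released code: Point2f stands in for cv::Point2f, and the scale factors are assumed to be powers of sqrt(2), as is typical for such image pyramids.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Minimal stand-in for cv::Point2f, for illustration only.
struct Point2f { float x, y; };

// Map points tracked at one pyramid level back to the original resolution by
// multiplying with that level's scale factor (fscales[iScale] in the code).
std::vector<Point2f> mapToOriginal(const std::vector<Point2f>& tracked,
                                   float fscale) {
    std::vector<Point2f> trajectory(tracked.size());
    for (size_t i = 0; i < tracked.size(); ++i)
        trajectory[i] = {tracked[i].x * fscale, tracked[i].y * fscale};
    return trajectory;
}
```

A point at (10, 20) on the third pyramid level (scale factor sqrt(2)^2 = 2) maps back to (20, 40) in original-resolution pixels.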
Sorry, we don't have it at the moment.
No. But since we provide the C++ source code, it should be easy to port it to Windows.
Try to visualize the video and check whether it is decoded properly. Alternatively, you can convert it to raw video.
Actually, you don't need to save all the features to the disk. You can pipe the raw features directly to your quantizer, and only save the bag-of-features histograms. For training the codebook, you only need to save a small proportion of features from the training set.
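The quantization step in that pipeline can be sketched as follows: assign each incoming descriptor to its nearest codebook centroid and accumulate a histogram, so the raw features never need to touch the disk. The class below is a hypothetical illustration, not part of the released code.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical streaming bag-of-features quantizer: each descriptor is
// assigned to its nearest codebook centroid (squared Euclidean distance)
// and counted in a histogram; raw descriptors are then discarded.
struct BowQuantizer {
    std::vector<std::vector<float>> codebook;  // K centroids
    std::vector<int> histogram;                // one bin per centroid

    explicit BowQuantizer(std::vector<std::vector<float>> cb)
        : codebook(std::move(cb)), histogram(codebook.size(), 0) {}

    void add(const std::vector<float>& desc) {
        size_t best = 0;
        float bestDist = std::numeric_limits<float>::max();
        for (size_t k = 0; k < codebook.size(); ++k) {
            float d = 0.f;
            for (size_t i = 0; i < desc.size(); ++i) {
                float diff = desc[i] - codebook[k][i];
                d += diff * diff;
            }
            if (d < bestDist) { bestDist = d; best = k; }
        }
        ++histogram[best];
    }
};
```

Feeding the piped DenseTrack output through such a quantizer leaves you with one fixed-size histogram per video, which is all the bag-of-features classifier needs.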
I don't save the features to disk, as they take a lot of space, and sending large amounts of data takes a lot of time. Please compute the features yourself; the code is easy to compile and fast to run.
That's due to our dense sampling strategy. In the first frame, we sample feature points at all positions, so many trajectories end at frame 15. In the following frames, we only sample new feature points to replace the ones that have disappeared.
We are using an early version of the dataset. It is available here.
Please cite our paper if you use the code.
@inproceedings{wang:2011:inria-00583818:1, AUTHOR = {Heng Wang and Alexander Kl{\"a}ser and Cordelia Schmid and Cheng-Lin Liu}, TITLE = {{Action Recognition by Dense Trajectories}}, BOOKTITLE = {IEEE Conference on Computer Vision \& Pattern Recognition}, YEAR = {2011}, MONTH = Jun, PAGES = {3169-3176}, ADDRESS = {Colorado Springs, United States}, URL = {http://hal.inria.fr/inria-00583818/en} }