The Labeled Yahoo! News data set
Here is the MATLAB file containing the data set and features that we used in our ECCV'10 paper and our submitted IJCV paper.
I will put the images and captions online soon. If you need them sooner, send me an email, or download the images and captions from Tamara Berg's website.
Labeled Faces in the Wild
The SIFT-based features we used for our submission to the Labeled Faces in the Wild data set (cf. our ICCV'09 paper) are now available. The archive below includes the facial feature localisations (.pts files) and the descriptors (.jeval files). Both are plain-text files.
INRIA features for image annotation and classification data sets
These are the features used in our ICCV'09 and CVPR'10 papers on image auto-annotation, keyword-based image retrieval and multimodal semi-supervised learning (please cite one of these papers if you use the features). If you find an error or have a question, please contact me.
News and updates
- 2010/06/09: added PASCAL VOC 07 and MIR Flickr data sets.
- 2009/11/11: packages have been updated.

Archives on this website contain:
- A dictionary: the list of words used for annotating the images.
- Lists of the files in the train and test sets. You will have to match these lists to the set of images you collected yourself, but this should not be too hard.
- Annotation files for the train/test sets: matrices of binary values that encode the annotations. Columns correspond to words in the dictionary; rows correspond to images, in the same order as in the lists.
- 15 descriptor files for the train/test sets: matrices containing visual descriptors for the images: Gist, DenseSift, DenseSiftV3H1, HarrisSift, HarrisSiftV3H1, DenseHue, DenseHueV3H1, HarrisHue, HarrisHueV3H1, Rgb, RgbV3H1, Lab, LabV3H1, Hsv, HsvV3H1.
- MATLAB and C code to read/write the binary files.

See below for an example MATLAB script.
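The relation between the dictionary, the file lists and the binary annotation matrix can be illustrated on toy data. The sketch below is only an illustration: the words, image names and matrix values are made up, and the real matrices have to be read from the archives with the provided reader code.

```matlab
% Toy data, made up for illustration; real data comes from the archives.
dict  = {'sky'; 'water'; 'tree'; 'people'};   % dictionary: one word per column of annot
list  = {'img_001'; 'img_002'; 'img_003'};    % file list: one image per row of annot
annot = logical([1 0 1 0;                     % img_001: sky, tree
                 0 1 0 0;                     % img_002: water
                 1 1 0 1]);                   % img_003: sky, water, people

% Words attached to one image: keep the dictionary entries whose column is set.
n = 3;
words = dict(annot(n, :));
fprintf('%s:%s\n', list{n}, sprintf(' %s', words{:}));   % prints "img_003: sky water people"

% Images carrying one word: keep the list entries whose row is set.
w = find(strcmp(dict, 'sky'));
images = list(annot(:, w));

% Number of images annotated with each word (column sums).
counts = sum(annot, 1);
```

Row indexing gives the words of an image, column indexing gives the images of a word, and column sums give word frequencies. All three rely only on the row and column conventions described above.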
COREL 5K
IAPR TC-12
ESP GAME
PASCAL VOC 2007
- Images available on the PASCAL VOC 2007 website.
- We downloaded Flickr tags (using the Flickr ID) for the images that were still online at the time.
- Archive containing our features and downloaded tags for PASCAL VOC 07: pascal07.20100609.tar.bz2 (96 MB)
MIR FLICKR
Example script
The following MATLAB script shows how to load the data and display images. For each data set, it displays a few example images with their annotations from the train and test sets. It assumes that you have extracted all the archive files into one directory (the script reads them by relative file name) and that you have the images corresponding to the file lists stored somewhere on disk.
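% Data sets to browse, the splits of each, and the directories holding the images.
datasets = { 'corel5k', 'iaprtc12', 'espgame' };
sets = { 'test', 'train' };
impaths = { 'path-to-corel5k-images/', 'path-to-iaprtc12-images/', 'path-to-espgame-images/' };
T = 10;  % number of random images displayed per set

for db = 1:length(datasets)
    ds = datasets{db};
    % Dictionary: the list of annotation words.
    dict = textread([ds '_dictionary.txt'], '%s');
    impath = impaths{db};
    for s = 1:length(sets)
        str = sets{s};
        % File list and binary annotation matrix (rows: images, columns: words).
        % vec_read is assumed to be the reader from the MATLAB code in the archives.
        list = textread([ds '_' str '_list.txt'], '%s');
        annot = logical(vec_read([ds '_' str '_annot.hvecs']));
        u = randperm(length(list));  % random display order
        for i = 1:T
            n = u(i);
            imshow([impath list{n} '.jpg']);
            % Words whose column is set in row n of the annotation matrix.
            words = dict(annot(n,:));
            text(0, -10, [sprintf('%d/%d: ', i, T) sprintf(' %s', words{:})]);
            pause(2)
        end
    end
end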
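Since the script builds each image path as the image directory, followed by the list entry, followed by '.jpg', it can help to check beforehand that the list entries actually resolve to files in your own copy of the images. The following is a minimal sketch of such a check. The directory and the two list entries are placeholders invented for illustration; in practice, list would be read from one of the list files as in the script above.

```matlab
% Placeholder inputs, made up for illustration; replace with a real directory and list.
impath = 'path-to-corel5k-images/';
list = {'dir_a/img_0001', 'dir_b/img_0002'};

% An entry is counted as found if impath/<entry>.jpg exists as a file.
found = false(size(list));
for n = 1:numel(list)
    found(n) = exist([impath list{n} '.jpg'], 'file') == 2;
end
fprintf('%d of %d listed images found\n', nnz(found), numel(list));

% Entries that still have to be located or renamed.
missing = list(~found);
```

If many entries are missing, the usual causes are a different directory layout or file extension in your copy of the images.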