Searching with quantization package

We provide a two implementations of the product quantizer search method described in this report.
This technique allows the search in a large vector dataset indexed in a limited amount of memory.



Matlab version

Only the ADC version is given in this package.
This toy implementation is provided for the sake of exposition only. It is significantly slower than the C version.

Download Matlab package
Download Matlab package with ressources: includes matlab ressources and the small evaluation set (also available separately, see below)

This package was written by Herve Jegou and Matthijs Douze in July 2009.
Copyright (C) INRIA 2009.
This version is provided for research purposes only, without any support and with no guarantee.



Pre-compiled library version with sample test programe

The ADC, SDC and IVFADC variants are implemented.
This package is a pre-compiled version of our C library.
It is provided for three architectures: MacOS i386 (32bits), ELF32 (Linux 32bits) and ELF64 (Linux 64 bits)

Architecture Download MD5SUM
MacOS i386 libpq-1.0_MAC32.tar.gz 03d9317beaafcc0aee8f8361e9fc43e6
Linux ELF32 libpq-1.0_ELF32.tar.gz 7c0a41dc5c327e02d167d864d6e429a5
Linux ELF64 libpq-1.0_ELF64.tar.gz 462ec7345887469dc0de80a58b57790e

This package was written by Herve Jegou and Matthijs Douze in July 2009.
Copyright (C) INRIA 2009.
This version is provided for research purposes only, without any support and with no guarantee.
Please contact us if you are interested in a commercial version.


Contact: Herve Jegou and Matthijs Douze

Evaluation datasets for approximate nearest neighbor search

Overview: We provide three evaluation sets: the two datasets used in the evaluation of our Technical report no 7020:

Herve Jegou, Matthijs Douze and Cordelia Schmid
"Searching with quantization: approximate nearest neighbor search using short codes and distance estimators",

and a small dataset used to quickly test the sample programs provided on this web page. All sets are gzipped archives and each of them comprises 4 files of vectors: The vector files are stored in .fvecs format, while the groundtruth file in is .ivecs format.

.fvecs and .ivecs vector file formats: The vectors are stored in raw little endian. Each vector takes 4+d*4 bytes, where d is the dimensionality of the vector, as shown below.

field field type description
d int the vector dimension
components (float | int)*d the vector components

The only difference between .fvecs and .ivecs is the base type for the vector components, which is float or int, respectively. In the matlab section below, we provide two functions to read such files in matlab.

Details and Download

Set Download descriptor type vector dimension nb of base vectors nb of query vectors nb of learn vectors comments
SIFT_small siftsmall.tar.gz (5.1MB) SIFT descriptors 128 10,000 100 25,000 for fast test
SIFT sift.tar.gz (161MB) SIFT descriptors 128 1,000,000 10,000 100,000 -
GIST gist.tar.gz (2.6GB) GIST descriptors 960 1,000,000 1,000 500,000 high memory usage when loaded


The groundtruth files contain, for each query, the identifers (vector number, starting at 0) of its 100 nearest neighbors, ordered by increasing distance. Therefore, the first element of each integer vector is the nearest neighbor identifier associated with the query.

Matlab ressources: I/O, kmeans and exact k-nearest neighbor search

We provide several matlab functions:
Some of them are pre-compiled matlab packages created using MEX-files. They replace several functions that have slow matlab implementations. In particular, the proposed k-means is significantly faster than the one provided in matlab (based on Blas/Lapack and multi-threaded). These mex-files are available for Linux only (both 32 and 64 bits).

Function Implementation Download description
fvecs_read pure matlab fvecs_read.m Read a .fvecs file. Each vector is stored in a column of the output matrix.
ivecs_read pure matlab ivecs_read.m Read a .ivecs file. Each vector is stored in a column of the output matrix.
kmeans_fast MEX-compiled kmeans_fast.tar.gz Fast kmeans with sample program
nn pure matlab and MEX-compiled nn.tar.gz Optimized k-nearest neighbors assignment. This is exhaustive but exact search. The mex version is called when possible.