"Web Queries" dataset description

The "Web Queries" dataset contains 71478 images and their metadata, retrieved by 353 web queries.
A relevance label is available for each retrieved image. The relevance labels were obtained by manual labeling.
French query words were used to retrieve the images, but we also provide their English translations.

Download the dataset

File contents description

filename | description | column format
queries_fr_orig.txt | list of concepts in French | query_id original_query_string
queries_en_trans.txt | list of concepts translated to English | query_id translated_query_string
labels.txt | relevance labels for the documents retrieved by the search engine | query_id document_id relevance_label

Relevance labels are binary: 1 for "relevant to the query", 0 for "irrelevant to the query".
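
The column formats above can be read with a few lines of Python. This is a minimal sketch, not part of the dataset distribution: it assumes that columns are separated by whitespace, that a query string may itself contain spaces, and that the caller opens the files with a suitable text encoding. The names `Label`, `parse_labels` and `parse_queries` are hypothetical.

```python
from typing import Dict, Iterable, Iterator, NamedTuple


class Label(NamedTuple):
    query_id: int
    document_id: int
    relevant: bool  # True for label 1, False for label 0


def parse_labels(lines: Iterable[str]) -> Iterator[Label]:
    """Yield one Label per non-blank line of labels.txt."""
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"expected 3 columns, got {len(fields)}: {line!r}")
        query_id, document_id, label = map(int, fields)
        if label not in (0, 1):
            raise ValueError(f"relevance label must be 0 or 1: {line!r}")
        yield Label(query_id, document_id, label == 1)


def parse_queries(lines: Iterable[str]) -> Dict[int, str]:
    """Map query_id to query string for either queries_*.txt file."""
    queries: Dict[int, str] = {}
    for line in lines:
        # Split only once so that multi-word queries stay intact (assumption).
        parts = line.strip().split(None, 1)
        if not parts:
            continue
        queries[int(parts[0])] = parts[1] if len(parts) > 1 else ""
    return queries


# Usage with the two label lines shown in the examples of this description.
labels = list(parse_labels(["150 0 1", "150 1 0"]))
queries = parse_queries(["150 trompette"])
for label in labels:
    print(queries[label.query_id], label.document_id, label.relevant)
```

With real files, the same functions can be fed an open file object, for example `parse_labels(open("labels.txt"))`.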

Directory contents description

directory name | description | filename format
images | directory containing the image thumbnails | query_[query_id]_document_[document_id]_imagethumb.jpg
metadata | directory containing the text metadata | query_[query_id]_document_[document_id]_textmeta.xml
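
The filename formats above make it possible to go from a (query_id, document_id) pair to its files and back. The sketch below assumes that the ids appear in filenames as plain decimal numbers without zero padding, as in the examples of this description. The helper names and the `root` argument (the directory into which the dataset was unpacked) are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Matches both filename formats; the two groups capture query_id and document_id.
_NAME = re.compile(r"^query_(\d+)_document_(\d+)_(?:imagethumb\.jpg|textmeta\.xml)$")


def image_path(root: str, query_id: int, document_id: int) -> Path:
    """Path of the thumbnail for one retrieved document."""
    return Path(root) / "images" / f"query_{query_id}_document_{document_id}_imagethumb.jpg"


def metadata_path(root: str, query_id: int, document_id: int) -> Path:
    """Path of the text metadata file for one retrieved document."""
    return Path(root) / "metadata" / f"query_{query_id}_document_{document_id}_textmeta.xml"


def parse_name(filename: str) -> Optional[Tuple[int, int]]:
    """Return (query_id, document_id), or None if the name does not match."""
    match = _NAME.match(filename)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Usage: the files that belong to query 150, document 0.
print(image_path(".", 150, 0))
print(metadata_path(".", 150, 0))
print(parse_name("query_150_document_0_textmeta.xml"))
```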

Description of the XML tags in the text metadata files

Text metadata is in XML format. The root tag is <documentmeta>.

tag name | description | tag value(s)
concept | text query used to retrieve the image | string
rank | search engine's rank of the image | unsigned int
language | the language of the web page that contains the image | en, fr, unknown
referer | the URL of the web page that contains the image | url
imageUrl | the URL of the image file | url
before | up to 10 words before the <img> HTML tag in the web page | up to 10 strings
after | up to 10 words after the <img> HTML tag in the web page | up to 10 strings
ptitle | the <title> HTML tag of the web page (the title of the web page) | N strings
alt | the "alt" attribute of the <img> HTML tag in the web page (image alternative description) | N strings
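
A metadata file with these tags can be read with Python's standard `xml.etree.ElementTree` module. The sketch below makes several assumptions: the tags are direct children of the <documentmeta> root, multi-word values are separated by whitespace inside a single element, and missing or empty tags should yield empty values. The function name `parse_textmeta` is hypothetical, and a file that is not well-formed XML will raise `xml.etree.ElementTree.ParseError`.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict

TEXT_TAGS = ("concept", "language", "referer", "imageUrl")  # single string values
WORD_TAGS = ("before", "after", "ptitle", "alt")  # whitespace-separated words


def parse_textmeta(data: bytes) -> Dict[str, Any]:
    """Parse the raw bytes of one *_textmeta.xml file into a dictionary."""
    root = ET.fromstring(data)
    if root.tag != "documentmeta":
        raise ValueError(f"unexpected root tag: {root.tag!r}")

    def text_of(tag: str) -> str:
        # findtext returns None for a missing tag; treat that as empty.
        return (root.findtext(tag) or "").strip()

    meta: Dict[str, Any] = {tag: text_of(tag) for tag in TEXT_TAGS}
    rank = text_of("rank")
    meta["rank"] = int(rank) if rank else None
    for tag in WORD_TAGS:
        meta[tag] = text_of(tag).split()
    return meta


# Usage on a shortened copy of the first example in this description.
sample = (
    b'<?xml version="1.0" encoding="utf-8"?>\n'
    b"<documentmeta>\n"
    b"  <concept> trompette </concept>\n"
    b"  <rank> 1 </rank>\n"
    b"  <language> en </language>\n"
    b"  <alt> Trompette </alt>\n"
    b"</documentmeta>\n"
)
meta = parse_textmeta(sample)
print(meta["concept"], meta["rank"], meta["language"], meta["alt"])
```

Reading the file in binary mode (`open(path, "rb").read()`) lets the parser honour the encoding given in the XML declaration.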

Example 1

line in labels.txt: 150 0 1

query_id: 150
document_id: 0
relevance_label: 1

original_query_string: "trompette"
translated_query_string: "trumpet"

metadata file: metadata/query_150_document_0_textmeta.xml

<?xml version="1.0" encoding="utf-8"?>
<documentmeta>
  <concept> trompette </concept>
  <rank> 1 </rank>
  <language> en </language>
  <referer> http://allthingschill.com/wordpress/archives/2004/11/wallpapers/ </referer>
  <imageUrl> http://allthingschill.com/img/wallpaper/trompette2.jpg </imageUrl>
  <before> 1024 x 768 Au Revoir Trompette </before>
  <after> </after>
  <ptitle> All Things Chill » Blog Archive » Wallpapers </ptitle>
  <alt> Trompette </alt>
</documentmeta>

thumbnail file: images/query_150_document_0_imagethumb.jpg

Example 2

line in labels.txt: 150 1 0

query_id: 150
document_id: 1
relevance_label: 0

original_query_string: "trompette"
translated_query_string: "trumpet"

metadata file: metadata/query_150_document_1_textmeta.xml

<?xml version="1.0" encoding="utf-8"?>
<documentmeta>
  <concept> trompette </concept>
  <rank> 2 </rank>
  <language> en </language>
  <referer> http://www.fortified-places.com/reliefs/trompette.html </referer>
  <imageUrl> http://www.fortified-places.com/reliefs/images/trompette_thumb3.jpg </imageUrl>
  <before> the Fronde a rebellion of nobles against Louis XIV </before>
  <after> He decided to rebuild the Château Trompette as a </after>
  <ptitle> Fortified Places > Relief Maps > Château Trompette </ptitle>
  <alt> Château Trompette. </alt>
</documentmeta>

thumbnail file: images/query_150_document_1_imagethumb.jpg