15. T. Miller: Building Datasets Using Words and Pictures

There are many sources of challenging and interesting images with associated text, many of them unlabeled collections. Examples include the Corel dataset, collections of annotated museum material, news articles, web pages and any captioned video. The images and words in these collections are highly correlated. Though the associations between text and pictures may be quite loose (keywords or associated natural language text), they are still powerful and can be used to classify images that could not be labeled using visual or textual cues alone.

My poster will present work we have done on automatically building datasets from some of these sources: a set of news photographs and a set of web pages. We show that even very simple image descriptors can be quite powerful for these challenging images when used in combination with text cues. Our datasets and experiences give some insight into future object recognition challenges: the data will probably not be perfectly labeled, there will need to be a better understanding of what constitutes a category, and there will need to be a better treatment of individual variation within a category.