A Social Image Retrieval Result Diversification Dataset

The Div400 dataset is designed to support research in information retrieval, fostering new technologies that improve both the relevance and the diversification of search results, with an explicit focus on the social media context. The dataset consists of Creative Commons data related to 396 landmark locations and contains 43,418 Flickr photos together with their Wikipedia and Flickr metadata and some content descriptor information (visual and text). The data is annotated for both the relevance and the diversity of the photos; expert and crowd annotations are provided.

The dataset was validated during the 2013 Retrieving Diverse Social Images Task at the MediaEval Benchmarking Initiative for Multimedia Evaluation.

Using the dataset:

If you plan to make use of the Div400 dataset, or refer to its results, please acknowledge the work of the authors by citing the following papers:

  1. B. Ionescu, A.-L. Radu, M. Menéndez, H. Müller, A. Popescu, B. Loni, “Div400: A Social Image Retrieval Result Diversification Dataset”, ACM Multimedia Systems - MMSys2014, 19-21 March, Singapore, 2014.
  2. B. Ionescu, M. Menéndez, H. Müller, A. Popescu, “Retrieving Diverse Social Images at MediaEval 2013: Objectives, Dataset and Evaluation”, MediaEval Benchmarking Initiative for Multimedia Evaluation, vol. 1043, CEUR-WS.org, ISSN: 1613-0073, October 18-19, Barcelona, Spain, 2013.

Download the dataset:

You can download the data from the ACM MMSys conference repository (look for the Div400 dataset): http://traces.cs.umass.edu/index.php/mmsys/mmsys.


Bogdan Ionescu, LAPI, University "Politehnica" of Bucharest, Romania (bionescu at alpha.imag.pub.ro); María Menéndez, KnowDive, Department of Information Engineering and Computer Science, University of Trento, Italy (menendez at disi.unitn.it); Henning Müller, University of Applied Sciences Western Switzerland (HES-SO) in Sierre, Switzerland (henning.mueller at hevs.ch); Adrian Popescu, CEA LIST, France (adrian.popescu at cea.fr).

This dataset was supported by the following projects: EXCEL POSDRU, CUbRIK, PROMISE and MUCKE.

Many thanks to Anca-Livia Radu, Bogdan Boteanu, Ivan Eggel, Sajan Raj Ojha, Ionuț Mironică, Ionuț Dută, Oana Pleș, Andrei Purică, Macovei Corina and Irina Nicolae for their valuable help.