A Social Image Retrieval Result Diversification Dataset with Multi-topic Queries

This dataset is designed to support information retrieval research on technologies that improve both the relevance and the diversification of search results, with an explicit focus on the social media context. It consists of Creative Commons data: 153 one-concept Flickr queries and 45,375 images for development, and 139 Flickr queries (69 one-concept, 70 multi-concept) and 41,394 images for testing, together with metadata, Wikipedia pages, and content descriptors for the text and visual modalities. The data is annotated for the relevance and the diversity of the photos. An additional dataset, used to train the credibility descriptors (automatic estimates of the quality, i.e. correctness, of a particular user's tags), provides information for ca. 685 Flickr users and metadata for more than 3.5M images.

The dataset was validated during the 2015 Retrieving Diverse Social Images Task at the MediaEval Benchmarking Initiative for Multimedia Evaluation.

Using the dataset:

If you plan to make use of the Div150Multi dataset, or refer to its results, please acknowledge the work of the authors by citing the following papers:

  1. B. Ionescu, A.L. Gînscă, B. Boteanu, M. Lupu, A. Popescu, H. Müller, "Div150Multi: A Social Image Retrieval Result Diversification Dataset with Multi-topic Queries", ACM Multimedia Systems - MMSys2016, 10-13 May, Klagenfurt, Austria, 2016 (6 pages).
  2. B. Ionescu, A.L. Gînscă, B. Boteanu, A. Popescu, M. Lupu, H. Müller, "Retrieving Diverse Social Images at MediaEval 2015: Challenge, Dataset and Evaluation", MediaEval Benchmarking Initiative for Multimedia Evaluation, vol. 1436, CEUR-WS.org, ISSN: 1613-0073, Wurzen, Germany, September 14-15, 2015.

Download the dataset:

You can download the data from the ACM MMSys conference repository (look for the Div150Multi dataset): http://traces.cs.umass.edu/index.php/mmsys/mmsys.

Acknowledgements:

Bogdan Ionescu, LAPI, University "Politehnica" of Bucharest, Romania (bionescu at alpha.imag.pub.ro), Alexandru Lucian Gînscă, CEA LIST, France (alexandru.ginsca at cea.fr), Bogdan Boteanu, LAPI, University "Politehnica" of Bucharest, Romania (bboteanu at alpha.imag.pub.ro), Adrian Popescu, CEA LIST, France (adrian.popescu at cea.fr), Mihai Lupu, Vienna University of Technology, Austria (lupu at ifs.tuwien.ac.at), Henning Müller, University of Applied Sciences Western Switzerland (HES-SO) in Sierre, Switzerland (henning.mueller at hevs.ch).

This dataset was partially supported by project MUCKE.

Many thanks to Ioan Chera, Ionuț Duță, Andrei Filip, Florin Guga, Tiberiu Loncea, Corina Macovei, Cătălin Mitrea, Ionuț Mironică, Irina Emilia Nicolae, Ivan Eggel, Andrei Purică, Mihai Pușcaș, Oana Pleș, Gabriel Petrescu, Anca Livia Radu, Vlad Ruxandu, and Gabriel Vasile for their valuable help.