A Social Image Retrieval Result Diversification Dataset with User Tagging Credibility Estimation

This dataset is designed to support research in information retrieval, in particular the development of new technologies that improve both the relevance and the diversification of search results, with an explicit focus on the social media context. The dataset consists of Creative Commons data for 300 landmark locations, represented via 45,375 Flickr photos, 16M photo links for around 3,000 users, metadata, Wikipedia pages, and content descriptors for the text and visual modalities. The data is annotated for the relevance and the diversity of the photos. The dataset also includes information about user annotation credibility, defined as an automatic estimate of the quality (correctness) of a particular user's tags.

The dataset was validated during the 2014 Retrieving Diverse Social Images Task at the MediaEval Benchmarking Initiative for Multimedia Evaluation.

Using the dataset:

If you plan to make use of the Div150Cred dataset, or refer to its results, please acknowledge the work of the authors by citing the following papers:

  1. B. Ionescu, A. Popescu, M. Lupu, A.L. Gînscă, B. Boteanu, H. Müller, “Div150Cred: A Social Image Retrieval Result Diversification with User Tagging Credibility Dataset”, ACM Multimedia Systems - MMSys2015, 18-20 March, Portland, Oregon, USA, 2015 (6 pages).
  2. B. Ionescu, A. Popescu, M. Lupu, A.L. Gînscă, H. Müller, “Retrieving Diverse Social Images at MediaEval 2014: Challenge, Dataset and Evaluation”, MediaEval Benchmarking Initiative for Multimedia Evaluation, vol. 1263, CEUR-WS.org, ISSN: 1613-0073, October 16-17, Barcelona, Spain, 2014.

Download the dataset:

You can download the data from the ACM MMSys conference repository (look for the Div150Cred dataset): http://traces.cs.umass.edu/index.php/mmsys/mmsys.

Acknowledgements:

Bogdan Ionescu, LAPI, University "Politehnica" of Bucharest, Romania (bionescu at alpha.imag.pub.ro); Adrian Popescu, CEA LIST, France (adrian.popescu at cea.fr); Mihai Lupu, Vienna University of Technology, Austria (lupu at ifs.tuwien.ac.at); Henning Müller, University of Applied Sciences Western Switzerland (HES-SO) in Sierre, Switzerland (henning.mueller at hevs.ch).

This dataset was supported by the following projects: MUCKE, CUbRIK and PROMISE.

Many thanks to Alexandru Lucian Gînscă, Adrian Iftene, Bogdan Boteanu, Ioan Chera, Ionuț Duță, Andrei Filip, Corina Macovei, Cătălin Mitrea, Ionuț Mironică, Irina Emilia Nicolae, Ivan Eggel, Andrei Purică, Mihai Pușcaș, Oana Pleș, Gabriel Petrescu, Anca Livia Radu, and Vlad Ruxandu for their valuable help.