A Social Image Retrieval Result Diversification Dataset with Adhoc Queries and Multiple Expert Annotations

This dataset is designed to support information retrieval research that fosters new technologies for improving both the relevance and the diversification of search results, with an explicit focus on the social media context. The dataset consists of redistributable Creative Commons licensed information about general-purpose, multi-topic queries. Each query is represented with up to 300 Flickr photos and their associated social metadata (e.g., title, description, geo-tagging information, number of views, and number of posted comments). The data is partitioned as follows: (1) development data intended for designing and training the approaches (ca. 100 general-purpose, multi-concept queries with 30,000 images); (2) credibility data intended to estimate the global quality of tag-image content relationships for a user's contribution (metadata for ca. 3,000 users); (3) evaluation data intended for the actual benchmark (ca. 100 general-purpose, multi-concept queries with 30,000 images).
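As a rough illustration of the per-query structure described above, the following Python sketch models a query with at most 300 photos and the listed metadata fields. All class names, field names, and the `summarize` helper are hypothetical and chosen only for this example; they do not reflect the dataset's actual file layout or any official loader.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Upper bound stated in the dataset description ("up to 300 Flickr photos").
MAX_PHOTOS_PER_QUERY = 300


@dataclass
class Photo:
    """One Flickr photo with its social metadata (field names are hypothetical)."""
    photo_id: str
    title: str = ""
    description: str = ""
    geo: Optional[Tuple[float, float]] = None  # (latitude, longitude), if geo-tagged
    views: int = 0
    comments: int = 0


@dataclass
class Query:
    """One general-purpose, multi-topic query and its retrieved photos."""
    text: str
    photos: List[Photo] = field(default_factory=list)

    def add_photo(self, photo: Photo) -> None:
        # Enforce the per-query cap from the description.
        if len(self.photos) >= MAX_PHOTOS_PER_QUERY:
            raise ValueError(f"query {self.text!r} already holds {MAX_PHOTOS_PER_QUERY} photos")
        self.photos.append(photo)


def summarize(queries: List[Query]) -> Dict[str, int]:
    """Count queries and images in one partition (e.g., development or evaluation)."""
    return {"queries": len(queries), "images": sum(len(q.photos) for q in queries)}
```

Under these assumptions, a partition of about 100 queries that each reach the 300-photo cap yields the roughly 30,000 images quoted for the development and evaluation data.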

The dataset was validated during the 2017 Retrieving Diverse Social Images Task at the MediaEval Benchmarking Initiative for Multimedia Evaluation.

Using the dataset:

If you plan to make use of the Div150Multidiv dataset, or refer to its results, please acknowledge the work of the authors by citing the following papers:

  1. M. Zaharieva, B. Ionescu, A.L. Gînscă, R.L.T. Santos, H. Müller, "Retrieving Diverse Social Images at MediaEval 2017: Challenge, Dataset and Evaluation", MediaEval Benchmarking Initiative for Multimedia Evaluation, vol. 1984, CEUR-WS.org, ISSN: 1613-0073, Dublin, Ireland, September 13-15, 2017.
  2. M. Rohm, B. Ionescu, A.L. Gînscă, R.L.T. Santos, H. Müller, "SubDiv17: A Dataset for Investigating Subjectivity in the Visual Diversification of Image Search Results", ACM Multimedia Systems - MMSys, June 12-15, Amsterdam, Netherlands, 2018.

Maia Zaharieva, Vienna University of Technology, Austria; Bogdan Ionescu, University "Politehnica" of Bucharest, Romania; Alexandru Lucian Gînscă, CEA LIST, France; Rodrygo L.T. Santos, Universidade Federal de Minas Gerais (UFMG), Brazil; Henning Müller, University of Applied Sciences Western Switzerland (HES-SO) in Sierre, Switzerland. Many thanks to Bogdan Boteanu and Mihai Lupu for their help.