Datasets: Div150Adhoc 2016

A Social Image Retrieval Result Diversification Dataset with Adhoc Multi-topic Queries

This dataset is designed to support information retrieval research that fosters new technologies for improving both the relevance and the diversification of search results, with an explicit focus on the social media context. It consists of Creative Commons data organized in four parts: a development set containing 70 queries (20,757 Flickr photos), including 35 multi-topic queries related to events and states associated with locations; a user annotation credibility set containing information for ca. 300 location-based queries and 685 users; a set of semantic vectors for general English terms computed on top of the English Wikipedia; and a test set containing 65 queries (19,017 Flickr photos).

The dataset was validated during the 2016 Retrieving Diverse Social Images Task at the MediaEval Benchmarking Initiative for Multimedia Evaluation.

Resources:

Using the dataset:

If you plan to make use of the Div150Adhoc dataset, or refer to its results, please acknowledge the work of the authors by citing the following paper:

B. Ionescu, A.L. Gînscă, M. Zaharieva, B. Boteanu, M. Lupu, H. Müller, "Retrieving Diverse Social Images at MediaEval 2016: Challenge, Dataset and Evaluation", MediaEval Benchmarking Initiative for Multimedia Evaluation, vol. 1739, CEUR-WS.org, ISSN: 1613-0073, Hilversum, Netherlands, October 20-21, 2016.

Download the dataset:

You can download the data from 153.109.124.90 using an FTP client of your choice (e.g., FileZilla, WinSCP). Configure your FTP client to connect to:

host: 153.109.124.90
port: 21
username: diversity16
password: 20diversitytask!6.
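
For scripted access, the same settings can be used from Python's standard ftplib module. The following is a minimal sketch, not an official download tool: the function names list_remote_files and download_file and the ftp_factory parameter are illustrative, and the directory layout on the server is not described here, so the sketch only lists the top-level directory and fetches a file by a name you supply. It also assumes the period printed after the password above is not part of the password; if login fails, try it with the period included.

```python
from ftplib import FTP

# Connection settings copied from the list above.
HOST = "153.109.124.90"
PORT = 21
USERNAME = "diversity16"
# Assumption: the trailing period shown in the source is punctuation,
# not part of the password.
PASSWORD = "20diversitytask!6"


def list_remote_files(ftp_factory=FTP, timeout=30):
    """Log in and return the entry names in the server's top-level directory."""
    with ftp_factory() as ftp:
        ftp.connect(HOST, PORT, timeout=timeout)
        ftp.login(USERNAME, PASSWORD)
        return ftp.nlst()


def download_file(remote_name, local_path, ftp_factory=FTP, timeout=30):
    """Download one remote file in binary mode and write it to local_path."""
    with ftp_factory() as ftp:
        ftp.connect(HOST, PORT, timeout=timeout)
        ftp.login(USERNAME, PASSWORD)
        with open(local_path, "wb") as out:
            # retrbinary streams the file and calls out.write for each chunk.
            ftp.retrbinary("RETR " + remote_name, out.write)


if __name__ == "__main__":
    # Requires network access to the server.
    for name in list_remote_files():
        print(name)
```

The ftp_factory parameter exists only so the functions can be exercised with a stand-in object instead of a live connection; in normal use, leave it at its default.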

Acknowledgements:

Bogdan Ionescu, LAPI, University "Politehnica" of Bucharest, Romania; Alexandru Lucian Gînscă, CEA LIST, France; Maia Zaharieva, University of Vienna & Vienna University of Technology, Austria; Bogdan Boteanu, LAPI, University "Politehnica" of Bucharest, Romania; Mihai Lupu, Vienna University of Technology, Austria; Henning Müller, University of Applied Sciences Western Switzerland (HES-SO) in Sierre, Switzerland.

This dataset was partially supported by the Vienna Science and Technology Fund (WWTF) through project ICT12-010.

Many thanks to Gabi Constantin, Lukas Diem, Ivan Eggel, Laura Fluerătoru, Ciprian Ionașcu, Corina Macovei, Cătălin Mitrea, Irina Emilia Nicolae, Mihai Gabriel Petrescu, and Andrei Purică for their help.