Information Fusion for Social Image Retrieval and Diversification

The dataset is intended for assessing the quality of late fusion methods. The use case scenario is that of systems that diversify image search results in the context of social media. The dataset consists of 672 queries and 240 diversification system outputs and is structured as follows, according to the development/validation/testing procedure:

devset (development data) - contains two data sets, i.e., devset1 (with 346 queries and 39 system outputs) and devset2 (with 60 queries and 56 system outputs);

validset (validation data) - contains 139 queries with 60 system outputs;

testset (testing data) - contains two data sets, i.e., seenIR data (with 63 queries and 56 system outputs, which are the same diversification system outputs as in the devset2 data) and unseenIR data (with 64 queries and 29 system outputs, which are unseen, novel diversification system outputs).

All the data consists of redistributable Creative Commons licensed information from Flickr and Wikipedia, as well as content descriptors, which are provided on an "as is" basis and were computed according to algorithms from the literature.
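The split sizes given above can be recorded in a small lookup table and checked against the stated totals. The following is a minimal sketch only: the name `SPLITS`, the dictionary keys, and the helper `totals` are illustrative assumptions and do not correspond to any file or API shipped with the dataset.

```python
# Split sizes as stated in the dataset description.
# NOTE: SPLITS, its keys, and totals() are hypothetical names used for
# illustration; they are not part of the dataset distribution.
SPLITS = {
    "devset1": {"queries": 346, "system_outputs": 39},
    "devset2": {"queries": 60, "system_outputs": 56},
    "validset": {"queries": 139, "system_outputs": 60},
    "testset_seenIR": {"queries": 63, "system_outputs": 56},
    "testset_unseenIR": {"queries": 64, "system_outputs": 29},
}


def totals(splits):
    """Sum the number of queries and system outputs over all splits."""
    total_queries = sum(s["queries"] for s in splits.values())
    total_outputs = sum(s["system_outputs"] for s in splits.values())
    return total_queries, total_outputs


if __name__ == "__main__":
    queries, outputs = totals(SPLITS)
    print(f"queries: {queries}, system outputs: {outputs}")
```

The per-split figures add up to the stated 672 queries and 240 system outputs. The total of 240 is obtained by counting the 56 system outputs once for devset2 and once for the seenIR data.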

The dataset was validated during the 2018 ICPR Multimedia Information Processing for Personality and Social Networks Analysis Challenge at the ChaLearn Looking at People Benchmarking Initiative.

To download the data, please follow this link.

For more details see:
  1. G. Ramirez, E. Villatoro, B. Ionescu, H.J. Escalante, S. Escalera, M. Larson, H. Müller, I. Guyon, “Overview of the Multimedia Information Processing for Personality and Social Networks Analysis Contest”, International Conference on Pattern Recognition - ICPR 2018, Springer LNCS 11188, pp. 127-139, ISBN: 978-3-030-05792-3.
Acknowledgements:

If you plan to make use of the DivFusion dataset, or refer to its results, please acknowledge the work of the authors by citing the paper listed above.

This dataset was conceived using the data gathered during the MediaEval Retrieving Diverse Social Images Tasks. We therefore acknowledge the valuable contribution of the task organizers. We also acknowledge the contribution of our co-organizers of the ICPR 2018 challenge, namely Liviu Ștefan and Andrei Jitaru.