A Multifaceted Movie Trailer Dataset for Recommendation and Retrieval

This dataset is intended for assessing methods that automatically predict movie recommendations from movie content. It consists of about 7k clips for about 800 unique movies. The development set (devset) provides features computed from 5,562 clips corresponding to 632 unique movies, while the test set (testset) provides features for 1,315 clips corresponding to 159 unique movies from the well-known MovieLens 20M dataset. The ground truth, namely the per-movie global average rating and rating variance, is calculated from the user ratings in the MovieLens dataset. The YouTube ID of each clip is also included in the clip's name. Each movie has about 8.5 associated clips on average, calculated over both the devset and the testset.
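As a rough illustration of how a per-movie global average rating and rating variance can be derived from user ratings, the sketch below groups ratings by movie and computes both statistics. It is not the authors' code: the function name `rating_stats`, the `(user_id, movie_id, rating)` row layout, and the toy rows are assumptions made for this example, and the use of population variance (rather than sample variance) is also an assumption, since the description above does not say which was used.

```python
from collections import defaultdict
from statistics import fmean, pvariance


def rating_stats(ratings):
    """Map each movie ID to (mean rating, rating variance).

    `ratings` is an iterable of (user_id, movie_id, rating) rows.
    Population variance is used here as an assumption; swap in
    statistics.variance for the sample variance if that is preferred.
    """
    by_movie = defaultdict(list)
    for _user_id, movie_id, rating in ratings:
        by_movie[movie_id].append(float(rating))
    return {
        movie_id: (fmean(values), pvariance(values))
        for movie_id, values in by_movie.items()
    }


# Toy rows made up for illustration; these are not MovieLens data.
toy_ratings = [
    (1, 10, 4.0),
    (2, 10, 5.0),
    (3, 10, 3.0),
    (1, 20, 2.0),
    (2, 20, 4.0),
]

stats = rating_stats(toy_ratings)
for movie_id, (mean, var) in sorted(stats.items()):
    print(f"movie {movie_id}: mean={mean:.3f} variance={var:.3f}")
```

With real data, the rows would come from the MovieLens ratings file, restricted to the movies covered by the devset or testset.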

The dataset was validated during the 2018 task "Recommending Movies Using Content: Which content is key?" at the MediaEval Benchmarking Initiative for Multimedia Evaluation.

For more details see:
  1. Y. Deldjoo, M.G. Constantin, A. Dritsas, B. Ionescu, M. Schedl, “The MediaEval 2018 Movie Recommendation Task: Recommending Movies Using Content”, MediaEval Benchmarking Initiative for Multimedia Evaluation, vol. 2283, CEUR-WS.org, ISSN: 1613-0073, 2018 (task overview paper describing the dataset and the task).
  2. Y. Deldjoo, M.G. Constantin, M. Schedl, B. Ionescu, P. Cremonesi, “MMTF-14K: A Multifaceted Movie Trailer Dataset for Recommendation and Retrieval”, ACM Multimedia Systems - MMSys, June 12-15, Amsterdam, Netherlands, 2018.

If you plan to make use of the MMTF-14K dataset, or refer to its results, please acknowledge the work of the authors by citing the papers listed above.