Learning TRECVID'08 High-level Features from YouTube

Adrian Ulges, Markus Koch, Christian Schulze, Thomas Breuel
NIST, 11/2008


We participated in TRECVID's High-level Features task to investigate online video as an alternative data source for training concept detectors. Such video material is publicly available in large quantities from video portals like YouTube. In our setup, tags provided by users during upload serve as weak ground-truth labels, so training can scale up to thousands of concepts without manual annotation effort. On the downside, online video is a complex domain, and the labels associated with it are coarse and unreliable, so a performance loss can be expected compared to high-quality standard training sets. To find out whether concept detectors can be trained on online video, our TRECVID experiments compare the same state-of-the-art (visual-only) concept detection systems when (1) trained on the standard TRECVID development data and (2) trained on clips downloaded from YouTube. Our key observation is that YouTube-based detectors work well for some concepts but are overall significantly outperformed by the "specialized" systems trained on the standard TRECVID'08 data (an infMAP of 2.2% and 2.1%, compared to 5.3% and 6.1%). An in-depth analysis of the results suggests that a major reason for this is redundancy in the TV08 dataset.
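To make the idea of tags as weak labels concrete, here is a minimal sketch of one simple way to turn user tags into per-concept training labels. It is not the authors' pipeline: the concept-to-keyword mapping `CONCEPT_TAGS`, the function `weak_labels`, and the clip data are all hypothetical, chosen only for illustration. The sketch also shows why such labels are coarse: a clip is marked positive for a whole concept whenever any tag matches, regardless of how much of the clip actually shows it.

```python
from collections import defaultdict

# Hypothetical mapping from each target concept to the user tags treated as evidence for it.
CONCEPT_TAGS = {
    "dog": {"dog", "puppy"},
    "airplane": {"airplane", "aircraft", "plane"},
}


def weak_labels(clip_tags, concept_tags):
    """Assign a weak 0/1 label per (concept, clip) from user tags alone.

    clip_tags maps a clip id to its list of user-provided tags.
    Returns concept -> list of (clip_id, label) pairs, where label is 1
    if any (lower-cased) tag of the clip matches one of the concept's keywords.
    No manual annotation is involved, which is what lets this scale to many
    concepts, but the labels are only as reliable as the uploaders' tags.
    """
    labels = defaultdict(list)
    for clip_id, tags in clip_tags.items():
        normalized = {t.lower() for t in tags}
        for concept, keywords in concept_tags.items():
            # A single matching tag marks the whole clip as positive (coarse, noisy).
            labels[concept].append((clip_id, int(bool(normalized & keywords))))
    return dict(labels)


if __name__ == "__main__":
    clips = {"a": ["Puppy", "funny"], "b": ["plane", "takeoff"], "c": ["music"]}
    for concept, pairs in weak_labels(clips, CONCEPT_TAGS).items():
        print(concept, pairs)
```

For the three example clips, clip `a` becomes a positive for "dog" and clip `b` a positive for "airplane", while clip `c` is a negative for both.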




@misc{ULGE2008,
	Title = {Learning TRECVID'08 High-level Features from YouTube},
	Author = {Adrian Ulges and Markus Koch and Christian Schulze and Thomas Breuel},
	Month = {11},
	Year = {2008},
	Publisher = {NIST},
	Howpublished = {TREC Workshop 2008 on Video Retrieval Evaluation (TRECVID-2008)}
}

Last modified: 30.08.2016