Multiple Instance Learning on Weakly Labeled Videos

Adrian Ulges, Christian Schulze, Thomas Breuel
Workshop on Cross-Media Information Analysis, Extraction and Management, Koblenz, Germany, Springer, 12/2008


Automatic video tagging systems aim to assign semantic concepts ("tags") to videos by linking textual descriptions with the audio-visual video content. To train such systems, we investigate online videos from portals such as YouTube as a large-scale, freely available knowledge source. Tags provided by video owners serve as weak annotations: they indicate that a target concept appears in a video, but not when it appears. This situation resembles the multiple instance learning (MIL) scenario, in which classifiers are trained on labeled bags (videos) of unlabeled samples (the frames of a video). We study MIL in quantitative experiments on real-world online videos. Our key findings are: (1) conventional MIL tends to neglect valuable information in the training data and thus performs poorly; (2) by relaxing the MIL assumption, a tagging system can be built that performs comparably to or better than its supervised counterpart; (3) the improvements achieved by MIL are minor compared with a kernel-based model we proposed recently.
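To make the bag/instance terminology concrete, here is a minimal sketch of the standard MIL labeling assumption: a video (bag) is positive if at least one of its frames (instances) shows the concept, so a bag-level score can be taken as the maximum frame score. This is an illustration only, not the authors' implementation or their relaxed MIL variant. The per-frame scores, the `threshold` of 0.5, and the function names (`mil_bag_score`, `mean_bag_score`, `label_bags`) are hypothetical. The mean aggregator is included purely as a contrasting, softer rule.

```python
from typing import Callable, List, Sequence

# A bag is the sequence of hypothetical per-frame classifier scores in [0, 1]
# for one video; each score is one instance.
Bag = Sequence[float]


def mil_bag_score(frame_scores: Bag) -> float:
    """Standard MIL assumption: a bag is positive if at least one instance
    is positive, so the bag score is the maximum frame score."""
    if not frame_scores:
        raise ValueError("a bag must contain at least one frame")
    return max(frame_scores)


def mean_bag_score(frame_scores: Bag) -> float:
    """Illustrative softer aggregation (not the paper's method): average
    the evidence over all frames of the video."""
    if not frame_scores:
        raise ValueError("a bag must contain at least one frame")
    return sum(frame_scores) / len(frame_scores)


def label_bags(
    bags: List[Bag],
    threshold: float = 0.5,
    aggregate: Callable[[Bag], float] = mil_bag_score,
) -> List[bool]:
    """Predict a tag for each video by thresholding its aggregated score."""
    return [aggregate(bag) >= threshold for bag in bags]


if __name__ == "__main__":
    videos = [
        [0.1, 0.2, 0.9],  # the concept is visible in a single frame only
        [0.1, 0.2, 0.3],  # the concept never appears
    ]
    print(label_bags(videos))                            # [True, False]
    print(label_bags(videos, aggregate=mean_bag_score))  # [False, False]
```

Under the max rule, one confident frame is enough to tag the first video. Averaging dilutes that single frame below the threshold. Which rule fits best depends on how much of a tagged video actually shows the concept.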


@inproceedings{ ULGE2008,
	Title = {Multiple Instance Learning on Weakly Labeled Videos},
	Author = {Adrian Ulges and Christian Schulze and Thomas Breuel},
	BookTitle = {Workshop on Cross-Media Information Analysis, Extraction and Management},
	Month = {12},
	Year = {2008},
	Publisher = {Springer},
	Howpublished = {SAMT Workshop on Cross-Media Information Analysis and Retrieval}
}

Last modified: 30.08.2016