Topic Models for Semantics-preserving Video Compression

Jörn Wanke, Adrian Ulges, Christoph Lampert, Thomas Breuel
Proceedings of the International Conference on Multimedia Information Retrieval, Philadelphia, Pennsylvania, USA. ACM, New York, NY, USA, 2010


Content-based video understanding tasks such as auto-annotation or clustering are based on low-level descriptors of video content, which should be compact in order to optimize storage requirements and efficiency. In this paper, we address the semantic compression of video, i.e., the reduction of low-level descriptors to a few semantically expressive dimensions. To achieve this, topic models have been proposed, which cluster visual content into a small number of latent aspects and have previously been applied successfully to still images. In this paper, we investigate topic models for the video domain, addressing several key questions that have remained unanswered so far: (1) data: first, we confirm the good performance of topic models for concept detection on web video, showing that a performance comparable to bag-of-visual-words descriptors can be reached at a compression rate of 1/20. (2) diversity: we demonstrate that topic models perform best when trained on large-scale, diverse datasets, i.e., no tedious manual pre-selection is required. (3) multi-modal integration: we show how topic models can benefit from an integration of multi-modal features, such as motion and patches. Finally, (4) temporal structure: by extending topic models so that the shot structure of video is taken into account, we show that a better correspondence between topics and semantic categories can be achieved.
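The core idea of semantic compression, reducing a high-dimensional bag-of-visual-words histogram to a handful of latent dimensions, can be sketched with a simple LSA-style truncated SVD. This is a stand-in for the probabilistic topic models the paper actually studies, and the matrix sizes (40 keyframes, 400 visual words, 20 latent dimensions for a 1/20 reduction) are illustrative assumptions, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy bag-of-visual-words matrix: 40 keyframes x 400 visual-word counts
# (synthetic data; a real pipeline would quantize local descriptors).
bovw = rng.poisson(lam=1.0, size=(40, 400)).astype(float)

# Truncated SVD keeps only the top-k latent dimensions. With k = 20, each
# 400-dim descriptor becomes a 20-dim latent descriptor, mirroring the
# 1/20 compression rate mentioned in the abstract.
U, s, Vt = np.linalg.svd(bovw, full_matrices=False)
k = 20
compressed = U[:, :k] * s[:k]  # 40 x 20 latent keyframe descriptors

print(bovw.shape, "->", compressed.shape)
```

A probabilistic topic model (e.g. pLSA or LDA) would replace the SVD step, yielding per-keyframe topic mixtures instead of SVD coordinates, but the descriptor shapes before and after compression are the same.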




@inproceedings{WANK2010,
	Title = {Topic Models for Semantics-preserving Video Compression},
	Author = {Jörn Wanke and Adrian Ulges and Christoph Lampert and Thomas Breuel},
	BookTitle = {Proceedings of the International Conference on Multimedia Information Retrieval},
	Year = {2010},
	Publisher = {ACM}
}

Last modified: 30.08.2016