TR2005-077

Modelling Sports Highlights Using a Time Series Clustering Framework and Model Interpretation


    •  Radhakrishnan, R., Otsuka, I., Xiong, Z., Divakaran, A., "Modelling Sports Highlights Using a Time Series Clustering Framework and Model Interpretation", SPIE Conference on Storage and Retrieval Methods and Applications for Multimedia, January 2005, vol. 5682, pp. 269-276.
      BibTeX:
      @inproceedings{Radhakrishnan2005jan,
        author = {Radhakrishnan, R. and Otsuka, I. and Xiong, Z. and Divakaran, A.},
        title = {Modelling Sports Highlights Using a Time Series Clustering Framework and Model Interpretation},
        booktitle = {SPIE Conference on Storage and Retrieval Methods and Applications for Multimedia},
        year = 2005,
        volume = 5682,
        pages = {269--276},
        month = jan,
        url = {https://www.merl.com/publications/TR2005-077}
      }
Abstract:

In our past work on sports highlights extraction, we have shown the utility of detecting audience reaction using an audio classification framework. The audio classes in that framework were chosen based on intuition. In this paper, we present a systematic way of identifying the key audio classes for sports highlights extraction using a time series clustering framework. We treat the low-level audio features as a time series and model the highlight segments as "unusual" events against the background of a "usual" process. The set of audio classes that characterizes the sports domain is then identified by analyzing the consistent patterns in each of the clusters produced by the time series clustering framework. The distribution of features from the training data obtained in this way for each of the key audio classes is parameterized by a Minimum Description Length Gaussian Mixture Model (MDL-GMM). We also interpret the meaning of each of the mixture components of the MDL-GMM for the key audio class (the "highlight" class) that is correlated with highlight moments. Our results show that the "highlight" class is a mixture of audience cheering and the commentator's excited speech. Furthermore, we show that the precision-recall performance for highlights extraction based on this "highlight" class is better than that of our previous approach, which uses only audience cheering as the key highlight class.
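As a rough illustration of the MDL-GMM idea mentioned in the abstract, the sketch below fits one-dimensional Gaussian mixtures with EM for several component counts and keeps the count with the lowest description length. It is not the authors' implementation: the two-part score (negative log-likelihood plus half the parameter count times the log of the sample size), the quantile-based initialisation, the synthetic data, and all function names (`fit_gmm`, `mdl_score`, and so on) are assumptions made for this example only.

```python
import math
import random


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def component_logs(x, weights, means, variances):
    """Log of (weight * density) for every mixture component at point x."""
    return [math.log(w) + log_gauss(x, m, v)
            for w, m, v in zip(weights, means, variances)]


def log_sum_exp(values):
    """Numerically stable log(sum(exp(values)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def fit_gmm(data, k, iters=50, var_floor=1e-3):
    """Fit a k-component 1-D GMM with EM and return its log-likelihood."""
    xs = sorted(data)
    n = len(xs)
    # Assumed initialisation: split the sorted data into k equal chunks.
    chunks = [xs[i * n // k:(i + 1) * n // k] for i in range(k)]
    weights = [1.0 / k] * k
    means = [sum(c) / len(c) for c in chunks]
    variances = [max(sum((x - m) ** 2 for x in c) / len(c), var_floor)
                 for c, m in zip(chunks, means)]
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            logs = component_logs(x, weights, means, variances)
            norm = log_sum_exp(logs)
            resp.append([math.exp(v - norm) for v in logs])
        # M-step: re-estimate weights, means, and (floored) variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-10)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj,
                var_floor)
    return sum(log_sum_exp(component_logs(x, weights, means, variances))
               for x in xs)


def mdl_score(loglik, k, n):
    """Assumed two-part description length: data cost plus parameter cost."""
    n_params = 3 * k - 1  # k means, k variances, k - 1 free weights
    return -loglik + 0.5 * n_params * math.log(n)


# Synthetic stand-in for audio features: two well-separated clusters.
rng = random.Random(0)
data = ([rng.gauss(0.0, 1.0) for _ in range(200)]
        + [rng.gauss(10.0, 1.0) for _ in range(200)])

scores = {k: mdl_score(fit_gmm(data, k), k, len(data)) for k in range(1, 5)}
best_k = min(scores, key=scores.get)
print("MDL scores:", scores)
print("selected number of components:", best_k)
```

The penalty term grows with the number of parameters, so an extra component is kept only when it improves the likelihood by more than it costs to describe. Real audio features would be multi-dimensional, which changes the parameter count and the density computation but not the selection principle.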

 
