TR2016-138

Nested Gibbs sampling for mixture-of-mixture model and its application to speaker clustering


    •  Tawara, N., Ogawa, T., Watanabe, S., Kobayashi, T., "Nested Gibbs sampling for mixture-of-mixture model and its application to speaker clustering", APSIPA Transactions on Signal and Information Processing, DOI: 10.1017/ATSIP.2016.15, Vol. 5, October 2016.
      BibTeX:
      @article{Tawara2016oct,
        author = {Tawara, Naohiro and Ogawa, Tetsuji and Watanabe, Shinji and Kobayashi, Tetsunori},
        title = {Nested Gibbs sampling for mixture-of-mixture model and its application to speaker clustering},
        journal = {APSIPA Transactions on Signal and Information Processing},
        year = 2016,
        volume = 5,
        month = oct,
        doi = {10.1017/ATSIP.2016.15},
        url = {https://www.merl.com/publications/TR2016-138}
      }
  • Research Areas:

    Artificial Intelligence, Speech & Audio

Abstract:

This paper proposes a novel model estimation method that uses nested Gibbs sampling to estimate a mixture-of-mixture model, in which the distribution of each component of a mixture model is itself represented by a mixture model. This model is suitable for analyzing multilevel data composed of frame-wise observations, such as videos and acoustic signals. Deterministic procedures, such as the expectation-maximization algorithm, have been employed to estimate these kinds of models, but this approach often suffers from a large bias when the amount of data is limited. To avoid this problem, we introduce a Markov chain Monte Carlo-based model estimation method. In particular, we aim to identify a suitable sampling method for mixture-of-mixture models. Gibbs sampling is a possible approach, but it can easily become trapped in a local optimum when each component is represented by a multi-modal distribution. Thus, we propose a novel Gibbs sampling method, called nested Gibbs sampling, which represents the lower-level (fine) data structure based on elemental mixture distributions and the higher-level (coarse) data structure based on mixture-of-mixture distributions. We applied this method to a speaker clustering problem and conducted experiments under various conditions. The results demonstrated that the proposed method outperformed conventional sampling-based, variational Bayesian, and hierarchical agglomerative methods.
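
Illustrative sketch:

The two-level structure described in the abstract can be made concrete with a small Python sketch. It is not the paper's nested Gibbs sampler. It only illustrates, under simplifying assumptions, how an utterance-level cluster label might be drawn when each cluster (for example, a speaker) is itself a Gaussian mixture over frames. The one-dimensional features, the fixed mixture parameters, and all names (`cluster_posterior`, `sample_cluster`, `gmms`, `cluster_weights`) are hypothetical and chosen for illustration.

```python
import math
import random


def log_normal(x, mean, var):
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def utterance_log_lik(frames, gmm):
    """Lower (fine) level: log p(frames | cluster).

    Each frame is treated as an independent draw from the cluster's
    elemental mixture; gmm is a list of (weight, mean, var) tuples.
    """
    total = 0.0
    for x in frames:
        total += log_sum_exp(
            [math.log(w) + log_normal(x, mu, var) for w, mu, var in gmm]
        )
    return total


def cluster_posterior(frames, cluster_weights, gmms):
    """Higher (coarse) level: p(cluster | frames), proportional to
    cluster_weight * p(frames | cluster)."""
    scores = [
        math.log(h) + utterance_log_lik(frames, gmm)
        for h, gmm in zip(cluster_weights, gmms)
    ]
    norm = log_sum_exp(scores)
    return [math.exp(s - norm) for s in scores]


def sample_cluster(frames, cluster_weights, gmms, rng=random):
    """Draw one cluster label for an utterance from its posterior."""
    post = cluster_posterior(frames, cluster_weights, gmms)
    return rng.choices(range(len(post)), weights=post, k=1)[0]


# Toy setup (assumed values): two clusters, each a two-component mixture.
gmms = [
    [(0.5, -2.0, 1.0), (0.5, 0.0, 1.0)],
    [(0.5, 4.0, 1.0), (0.5, 6.0, 1.0)],
]
cluster_weights = [0.5, 0.5]
utterance = [3.8, 5.9, 4.4, 6.2]  # frame-wise observations of one utterance

posterior = cluster_posterior(utterance, cluster_weights, gmms)
label = sample_cluster(utterance, cluster_weights, gmms)
```

In a full sampler the mixture parameters would also be resampled rather than held fixed, and the label draw would be repeated for every utterance. The sketch covers only the label draw, where the frame-level mixtures determine the utterance-level assignment.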