TR2016-138

Nested Gibbs sampling for mixture-of-mixture model and its application to speaker clustering

- Tawara, N., Ogawa, T., Watanabe, S., Kobayashi, T., "Nested Gibbs sampling for mixture-of-mixture model and its application to speaker clustering", APSIPA Transactions on Signal and Information Processing, DOI: 10.1017/ATSIP.2016.15, Vol. 5, October 2016.
  BibTeX:
    @article{Tawara2016oct,
      author = {Tawara, Naohiro and Ogawa, Tetsuji and Watanabe, Shinji and Kobayashi, Tetsunori},
      title = {{Nested Gibbs sampling for mixture-of-mixture model and its application to speaker clustering}},
      journal = {APSIPA Transactions on Signal and Information Processing},
      year = 2016,
      volume = 5,
      month = oct,
      doi = {10.1017/ATSIP.2016.15},
      url = {https://www.merl.com/publications/TR2016-138}
    }
Research Areas:

Artificial Intelligence, Speech & Audio

Abstract:

This paper proposes a novel model estimation method that uses nested Gibbs sampling to develop a mixture-of-mixture model, in which the distribution of each of the model's components is itself represented by a mixture model. This model is suitable for analyzing multilevel data, such as videos and acoustic signals, which are composed of frame-wise observations. Deterministic procedures, such as the expectation-maximization algorithm, have been employed to estimate these kinds of models, but this approach often suffers from a large bias when the amount of data is limited. To avoid this problem, we introduce a Markov chain Monte Carlo-based model estimation method. In particular, we aim to identify a suitable sampling method for mixture-of-mixture models. Gibbs sampling is a possible approach, but it can easily become trapped in a local optimum when each component is represented by a multi-modal distribution. Thus, we propose a novel Gibbs sampling method, called nested Gibbs sampling, which represents the lower-level (fine) data structure based on elemental mixture distributions and the higher-level (coarse) data structure based on mixture-of-mixture distributions. We applied this method to a speaker clustering problem and conducted experiments under various conditions. The results demonstrated that the proposed method outperformed conventional sampling-based, variational Bayesian, and hierarchical agglomerative methods.
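The two-level structure described in the abstract can be illustrated with a small sketch. It is not the paper's nested Gibbs sampler: it only shows, under simplifying assumptions, how a higher-level cluster (e.g. a speaker) can be scored for a whole utterance when each cluster is itself a lower-level mixture over frames, and how one utterance-level assignment could be drawn from the resulting posterior. The one-dimensional Gaussians, the fixed parameters, and all function names (`gmm_log_likelihood`, `cluster_posterior`, `sample_cluster`) are hypothetical choices made for this illustration.

```python
import math
import random


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_normal(x, mean, var):
    """Log-density of a 1-D Gaussian (assumed frame-level distribution)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def gmm_log_likelihood(frames, gmm):
    """Lower level: log-likelihood of all frames under one cluster's mixture.

    gmm is a list of (weight, mean, var) tuples; frames are treated as
    independent given the cluster.
    """
    return sum(
        log_sum_exp([math.log(w) + log_normal(x, m, v) for w, m, v in gmm])
        for x in frames
    )


def cluster_posterior(frames, cluster_weights, gmms):
    """Higher level: posterior over clusters for one utterance."""
    log_joint = [
        math.log(h) + gmm_log_likelihood(frames, g)
        for h, g in zip(cluster_weights, gmms)
    ]
    norm = log_sum_exp(log_joint)
    return [math.exp(lj - norm) for lj in log_joint]


def sample_cluster(frames, cluster_weights, gmms, rng):
    """Draw one utterance-level assignment from the posterior above."""
    post = cluster_posterior(frames, cluster_weights, gmms)
    return rng.choices(range(len(post)), weights=post, k=1)[0]


# Toy setup (illustrative values only): two clusters, each a 2-component mixture.
gmms = [
    [(0.5, -3.0, 1.0), (0.5, -1.0, 1.0)],  # cluster 0
    [(0.5, 1.0, 1.0), (0.5, 3.0, 1.0)],    # cluster 1
]
cluster_weights = [0.5, 0.5]
utterance = [-2.9, -1.2, -3.1, -0.8]

posterior = cluster_posterior(utterance, cluster_weights, gmms)
assignment = sample_cluster(utterance, cluster_weights, gmms, random.Random(0))
```

In a full sampler, the mixture parameters and the frame-level component assignments would also be resampled rather than held fixed; the sketch keeps them fixed so that the utterance-level step is easy to follow.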
