TR2016-138
Nested Gibbs sampling for mixture-of-mixture model and its application to speaker clustering
- "Nested Gibbs sampling for mixture-of-mixture model and its application to speaker clustering", APSIPA Transactions on Signal and Information Processing, DOI: 10.1017/ATSIP.2016.15, Vol. 5, October 2016.
@article{Tawara2016oct,
  author = {Tawara, Naohiro and Ogawa, Tetsuji and Watanabe, Shinji and Kobayashi, Tetsunori},
  title = {Nested Gibbs sampling for mixture-of-mixture model and its application to speaker clustering},
  journal = {APSIPA Transactions on Signal and Information Processing},
  year = 2016,
  volume = 5,
  month = oct,
  doi = {10.1017/ATSIP.2016.15},
  url = {https://www.merl.com/publications/TR2016-138}
}
Abstract:
This paper proposes a novel model estimation method that uses nested Gibbs sampling to estimate a mixture-of-mixture model, in which the distribution of each mixture component is itself represented by a mixture model. This model is suitable for analyzing multilevel data, such as videos and acoustic signals, that are composed of frame-wise observations. Deterministic procedures, such as the expectation-maximization algorithm, have been employed to estimate these kinds of models, but this approach often suffers from a large bias when the amount of data is limited. To avoid this problem, we introduce a Markov chain Monte Carlo-based model estimation method. In particular, we aim to identify a suitable sampling method for mixture-of-mixture models. Gibbs sampling is a possible approach, but it can easily become trapped in a local optimum when each component is represented by a multi-modal distribution. Thus, we propose a novel Gibbs sampling method, called nested Gibbs sampling, which represents the lower-level (fine) data structure based on elemental mixture distributions and the higher-level (coarse) data structure based on mixture-of-mixture distributions. We applied this method to a speaker clustering problem and conducted experiments under various conditions. The results demonstrated that the proposed method outperformed conventional sampling-based, variational Bayesian, and hierarchical agglomerative methods.
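To make the two levels mentioned in the abstract concrete, the following is a minimal sketch of a Gibbs sweep for a toy mixture-of-mixture model, in which each utterance carries a speaker label and each frame carries a component label within that speaker's mixture. It is not the paper's nested Gibbs sampler. It assumes one-dimensional frames, unit-variance Gaussian components, and symmetric Dirichlet and zero-mean Gaussian priors. All names (`gibbs_sweep`, `run`, `alpha`, `tau2`) are hypothetical and chosen for illustration only.

```python
import math
import random


def log_normal(x, mean):
    # Log density of a Gaussian with variance fixed at 1 (an assumption for brevity).
    return -0.5 * math.log(2 * math.pi) - 0.5 * (x - mean) ** 2


def sample_categorical(log_weights, rng):
    # Draw an index with probability proportional to exp(log_weights).
    peak = max(log_weights)
    weights = [math.exp(lw - peak) for lw in log_weights]
    r = rng.random() * sum(weights)
    acc = 0.0
    for i, wt in enumerate(weights):
        acc += wt
        if r < acc:
            return i
    return len(weights) - 1


def sample_dirichlet(alphas, rng):
    # Dirichlet draw built from normalized Gamma variates.
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def frame_logps(x, w_s, mu_s):
    # Unnormalized log posterior over components for one frame of speaker s.
    return [math.log(w_s[m]) + log_normal(x, mu_s[m]) for m in range(len(w_s))]


def gibbs_sweep(utts, z, v, h, w, mu, rng, alpha=1.0, tau2=100.0):
    S, M = len(w), len(w[0])
    for u, frames in enumerate(utts):
        # Higher (coarse) level: speaker label of utterance u,
        # with the frame-level component labels summed out.
        logp = []
        for s in range(S):
            ll = math.log(h[s])
            for x in frames:
                terms = frame_logps(x, w[s], mu[s])
                peak = max(terms)
                ll += peak + math.log(sum(math.exp(t - peak) for t in terms))
            logp.append(ll)
        z[u] = sample_categorical(logp, rng)
        # Lower (fine) level: component label of every frame, given the speaker.
        v[u] = [sample_categorical(frame_logps(x, w[z[u]], mu[z[u]]), rng)
                for x in frames]
    # Sufficient statistics for the parameter updates.
    utt_counts = [0] * S
    n = [[0] * M for _ in range(S)]
    sx = [[0.0] * M for _ in range(S)]
    for u, frames in enumerate(utts):
        utt_counts[z[u]] += 1
        for x, m in zip(frames, v[u]):
            n[z[u]][m] += 1
            sx[z[u]][m] += x
    # Conjugate updates: Dirichlet for weights, Gaussian for means (prior N(0, tau2)).
    h[:] = sample_dirichlet([alpha + c for c in utt_counts], rng)
    for s in range(S):
        w[s][:] = sample_dirichlet([alpha + c for c in n[s]], rng)
        for m in range(M):
            prec = 1.0 / tau2 + n[s][m]
            mu[s][m] = rng.gauss(sx[s][m] / prec, math.sqrt(1.0 / prec))


def run(utts, S=2, M=2, sweeps=50, seed=0):
    # Random initialization followed by a fixed number of sweeps.
    rng = random.Random(seed)
    z = [rng.randrange(S) for _ in utts]
    v = [[rng.randrange(M) for _ in frames] for frames in utts]
    h = [1.0 / S] * S
    w = [[1.0 / M] * M for _ in range(S)]
    mu = [[rng.gauss(0.0, 3.0) for _ in range(M)] for _ in range(S)]
    for _ in range(sweeps):
        gibbs_sweep(utts, z, v, h, w, mu, rng)
    return z, v, h, w, mu
```

Calling `run(utts)` with `utts` given as a list of per-utterance lists of floats returns the final speaker labels `z`, the frame-level component labels `v`, and the sampled parameters. A single chain like this one can still stall in a poor mode, which is the difficulty the abstract attributes to plain Gibbs sampling with multi-modal components.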