TALK    Tensor representation of speaker space for arbitrary speaker conversion

Date released: September 6, 2012


  • Date & Time:

    Thursday, September 6, 2012; 12:00 PM

  • Abstract:

    In voice conversion research, converting from or to an arbitrary speaker's voice is an important objective. Eigenvoice conversion (EVC), based on an eigenvoice Gaussian mixture model (EV-GMM), was proposed for this purpose. In EVC, as in speaker recognition approaches, a speaker space is constructed from GMM supervectors, which are high-dimensional vectors obtained by concatenating the mean vectors of each speaker's GMM. In this speaker space, each speaker is represented by a small number of weight parameters for the eigen-supervectors. In this talk, we revisit the construction of the speaker space by introducing tensor analysis of the training data set. In our approach, each speaker is represented as a matrix whose rows and columns correspond to the Gaussian components and the dimensions of the mean vector, respectively, and the speaker space is derived by tensor analysis of the set of these matrices. Our approach can solve an inherent problem of the supervector representation and improves voice conversion performance. Experimental results for one-to-many voice conversion demonstrate the effectiveness of the proposed approach.

  • Speaker:

    Dr. Daisuke Saito
    The University of Tokyo

    Daisuke Saito received his B.E., M.S., and Dr. Eng. degrees in 2006, 2008, and 2011, respectively, from the University of Tokyo, Tokyo, Japan. From 2010 to 2011, he was a research fellow (DC2) of the Japan Society for the Promotion of Science. He is currently an Assistant Professor in the Graduate School of Information Science and Technology, the University of Tokyo. He is interested in various areas of speech engineering, including voice conversion, speech synthesis, acoustic analysis, speaker recognition, and speech recognition. He is a member of the Institute of Electrical and Electronics Engineers (IEEE), the International Speech Communication Association (ISCA), the Acoustical Society of Japan (ASJ), the Institute of Electronics, Information and Communication Engineers (IEICE), the Japanese Society for Artificial Intelligence (JSAI), and the Institute of Image Information and Television Engineers (ITE). He received the ISCA Award for the best student paper of INTERSPEECH 2011, and the Awaya Award from the ASJ in 2012.

  • Research Area:

    Speech & Audio
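
The two speaker representations contrasted in the abstract can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: the toy sizes, the random stand-in for per-speaker GMM means, and the names `means`, `unfold`, `U_comp`, and `U_dim` are hypothetical. The tensor part uses a generic HOSVD-style decomposition (bases from the mode unfoldings), which may differ from the tensor analysis presented in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumed): S speakers, M Gaussian components, D-dimensional means.
S, M, D = 5, 4, 3

# Hypothetical training data: one M x D matrix of GMM mean vectors per speaker.
means = rng.normal(size=(S, M, D))

# Supervector view: concatenate each speaker's M mean vectors into one vector.
supervectors = means.reshape(S, M * D)
bias = supervectors.mean(axis=0)

# Eigen-supervectors via PCA, computed as an SVD of the centered supervectors.
_, _, vt = np.linalg.svd(supervectors - bias, full_matrices=False)
K = 2  # number of eigen-supervectors kept (assumed)
eigen_supervectors = vt[:K]                              # shape (K, M*D)
weights = (supervectors - bias) @ eigen_supervectors.T   # shape (S, K)

# Tensor view: keep each speaker as an M x D matrix, stacked into S x M x D.
centered = means - means.mean(axis=0)

def unfold(tensor, mode):
    """Unfold a tensor along `mode` into a matrix with that mode as rows."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

# Separate orthonormal bases for the Gaussian-component axis and the
# mean-dimension axis, taken from the corresponding unfoldings.
U_comp = np.linalg.svd(unfold(centered, 1), full_matrices=False)[0]  # (M, M)
U_dim = np.linalg.svd(unfold(centered, 2), full_matrices=False)[0]   # (D, D)

# Each speaker becomes a small core matrix: U_comp^T X_s U_dim.
core = np.einsum("mi,smd,dj->sij", U_comp, centered, U_dim)

# With the full bases kept, the speaker matrices are recovered exactly.
recon = np.einsum("mi,sij,dj->smd", U_comp, core, U_dim)
```

In the supervector view, the component and dimension axes are flattened into a single axis of length M*D before analysis. In the matrix view, the two axes get their own bases, so truncating `U_comp` or `U_dim` reduces each axis independently.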