TALK    Generative Model-Based Text-to-Speech Synthesis

Date released: February 1, 2017


  • Date & Time:

    Wednesday, February 1, 2017; 12:00-13:00

  • Abstract:

    Recent progress in generative modeling has significantly improved the naturalness of synthesized speech. In this talk I will summarize generative model-based approaches to speech synthesis, such as WaveNet, a deep generative model of raw audio waveforms. WaveNets are able to generate speech that mimics any human voice and that sounds more natural than the best existing Text-to-Speech systems. (A minimal illustrative code sketch of WaveNet's dilated causal convolutions appears at the end of this listing.)
    See https://deepmind.com/blog/wavenet-generative-model-raw-audio/ for further details.

  • Speaker:

    Dr. Heiga ZEN
    Google

    Dr. Heiga ZEN (also known as Byung Ha CHUN) received the A.E. degree from the Suzuka National College of Technology, Suzuka, Japan, in 1999, and the B.E., M.E., and Ph.D. degrees from the Nagoya Institute of Technology, Nagoya, Japan, in 2001, 2003, and 2006, respectively. From June 2004 to May 2005, he was an intern/co-op researcher at the IBM T. J. Watson Research Center, Yorktown Heights, NY, U.S.A. From April 2006 to June 2008, he was a Research Associate at the Nagoya Institute of Technology. From July 2008 to July 2011, he was a Research Engineer at the Toshiba Research Europe Cambridge Research Laboratory, Cambridge, U.K. Presently, he is a Research Scientist at Google, London, U.K. His research interests include statistical speech recognition and synthesis.

    Dr. Zen was awarded the 2006 ASJ Awaya Award, the 2008 ASJ Itakura Award, the 2008 TAF TELECOM System Technology Award, the 2008 IEICE Information and Systems Society Best Paper Award, and the 2009 IPSJ Yamashita SIG Research Award. He is a member of the ASJ, IEEE, IPSJ, and ISCA, and has been a member of the IEEE Speech and Language Processing Technical Committee (SLTC) since 2012.

  • MERL Host:

    Chiori Hori

  • External Link:

    https://www.linkedin.com/in/heiga-zen-b1a64b3

  • Research Area:

    Speech & Audio
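
  • Code Sketch:

    The abstract describes WaveNet as a deep generative model of raw audio waveforms. As background for readers unfamiliar with the model, below is a minimal NumPy sketch of two ingredients described in the WaveNet paper: 8-bit mu-law companding of the waveform, and a stack of dilated causal convolutions with the gated activation unit. Everything here is an illustrative single-channel toy with random placeholder weights, not the speaker's implementation; a real WaveNet uses many channels, conditioning inputs, and a 256-way softmax over the next sample.

    import numpy as np

    MU = 255  # mu-law companding constant from the WaveNet paper

    def mu_law_encode(x, mu=MU):
        """Compand x in [-1, 1] and quantize it to 256 discrete levels."""
        y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
        return ((y + 1.0) / 2.0 * mu).astype(np.int64)  # integers in 0..255

    def mu_law_decode(q, mu=MU):
        """Invert the companding back to a waveform in [-1, 1]."""
        y = 2.0 * (q.astype(np.float64) / mu) - 1.0
        return np.sign(y) * ((1.0 + mu) ** np.abs(y) - 1.0) / mu

    def causal_dilated_conv(x, w, dilation):
        """Two-tap causal convolution: y[t] = w[0]*x[t-d] + w[1]*x[t]."""
        xp = np.concatenate([np.zeros(dilation), x])  # left-pad: no future leaks in
        return w[0] * xp[:-dilation] + w[1] * xp[dilation:]

    def wavenet_layer(x, wf, wg, dilation):
        """Gated activation unit: tanh(filter) * sigmoid(gate), plus residual."""
        f = np.tanh(causal_dilated_conv(x, wf, dilation))
        g = 1.0 / (1.0 + np.exp(-causal_dilated_conv(x, wg, dilation)))
        out = f * g
        return x + out, out  # (residual output, skip contribution)

    rng = np.random.default_rng(0)
    t = np.arange(16000)
    x = np.sin(2 * np.pi * 440 * t / 16000)  # stand-in one-second "waveform"

    h = mu_law_decode(mu_law_encode(x))  # quantize, then re-embed as floats
    skips = 0.0
    for d in [1, 2, 4, 8, 16, 32]:         # dilation doubles each layer, so the
        wf = 0.1 * rng.standard_normal(2)  # receptive field grows exponentially
        wg = 0.1 * rng.standard_normal(2)  # with depth (placeholder weights)
        h, skip = wavenet_layer(h, wf, wg, d)
        skips = skips + skip

    # A trained WaveNet would push the summed skips through ReLU/1x1 layers into
    # a 256-way softmax predicting the next mu-law sample; omitted here.
    print("receptive field:", 1 + sum([1, 2, 4, 8, 16, 32]), "samples")

    Because the convolutions are causal, the model is autoregressive: each output sample depends only on past samples, so generation proceeds one sample at a time, conditioned on everything generated so far.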