- Date: July 15, 2015
Research Area: Speech & Audio
Brief - A new book on Bayesian Speech and Language Processing, co-authored by MERL researcher Shinji Watanabe and research collaborator Jen-Tzung Chien, a professor at National Chiao Tung University in Taiwan, has been published.
With this comprehensive guide, you will learn how to apply Bayesian machine learning techniques systematically to solve various problems in speech and language processing. A range of statistical models is detailed, from hidden Markov models to Gaussian mixture models, n-gram models and latent topic models, along with applications including automatic speech recognition, speaker verification, and information retrieval. Approximate Bayesian inference methods based on MAP, evidence, asymptotic, VB, and MCMC approximations are presented, together with full derivations of calculations, useful notation, formulas, and rules. The authors address the difficulties of straightforward applications and provide detailed examples and case studies to demonstrate how you can successfully use practical Bayesian inference methods to improve the performance of information systems. This is an invaluable resource for students, researchers, and industry practitioners working in machine learning, signal processing, and speech and language processing.
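As a toy illustration of the MAP approximation listed above (a hypothetical example, not taken from the book), consider estimating the mean of a Gaussian under a conjugate Gaussian prior: the MAP estimate is a precision-weighted average that shrinks the maximum-likelihood (ML) estimate toward the prior mean.

```python
import numpy as np

def map_gaussian_mean(x, sigma2, mu0, tau2):
    """MAP estimate of a Gaussian mean under a conjugate Gaussian prior.

    Data model: x_i ~ N(mu, sigma2); prior: mu ~ N(mu0, tau2).
    """
    n = len(x)
    ml = np.mean(x)                                # maximum-likelihood estimate
    w = (n / sigma2) / (n / sigma2 + 1.0 / tau2)   # data weight in [0, 1)
    return w * ml + (1.0 - w) * mu0

x = np.array([2.1, 1.9, 2.3, 2.0])
ml = float(np.mean(x))                             # 2.075
map_est = map_gaussian_mean(x, sigma2=1.0, mu0=0.0, tau2=1.0)
# With n=4 and these variances, w = 0.8, so the MAP estimate is
# 0.8 * 2.075 = 1.66, shrunk toward the prior mean of 0.
print(ml, map_est)
```

As the number of observations grows, the data weight approaches 1 and the MAP estimate converges to the ML estimate, which is the usual asymptotic behavior.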
-
- Date: April 20, 2015
Brief - Mitsubishi Electric researcher Yuuki Tachioka, of Japan, and MERL researcher Shinji Watanabe presented a paper at the IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP) entitled "A Discriminative Method for Recurrent Neural Network Language Models". This paper describes a discriminative language modelling method for Japanese speech recognition. The Japanese Nikkei newspaper and several other press outlets reported on this method and its performance on Japanese speech recognition tasks.
-
- Date: March 9, 2015
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief - Recent research on speech enhancement by MERL's Speech and Audio team was highlighted in "Cars That Think", IEEE Spectrum's blog on smart technologies for cars. IEEE Spectrum is the flagship publication of the Institute of Electrical and Electronics Engineers (IEEE), the world's largest association of technical professionals with more than 400,000 members.
-
- Date: February 17, 2015
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief - Mitsubishi Electric Corporation announced that it has developed breakthrough noise-suppression technology that significantly improves the quality of hands-free voice communication in noisy conditions, such as making a voice call via a car navigation system. Speech clarity is improved by removing 96% of surrounding sounds, including rapidly changing noise from turn signals or wipers, which are difficult to suppress using conventional methods. The technology is based on recent research on speech enhancement by MERL's Speech and Audio team.
-
- Date: Thursday, October 23, 2014
Location: Mitsubishi Electric Research Laboratories (MERL)
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief - SANE 2014, a one-day event gathering researchers and students in speech and audio from the Northeast of the American continent, will be held on Thursday October 23, 2014 at MIT, in Cambridge, MA. It is a follow-up to SANE 2012, held at Mitsubishi Electric Research Labs (MERL), and SANE 2013, held at Columbia University, which each gathered around 70 researchers and students. SANE 2014 will feature invited talks by leading researchers from the Northeast as well as Europe: Najim Dehak (MIT), Hakan Erdogan (MERL/Sabanci University), Gael Richard (Telecom ParisTech), George Saon (IBM Research), Andrew Senior (Google Research), Stavros Tsakalidis (BBN - Raytheon), and David Wingate (Lyric). It will also feature a lively poster session during lunch time, open to both students and researchers. SANE 2014 is organized by Jonathan Le Roux (MERL), Jim Glass (MIT), and John R. Hershey (MERL).
-
- Date: May 10, 2014
Where: REVERB Workshop
Research Area: Speech & Audio
Brief - Mitsubishi Electric's submission to the REVERB workshop achieved the second best performance among all participating institutes. The team included Yuuki Tachioka and Tomohiro Narita of MELCO in Japan, and Shinji Watanabe and Felix Weninger of MERL. The challenge addresses automatic speech recognition systems that are robust against varying room acoustics.
-
- Date: May 12, 2014 - May 14, 2014
Where: Hands-free Speech Communication and Microphone Arrays (HSCMA)
Research Area: Speech & Audio
Brief - MERL is a sponsor for the 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA 2014), held in Nancy, France, in May 2014.
-
- Date: May 1, 2014
Where: IEEE Global Conference on Signal and Information Processing (GlobalSIP)
Research Area: Speech & Audio
Brief - John R. Hershey is Co-Chair of the GlobalSIP 2014 Symposium on Machine Learning.
-
- Date: March 11, 2014
Awarded to: Yuuki Tachioka
Awarded for: "Effectiveness of discriminative approaches for speech recognition under noisy environments on the 2nd CHiME Challenge"
Awarded by: Acoustical Society of Japan (ASJ)
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief - MELCO researcher Yuuki Tachioka received the Awaya Prize Young Researcher Award from the Acoustical Society of Japan (ASJ) for "effectiveness of discriminative approaches for speech recognition under noisy environments on the 2nd CHiME Challenge", which was based on joint work with MERL Speech & Audio team researchers Shinji Watanabe, Jonathan Le Roux and John R. Hershey.
-
- Date: March 1, 2014
Where: IEEE Signal Processing Society
Research Area: Speech & Audio
Brief - John R. Hershey is Guest Editor for the IEEE Signal Processing Magazine Special Issue on Signal Processing Techniques for Assisted Listening.
-
- Date: January 1, 2014
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief - Jonathan Le Roux, Shinji Watanabe and John R. Hershey have been elected for 3-year terms to Technical Committees of the IEEE Signal Processing Society. Jonathan has been elected to the IEEE Audio and Acoustic Signal Processing Technical Committee (AASP-TC), and Shinji and John to the Speech and Language Processing Technical Committee (SL-TC). Members of the Speech & Audio team now together hold four TC positions, as John also serves on the AASP-TC.
-
- Date & Time: Thursday, October 24, 2013; 8:45 AM - 5:00 PM
Location: Columbia University
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief - SANE 2013, a one-day event gathering researchers and students in speech and audio from the Northeast of the American continent, will be held on Thursday October 24, 2013 at Columbia University, in New York City.
A follow-up to SANE 2012 held in October 2012 at MERL in Cambridge, MA, this year's SANE will be held in conjunction with the WASPAA workshop, held October 20-23 in upstate New York. WASPAA attendees are welcome and encouraged to attend SANE.
SANE 2013 will feature invited speakers from the Northeast, as well as from the international community. It will also feature a lively poster session during lunch time, open to both students and researchers.
SANE 2013 is organized by Prof. Dan Ellis (Columbia University), Jonathan Le Roux (MERL) and John R. Hershey (MERL).
-
- Date & Time: Thursday, October 17, 2013; 12:00 PM
Speaker: Prof. Laurent Daudet, Paris Diderot University, France
MERL Host: Jonathan Le Roux
Research Area: Speech & Audio
Abstract - In acoustics, one may wish to acquire a wavefield over a whole spatial domain, while only point measurements (i.e., with microphones) are possible. Even with few sources, this remains a difficult problem because of reverberation, which can be hard to characterize. This can be seen as a sampling/interpolation problem, and it raises a number of interesting questions: how many sample points are needed, where to place the sampling points, etc. In this presentation, we will review some case studies, in 2D (vibrating plates) and 3D (room acoustics), with numerical and experimental data, where we have developed sparse models, possibly with additional 'structures', based on a physical modeling of the acoustic field. These types of models are well suited to reconstruction techniques known as compressed sensing. These principles can also be used for sub-Nyquist optical imaging: we will show preliminary experimental results of a new compressive imager, remarkably simple in its principle, using a multiply scattering medium.
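The sparse-reconstruction idea behind compressed sensing can be sketched with generic orthogonal matching pursuit (a minimal illustration with synthetic data, not Prof. Daudet's actual algorithms or measurements): a k-sparse signal is recovered from far fewer random measurements than its length.

```python
import numpy as np

def omp(A, y, k):
    """Greedy recovery of a k-sparse x from measurements y = A @ x."""
    residual = y.copy()
    support = []
    for _ in range(k):
        # Pick the column most correlated with the current residual.
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        # Least-squares fit on the selected columns, then update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x

rng = np.random.default_rng(0)
n, m, k = 64, 32, 3                       # signal length, measurements, sparsity
A = rng.standard_normal((m, n)) / np.sqrt(m)   # random sensing matrix
x_true = np.zeros(n)
x_true[[5, 20, 40]] = [1.0, -2.0, 1.5]    # 3-sparse ground truth
y = A @ x_true                            # only 32 measurements of a length-64 signal
x_hat = omp(A, y, k)
print(np.linalg.norm(x_hat - x_true))
```

With a random Gaussian sensing matrix and such low sparsity, the recovery is exact with overwhelming probability, which is the core promise that structured sparse models exploit.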
-
- Date: September 26, 2013
Awarded to: Jonathan Le Roux
Awarded for: "A new non-negative dynamical system for speech and audio modeling"
Awarded by: Acoustical Society of Japan (ASJ)
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
-
- Date: June 1, 2013
Awarded to: Yuuki Tachioka, Shinji Watanabe, Jonathan Le Roux and John R. Hershey
Awarded for: "Discriminative Methods for Noise Robust Speech Recognition: A CHiME Challenge Benchmark"
Awarded by: International Workshop on Machine Listening in Multisource Environments (CHiME)
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief - The results of the 2nd 'CHiME' Speech Separation and Recognition Challenge are out! The team formed by MELCO researcher Yuuki Tachioka and MERL Speech & Audio team researchers Shinji Watanabe, Jonathan Le Roux and John Hershey obtained the best results in the continuous speech recognition task (Track 2). This very challenging task consisted in recognizing speech corrupted by highly non-stationary noises recorded in a real living room. Our proposal, which also included a simple yet extremely efficient denoising front-end, focused on investigating and developing state-of-the-art automatic speech recognition back-end techniques: feature transformation methods, as well as discriminative training methods for acoustic and language modeling. Our system significantly outperformed other participants. Our code has since been released as an improved baseline for the community to use.
-
- Date: June 1, 2013
Where: International Workshop on Machine Listening in Multisource Environments (CHiME)
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief - The paper "Discriminative Methods for Noise Robust Speech Recognition: A CHiME Challenge Benchmark" by Tachioka, Y., Watanabe, S., Le Roux, J. and Hershey, J.R. was presented at the International Workshop on Machine Listening in Multisource Environments (CHiME).
-
- Date & Time: Saturday, June 1, 2013; 9:00 AM - 6:00 PM
Location: Vancouver, Canada
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief - MERL researchers Shinji Watanabe and Jonathan Le Roux are members of the organizing committee of CHiME 2013, the 2nd International Workshop on Machine Listening in Multisource Environments, Jonathan acting as Program Co-Chair. MERL is also a sponsor for the event.
CHiME 2013 is a one-day workshop to be held in conjunction with ICASSP 2013 that will consider the challenge of developing machine listening applications for operation in multisource environments, i.e. real-world conditions with acoustic clutter, where the number and nature of the sound sources is unknown and changing over time. CHiME brings together researchers from a broad range of disciplines (computational hearing, blind source separation, speech recognition, machine learning) to discuss novel and established approaches to this problem. The cross-fertilisation of ideas will foster fresh approaches that efficiently combine the complementary strengths of each research field.
-
- Date & Time: Thursday, May 30, 2013; 12:30 PM - 2:30 PM
Location: Vancouver, Canada
MERL Contacts: Anthony Vetro; Petros T. Boufounos; Jonathan Le Roux
Research Area: Speech & Audio
Brief - MERL is a sponsor for the first ICASSP Student Career Luncheon that will take place at ICASSP 2013. MERL members will take part in the event to introduce MERL and talk with students interested in positions or internships.
-
- Date & Time: Tuesday, May 7, 2013; 2:30 PM
Speaker: Dr. Yotaro Kubo, NTT Communication Science Laboratories, Kyoto, Japan
Research Area: Speech & Audio
Abstract - Kernel methods are important because they combine convexity in estimation with the ability to represent nonlinear classification. However, kernel methods have not conventionally been widely used in automatic speech recognition. In this presentation, I will introduce several attempts to practically incorporate kernel methods into acoustic models for automatic speech recognition. The presentation will consist of two parts. The first part will describe maximum entropy discrimination and its application to kernel machine training. The second part will describe dimensionality reduction of kernel-based features.
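The appeal noted in the abstract, convex estimation combined with nonlinear modeling power, can be illustrated with generic kernel ridge regression (a minimal sketch on synthetic data, unrelated to the speaker's specific acoustic-model methods): the training objective is convex with a closed-form solution, yet the fitted function is nonlinear.

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    """Gaussian (RBF) kernel matrix between two sets of points."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_ridge_fit(X, y, lam=1e-2, gamma=1.0):
    K = rbf_kernel(X, X, gamma)
    # Convex objective -> closed-form solution of (K + lam*I) alpha = y.
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def kernel_ridge_predict(X_train, alpha, X_new, gamma=1.0):
    return rbf_kernel(X_new, X_train, gamma) @ alpha

X = np.linspace(-3.0, 3.0, 40)[:, None]
y = np.sin(X).ravel()                    # nonlinear target function
alpha = kernel_ridge_fit(X, y)
y_hat = kernel_ridge_predict(X, alpha, X)
print(np.max(np.abs(y_hat - y)))         # small fit error despite nonlinearity
```

The same trick underlies kernel machines for classification; the cost is that naive training scales with the number of training points, which is one reason dimensionality reduction of kernel-based features matters for speech-scale data.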
-
- Date: May 2, 2013
Where: International Conference on Learning Representations (ICLR)
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief - The paper "Block Coordinate Descent for Sparse NMF" by Potluru, V.K., Plis, S.M., Le Roux, J., Pearlmutter, B.A., Calhoun, V.D. and Hayes, T.P. was presented at the International Conference on Learning Representations (ICLR).
-
- Date: March 1, 2013
Where: IEEE Signal Processing Letters
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief - The article "Consistent Wiener Filtering for Audio Source Separation" by Le Roux, J. and Vincent, E. was published in IEEE Signal Processing Letters.
-
- Date & Time: Tuesday, February 26, 2013; 12:00 PM
Speaker: Prof. Taylan Cemgil, Bogazici University, Istanbul, Turkey
MERL Host: Jonathan Le Roux
Research Area: Speech & Audio
Abstract - Algorithms for decompositions of matrices are of central importance in machine learning, signal processing and information retrieval, with SVD and NMF (Nonnegative Matrix Factorisation) being the most widely used examples. Probabilistic interpretations of matrix factorisation models are also well known and are useful in many applications (Salakhutdinov and Mnih 2008; Cemgil 2009; Fevotte et al. 2009). In recent years, decompositions of multiway arrays, known as tensor factorisations, have gained significant popularity for the analysis of large data sets with more than two entities (Kolda and Bader 2009; Cichocki et al. 2008). We will discuss a subset of these models from a statistical modelling perspective, building upon probabilistic Bayesian generative models and generalised linear models (McCullagh and Nelder). In both views, the factorisation is implicit in a well-defined hierarchical statistical model, and factorisations can be computed via maximum likelihood.
We express a tensor factorisation model using a factor graph, and the factor tensors are optimised iteratively. In each iteration, the update equation can be implemented by a message passing algorithm, reminiscent of variable elimination in a discrete graphical model. This setting provides a structured and efficient approach that enables very easy development of application-specific custom models, as well as algorithms for so-called coupled (collective) factorisations, where an arbitrary set of tensors is factorised simultaneously with shared factors. Extensions to full Bayesian inference for model selection, via variational approximations or MCMC, are also feasible. Well-known models of multiway analysis such as Nonnegative Matrix Factorisation (NMF), Parafac, and Tucker, and models used in audio processing (Convolutive NMF, NMF2D, SF-SSNTF) appear as special cases, and new extensions can easily be developed. We will illustrate the approach with applications in link prediction and audio and music processing.
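The NMF special case mentioned above can be sketched with Lee and Seung's classic multiplicative updates for the Euclidean objective ||V - WH||_F^2 (generic NMF on synthetic data, not the factor-graph message-passing formulation of the talk):

```python
import numpy as np

def nmf(V, r, n_iter=500, seed=0):
    """Nonnegative factorisation V ~ W @ H via multiplicative updates."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r)) + 1e-3          # positive random initialisation
    H = rng.random((r, n)) + 1e-3
    for _ in range(n_iter):
        # Each update keeps the factors nonnegative by construction and
        # does not increase the Euclidean cost ||V - W H||_F^2.
        H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-12)
    return W, H

# A nonnegative matrix of exact rank 2 should be reconstructed almost exactly.
rng = np.random.default_rng(1)
V = rng.random((8, 2)) @ rng.random((2, 10))
W, H = nmf(V, r=2)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))
```

In the probabilistic view discussed in the talk, this same update arises as maximum-likelihood estimation under a Gaussian observation model, which is what makes the generative-model perspective a natural generalisation.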
-
- Date & Time: Monday, January 28, 2013; 11:00 AM
Speaker: Prof. Jen-Tzung Chien, National Chiao Tung University, Taiwan
Research Area: Speech & Audio
Abstract - Bayesian learning provides attractive tools to model, analyze, search, recognize and understand real-world data. In this talk, I will introduce a new Bayesian group sparse learning approach and its applications to speech recognition and signal separation. First, I present the group sparse hidden Markov models (GS-HMMs), in which a sequence of acoustic features is driven by a Markov chain and each feature vector is represented by two groups of basis vectors, representing features across states and within states, respectively. The sparse prior is imposed by introducing the Laplacian scale mixture (LSM) distribution, and the resulting robustness of speech recognition is illustrated. The LSM distribution is also incorporated into Bayesian group sparse learning based on nonnegative matrix factorization (NMF); this approach is developed to estimate the reconstructed rhythmic and harmonic music signals from a single-channel source signal, using a Monte Carlo procedure to infer the two groups of parameters. Future directions for Bayesian learning will also be discussed.
-
- Date & Time: Tuesday, December 11, 2012; 12:00 PM
Speaker: Takahiro Oku, NHK Science & Technology Research Laboratories
Research Area: Speech & Audio
Abstract - In this talk, I will present human-friendly broadcasting research conducted at NHK, as well as research on speech recognition for real-time closed-captioning. The goal of human-friendly broadcasting research is to make broadcasting more accessible and enjoyable for everyone, including children, the elderly, and physically challenged persons. The automatic speech recognition technology that NHK has developed makes it possible to create captions for the hearing impaired automatically, in real time. For sports programs such as professional sumo wrestling, a closed-captioning system has already been implemented in which captions are created by applying speech recognition to a captioning re-speaker. In 2011, NHK General Television started broadcasting closed captions for the information program "Morning Market". After introducing the implemented closed-captioning system, I will talk about a recent improvement obtained with an adaptation method that creates a more effective acoustic model using error-correction results, so as to reflect recognition error tendencies more effectively.
-