TR2024-124

PARIS: Pseudo-AutoRegressIve Siamese Training for Online Speech Separation


Abstract:

While offline speech separation models have made significant advances, the streaming regime remains less explored and is typically limited to causal modifications of existing offline networks. This study focuses on empowering a streaming speech separation model with autoregressive capability, in which the separation at the current step is conditioned on separated samples from past steps. To do so, we introduce pseudo-autoregressive Siamese (PARIS) training: with only two forward passes through a Siamese-style network per batch, PARIS avoids both the training-inference mismatch of teacher forcing and the need for numerous autoregressive steps during training. The proposed PARIS training improves the recent online SkiM model by 1.5 dB in SI-SNR on the WSJ0-2mix dataset, with minimal change to the network architecture and inference time.
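The two-pass training scheme described above can be illustrated with a toy sketch. Everything here is a hypothetical stand-in, not the paper's implementation: `separator` is a one-line placeholder for the SkiM network, the chunk sizes are arbitrary, and using the first pass's output as the frozen conditioning signal approximates the stop-gradient between the Siamese passes.

```python
import numpy as np

rng = np.random.default_rng(0)

def separator(mixture, condition, w):
    # Toy "network" with shared weights w: mixes the current input
    # chunk with the conditioning signal (past separated samples).
    # Stands in for the actual streaming separation model.
    return np.tanh(w[0] * mixture + w[1] * condition)

# Hypothetical chunked mixture: T streaming steps of length-N chunks.
T, N = 4, 8
mixture = rng.standard_normal((T, N))
w = np.array([0.9, 0.1])

# Pass 1: run every step without autoregressive context (zeros) to
# obtain provisional separated chunks in a single forward pass.
provisional = np.stack(
    [separator(mixture[t], np.zeros(N), w) for t in range(T)]
)

# Pass 2 (the Siamese twin, same weights w): condition step t on the
# provisional separation of step t-1, mimicking inference-time
# autoregression without teacher forcing.
refined = []
prev = np.zeros(N)
for t in range(T):
    refined.append(separator(mixture[t], prev, w))
    prev = provisional[t]  # frozen pass-1 output as conditioning
refined = np.stack(refined)

print(refined.shape)  # (4, 8)
```

With this scheme the training loss would be computed on the second pass, so the model learns to exploit its own (imperfect) past outputs rather than ground-truth history, matching the streaming inference condition.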