LiveBand: Live Accompaniment Generation in the Audio Domain

📅 2026-06-02
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work proposes a strictly causal, end-to-end system for real-time music accompaniment generation that synthesizes high-fidelity audio in a streaming fashion without future audio context. It trains a causal Transformer generator in the continuous latent space of a pre-trained causal audio autoencoder, supervised adversarially at the sequence level by a discriminator. Training runs in a single parallel forward pass under causal masking, and streaming inference runs autoregressively with a rolling attention state. Because the two computations are matched by design, the method avoids teacher forcing and the exposure bias it causes. On a multi-instrument accompaniment benchmark, the system improves over prior work on objective measures of audio quality, beat alignment, and mix adherence, and it generates in real time with zero lookahead on consumer hardware.
📝 Abstract
We present LiveBand, a real-time system that generates high-fidelity music accompaniments to live audio input, respecting strict causal constraints. Our method trains a causal transformer generator in the continuous latent space of a pre-trained causal audio autoencoder, using adversarial sequence-level supervision from a discriminator. At each timestep, the generator receives only the causally available mix context and Gaussian noise, and predicts accompaniment latents without access to future mix frames or ground-truth target latents. Training is performed in a single parallel forward pass under causal masking, while streaming inference proceeds autoregressively with a rolling attention state. The model's training and inference computations are matched by design, eliminating teacher forcing and the associated exposure bias. On a multi-instrument music accompaniment benchmark, LiveBand improves over prior work on objective measures of audio quality, beat alignment, and mix adherence, while enabling real-time streaming generation without lookahead into the future on consumer hardware.
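The claim that training and inference computations are matched can be illustrated with a toy example. The sketch below is not LiveBand's implementation: it assumes a single attention head with identity query/key/value projections, treats the "rolling attention state" as a fixed-length window of cached frames (the window size is a hypothetical parameter), and uses random vectors as stand-ins for the per-timestep mix context and noise. Under those assumptions, a training-style pass over the full sequence with a causal mask gives the same outputs as a frame-by-frame streaming pass.

```python
import math
import random


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys, values):
    """Scaled dot-product attention of one query over the given keys/values."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, k) * scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def parallel_causal(frames, window):
    """Training-style pass: the whole sequence is available, but each timestep
    only sees frames allowed by the causal (and, by assumption, windowed) mask.
    A real implementation would do this as one masked matrix product."""
    outputs = []
    for t in range(len(frames)):
        start = max(0, t - window + 1)  # hide the future and frames older than the window
        visible = frames[start:t + 1]
        outputs.append(attend(frames[t], visible, visible))
    return outputs


class StreamingAttention:
    """Inference-style pass: frames arrive one at a time and a rolling cache
    holds at most `window` past frames (hypothetical stand-in for the rolling state)."""

    def __init__(self, window):
        self.window = window
        self.cache = []

    def step(self, frame):
        self.cache.append(frame)
        if len(self.cache) > self.window:
            self.cache.pop(0)  # drop the oldest frame
        return attend(frame, self.cache, self.cache)


random.seed(0)
# Stand-in inputs: 10 timesteps of 4-dimensional features (mix context plus noise).
frames = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(10)]

train_out = parallel_causal(frames, window=3)
stream = StreamingAttention(window=3)
stream_out = [stream.step(f) for f in frames]

# Largest elementwise gap between the two passes.
max_diff = max(abs(a - b) for ra, rb in zip(train_out, stream_out) for a, b in zip(ra, rb))
print("max difference between parallel and streaming outputs:", max_diff)
```

Because neither pass ever reads a ground-truth target or a future frame, there is no teacher-forced input to replace at inference time, which is the sense in which this toy setup has no train/inference mismatch.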
Problem

Research questions and friction points this paper is trying to address.

live accompaniment generation
causal audio generation
real-time music generation
audio domain modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

causal transformer
audio-domain accompaniment
adversarial sequence-level supervision
exposure bias elimination
real-time streaming generation