🤖 AI Summary
This work addresses the difficulty that conventional neural audio synthesis methods have in capturing the impulsive character of engine sounds and the physical mechanisms behind them. The authors propose the Pulse-Train-Resonator (PTR) model, a differentiable synthesis architecture that explicitly models the timing and waveform of exhaust pressure pulses by feeding a parameterized, ignition-aligned pulse train into a recursive Karplus-Strong resonator. The model incorporates physical priors, including harmonic decay, thermodynamic pitch modulation, valve dynamics, exhaust resonance, and engine operating modes such as throttle operation, to achieve high-fidelity and interpretable synthesis. Evaluated on 7.5 hours of audio from three engine types, PTR achieves a 21% improvement in harmonic reconstruction and a 5.7% reduction in total loss compared to a harmonic-plus-noise baseline.
📝 Abstract
Engine sounds originate from sequential exhaust pressure pulses rather than sustained harmonic oscillations. While neural synthesis methods typically aim to approximate the resulting spectral characteristics, we propose directly modeling the underlying pulse shapes and temporal structure. We present the Pulse-Train-Resonator (PTR) model, a differentiable synthesis architecture that generates engine audio as parameterized pulse trains aligned to engine firing patterns and propagates them through recursive Karplus-Strong resonators simulating exhaust acoustics. The architecture integrates physics-informed inductive biases including harmonic decay, thermodynamic pitch modulation, valve-dynamics envelopes, exhaust system resonances, and derived engine operating modes such as throttle operation and deceleration fuel cutoff (DCFO).
Validated on three diverse engine types totaling 7.5 hours of audio, PTR achieves a 21% improvement in harmonic reconstruction and a 5.7% reduction in total loss over a harmonic-plus-noise baseline model, while providing interpretable parameters corresponding to physical phenomena.
Complete code, model weights, and audio examples are openly available.
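As a rough illustration of the signal path described in the abstract, and not the authors' released implementation, the NumPy sketch below builds a firing-aligned pulse train and passes it through a single Karplus-Strong-style feedback comb. The four-stroke firing-rate relation is standard. The Hann pulse shape, pulse width, delay length, and feedback gain are placeholder assumptions, and the function names are hypothetical. The sketch is also not differentiable and leaves out the learned parameters and the physics-informed priors listed above.

```python
import numpy as np


def pulse_train(rpm, n_cylinders, duration_s, sr=16000, pulse_width_s=0.002):
    """Illustrative firing-aligned pulse train for a four-stroke engine."""
    # Four-stroke: each cylinder fires once every two crankshaft revolutions.
    firing_hz = rpm / 60.0 * n_cylinders / 2.0
    n = int(round(duration_s * sr))
    period = sr / firing_hz                      # samples between firings
    width = max(1, int(round(pulse_width_s * sr)))
    # Assumed pulse shape: a short Hann bump (the paper parameterizes this).
    shape = np.hanning(width + 2)[1:-1]
    x = np.zeros(n)
    for start in np.arange(0.0, n - width, period).astype(int):
        x[start:start + width] += shape
    return x, firing_hz


def ks_resonator(x, delay, feedback=0.9):
    """Recursive comb with a two-point averaging low-pass in the feedback loop.

    y[i] = x[i] + feedback * 0.5 * (y[i - delay] + y[i - delay - 1])
    Stable for |feedback| < 1 because the averager never exceeds unit gain.
    """
    y = np.zeros_like(x)
    for i in range(len(x)):
        a = y[i - delay] if i >= delay else 0.0
        b = y[i - delay - 1] if i >= delay + 1 else 0.0
        y[i] = x[i] + feedback * 0.5 * (a + b)
    return y


# Hypothetical operating point: 4-cylinder engine at 3000 rpm, 0.5 s of audio.
x, firing_hz = pulse_train(rpm=3000, n_cylinders=4, duration_s=0.5)
# Assumed exhaust resonance: an 80-sample loop (about 200 Hz at 16 kHz).
y = ks_resonator(x, delay=80, feedback=0.9)
```

The excitation carries the firing pattern, while the resonator only colors it, so changing `rpm` shifts the harmonic series and changing `delay` shifts the resonance independently. That separation is the reason a pulse-plus-resonator structure can yield physically interpretable parameters.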