🤖 AI Summary
This work addresses the challenge of video multicast under ultra-low data rates (2.3 kbps) in underwater acoustic channels, where conventional communication systems suffer from high bit error rates (20–46%) that render forward error correction schemes like LDPC ineffective or even detrimental. The paper proposes the first end-to-end semantic-aware physical-layer design, embedding video semantics into a trainable waveform codebook and integrating a fully differentiable OFDM framework with VideoGPT tokenization. This approach prioritizes the recovery of semantically similar content when decoding errors are unavoidable, thereby overcoming the fundamental limitation of traditional modulation schemes that cannot transmit video at such low rates. Evaluated on the NOF1 channel, the method achieves a 5 dB PSNR gain (+19.26%) and a 0.10 SSIM improvement (+14.28%) over the strongest FEC baseline, enabling real-time transmission of 128×128@16 FPS video, with performance gains further amplified under harsher channel conditions.
📝 Abstract
We present E2E-WAVE, the first end-to-end learned waveform generation system for underwater video multicasting. Acoustic channels exhibit 20–46% bit error rates, a regime in which forward error correction becomes counterproductive: beyond its decoding threshold, LDPC increases rather than decreases errors. E2E-WAVE addresses this by embedding semantic similarity directly into physical-layer waveforms, so that when decoding errors are unavoidable, the system preferentially selects semantically similar tokens instead of producing arbitrary corruption. Combining VideoGPT tokenization (1024× compression) with a trainable waveform bank and fully differentiable OFDM transmission, E2E-WAVE achieves +5 dB (19.26%) PSNR and +0.10 (14.28%) SSIM over the strongest FEC-protected baseline on the less challenging underwater channel (NOF1), while delivering real-time 16 FPS video at 128×128 resolution over 2.3 kbps channels, which is impossible for conventional digital modulation. The performance gap widens on harsher channels (BCH1, NCS1). Trained on a single channel, E2E-WAVE generalizes to unseen underwater environments without retraining, whereas HEVC fails at sub-5 kbps rates and SoftCast's AWGN assumptions collapse on frequency-selective channels.
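The central idea, that decoding errors should land on semantically similar tokens, can be illustrated with a toy sketch. This is not the paper's trained waveform bank or OFDM pipeline: the two-sample "waveforms" are hand-built on a half circle, the channel is plain additive Gaussian noise, semantic distance is assumed to be the difference between token indices, and all function names are hypothetical. The sketch compares a bank whose geometry follows token order against the same bank with its token assignment shuffled.

```python
import numpy as np

def make_arc_bank(num_tokens):
    """Place token waveforms on a half circle so that neighbouring token
    indices get neighbouring waveforms (hand-built here, not learned)."""
    angles = np.pi * np.arange(num_tokens) / num_tokens
    return np.stack([np.cos(angles), np.sin(angles)], axis=1)

def decode(received, bank):
    """Minimum-distance decoding: index of the closest waveform in the bank."""
    dists = np.linalg.norm(bank[None, :, :] - received[:, None, :], axis=2)
    return np.argmin(dists, axis=1)

def mean_semantic_error(bank, noise_std, trials, rng):
    """Average |sent - decoded| token index over a toy Gaussian-noise channel.
    Index distance stands in for semantic distance (an assumption)."""
    sent = rng.integers(0, len(bank), size=trials)
    noise = rng.normal(0.0, noise_std, size=(trials, bank.shape[1]))
    decoded = decode(bank[sent] + noise, bank)
    return float(np.mean(np.abs(sent - decoded)))

rng = np.random.default_rng(0)
num_tokens = 16

# Structured bank: waveform geometry mirrors token (semantic) order.
structured = make_arc_bank(num_tokens)
# Shuffled bank: same waveforms, but assigned to tokens at random.
shuffled = structured[rng.permutation(num_tokens)]

err_structured = mean_semantic_error(structured, 0.2, 5000, rng)
err_shuffled = mean_semantic_error(shuffled, 0.2, 5000, rng)
print(f"structured bank: {err_structured:.2f}, shuffled bank: {err_shuffled:.2f}")
```

Both banks misdecode equally often because they contain the same waveforms. They differ only in which token a misdecoded signal maps to. With the structured bank, a confused waveform belongs to a nearby token, which is the property E2E-WAVE is described as learning end to end.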