Residual Modeling for High-Fidelity Learned Compression of Scientific Data

📅 2026-06-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the inefficiency of existing learned compressors for high-fidelity scientific data (block-level NRMSE targets of 10⁻⁶–10⁻⁴), where the per-block residual correction stream of Guaranteed Autoencoder (GAE) methods dominates the total bitrate. It proposes a residual-centric framework that treats the learned residual as a distinct signal type and introduces two specialized residual coders: LBRC, a deterministic, training-free pipeline, and NGLR, which adds a causal neural predictor to the same integer pipeline. LBRC adaptively quantizes the residual to the target NRMSE and losslessly codes the resulting integers using 3D Lorenzo differencing, zigzag mapping, bit-plane coding, and entropy coding. On the E3SM, JHTDB, and ERA5 datasets, LBRC improves compression ratio over GAE by 30–60% and is broadly competitive with SZ; NGLR adds a further 10–40% over LBRC and outperforms SZ in the evaluated high-fidelity regime.
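To make the integer stage of the pipeline concrete, the following is a minimal sketch of 3D Lorenzo differencing followed by zigzag mapping on an already-quantized integer residual block. It is not the authors' implementation: the function names are hypothetical, zero padding at the block boundary is an assumption, and bit-plane and entropy coding are omitted. The sketch relies on the fact that the 3D Lorenzo prediction error equals a first-order difference applied along each of the three axes in turn, so the inverse is a cumulative sum along each axis.

```python
import numpy as np

def lorenzo3d_forward(q):
    """Integer 3D Lorenzo prediction error (assumes zeros outside the block)."""
    d = np.asarray(q).astype(np.int64)
    for axis in range(3):
        # First-order difference along one axis; composing the three axes
        # gives the 3D Lorenzo residual.
        d = np.diff(d, axis=axis, prepend=0)
    return d

def lorenzo3d_inverse(d):
    """Exact inverse of lorenzo3d_forward via cumulative sums."""
    q = np.asarray(d).astype(np.int64)
    for axis in range(3):
        q = np.cumsum(q, axis=axis)
    return q

def zigzag_encode(d):
    """Map signed integers to unsigned: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    d = np.asarray(d).astype(np.int64)
    return ((d << 1) ^ (d >> 63)).astype(np.uint64)

def zigzag_decode(u):
    """Inverse of zigzag_encode."""
    u = np.asarray(u).astype(np.uint64)
    half = (u >> np.uint64(1)).astype(np.int64)
    sign = -((u & np.uint64(1)).astype(np.int64))  # 0 or -1 (all ones)
    return half ^ sign

# Round trip on a small hypothetical integer block.
rng = np.random.default_rng(0)
block = rng.integers(-50, 50, size=(4, 5, 6))
codes = zigzag_encode(lorenzo3d_forward(block))
restored = lorenzo3d_inverse(zigzag_decode(codes))
```

Because every step operates on integers and is exactly invertible, the decoder recovers the quantized residual bit for bit, which is what allows the remaining stages to be lossless.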

📝 Abstract

Lossy compression is essential for massive spatiotemporal data from scientific simulations. Learned compressors can achieve high compression ratios at moderate accuracy targets, but their aggregate reconstruction losses do not guarantee accuracy for each block. Existing Guaranteed Autoencoder (GAE) methods add a per-block residual correction by retaining SVD/PCA-style coefficients until the target is met. This works at moderate tolerances, but in the high-fidelity regime with block-level NRMSE from 10^-6 to 10^-4, the number of retained coefficients grows quickly and the correction stream dominates the total rate. We propose a residual-centric view: the learned residual is structurally different from the original scientific field and should be coded with a representation designed for that residual. We introduce two residual coders. LBRC is a deterministic, training-free pipeline that adaptively quantizes the learned residual to the target NRMSE and losslessly encodes the resulting integer residual using 3D Lorenzo differencing, zigzag mapping, bit-plane coding, and entropy coding. NGLR adds a causal neural predictor that outputs a normalized bias for an integer-rounded Lorenzo prediction in the same deterministic integer pipeline, reducing the entropy of the remaining residual code while preserving deterministic decoding. The predictor weights are serialized and counted in the bitstream. Across E3SM, JHTDB, and ERA5 at block-level NRMSE targets from 10^-6 to 10^-4, LBRC improves compression ratio over GAE by 30-60% and is broadly competitive with SZ. NGLR adds a further 10-40% over LBRC and outperforms SZ in the evaluated high-fidelity regime. These results show that residual representations tailored to learned-compressor residuals can preserve the advantage of learned compression when global residual correction becomes rate-dominant.
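The step that quantizes the learned residual to a block-level NRMSE target can be illustrated with a short sketch. The abstract does not define the NRMSE normalization or the adaptation rule, so two assumptions are made here: NRMSE is RMSE divided by the value range of the original block, and the step size is found by shrinking a uniform quantization step until the measured error meets the target. The function name and the `shrink` parameter are hypothetical.

```python
import numpy as np

def quantize_residual(residual, value_range, target_nrmse, shrink=0.9):
    """Uniformly quantize a residual block so that RMSE / value_range <= target.

    Assumption: NRMSE is normalized by the value range of the original block.
    Returns the integer codes and the step size used.
    """
    tol = target_nrmse * value_range  # absolute RMSE budget
    if tol <= 0:
        raise ValueError("target_nrmse and value_range must be positive")
    # Heuristic start: a uniform rounding error has RMSE of about step / sqrt(12).
    step = np.sqrt(12.0) * tol
    while True:
        q = np.rint(residual / step).astype(np.int64)
        rmse = np.sqrt(np.mean((residual - q * step) ** 2))
        if rmse <= tol:
            return q, step
        # The loop terminates: once step <= 2 * tol, every rounding error is at
        # most step / 2, so the RMSE is within budget.
        step *= shrink

# Hypothetical residual block with a unit value range and a 1e-4 target.
rng = np.random.default_rng(1)
res = rng.normal(size=(8, 8, 8)) * 1e-3
q, step = quantize_residual(res, value_range=1.0, target_nrmse=1e-4)
final_rmse = np.sqrt(np.mean((res - q * step) ** 2))
```

Since the original block equals the learned reconstruction plus the residual, the error of the corrected reconstruction is exactly the residual quantization error measured here. The integer array `q` is what a lossless integer coder would then consume.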

Problem

Research questions and friction points this paper is trying to address.

learned compression

high-fidelity

residual modeling

scientific data compression

block-level accuracy

Innovation

Methods, ideas, or system contributions that make the work stand out.

residual modeling

learned compression

high-fidelity