Norm-Q: Effective Compression Method for Hidden Markov Models in Neuro-Symbolic Applications

📅 2025-09-29
🤖 AI Summary
To address the high computational and memory overhead of deploying Hidden Markov Models (HMMs) in neuro-symbolic systems, this paper proposes a hardware-friendly low-bit quantization framework. Methodologically, it introduces: (i) a normalized linear quantization scheme tailored to HMM parameter distributions; (ii) a quantization-aware Expectation-Maximization (Q-EM) training algorithm that jointly optimizes discretization error and model likelihood; and (iii) end-to-end lossless or lossy compression support across 3–8 bits. Evaluated on a 4096-state HMM for constrained text generation in large language models, the framework achieves up to 99% weight compression: full lossless recovery at 8 bits, and task-sufficient accuracy even at 3 bits. This significantly reduces on-chip memory footprint and bandwidth requirements, establishing a new paradigm for efficient HMM deployment on resource-constrained edge devices.

📝 Abstract
Hidden Markov models (HMMs) are commonly used in generation tasks and have demonstrated strong capabilities in neuro-symbolic applications due to the Markov property. These applications leverage the strengths of neural networks and symbolic reasoning to create robust and interpretable AI systems; however, they may also inherit and amplify the shortcomings of both approaches. Both components require dense computation and data transfer, and the communication between them further hinders performance. This paper proposes Norm-Q, a normalized linear quantization approach for compressing probabilistic symbolic models such as HMMs. We reduce the bit width of the data with minimal impact on accuracy, thereby alleviating memory and bandwidth pressure and enabling deployment on potential custom hardware. Our method introduces a normalized quantization-aware expectation-maximization process for probabilistic model training. Experimental results show that Norm-Q achieves a higher compression rate with reasonable score loss compared to traditional quantization methods. For the constrained generation task of large language models, we successfully quantize an HMM with 4096 hidden states to 8 bits without loss, and to as few as 3 bits with acceptable loss. Notably, Norm-Q achieves a compression rate of 99% for the weights of the HMM. The code is open source at https://github.com/superstarghy/Norm-Q.
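The abstract names the quantizer but does not spell it out. A minimal sketch of what "normalized linear quantization" of a stochastic row could look like: quantize each probability linearly to a low-bit grid, then renormalize so the row still sums to 1. The per-row max scale and the renormalization step here are illustrative assumptions, not the authors' exact scheme.

```python
import numpy as np

def norm_q(probs: np.ndarray, bits: int) -> np.ndarray:
    """Sketch: linearly quantize a probability vector to `bits` bits,
    then renormalize so it remains a valid distribution."""
    levels = (1 << bits) - 1          # number of nonzero code points
    scale = probs.max()               # per-row scale (an assumption)
    codes = np.round(probs / scale * levels)   # integer codes in [0, levels]
    dequant = codes / levels * scale           # back to probability scale
    return dequant / dequant.sum()             # restore sum-to-one property

row = np.array([0.7, 0.2, 0.07, 0.03])
q3 = norm_q(row, 3)        # 3-bit version of the row, still sums to 1
```

At 3 bits the small tail probabilities collapse onto coarse grid points (the smallest entry here rounds to zero), which is the kind of discretization error the paper's quantization-aware training is meant to compensate for.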
Problem

Research questions and friction points this paper is trying to address.

Compresses Hidden Markov Models to reduce computational demands
Addresses memory and bandwidth limitations in neuro-symbolic systems
Enables efficient deployment on custom hardware via quantization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Normalized linear quantization for compressing probabilistic symbolic models
Quantization-aware expectation maximization process for model training
Achieves high compression rates with minimal performance loss
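One way to picture the quantization-aware EM idea is as projecting the re-estimated parameters onto the low-bit grid at each iteration. The toy below uses a fully observed 3-state Markov chain, so the E-step reduces to transition counting; the real Q-EM works over latent states and jointly optimizes likelihood and discretization error, so treat this purely as an illustrative sketch with assumed details.

```python
import numpy as np

def quantize_row(p: np.ndarray, bits: int) -> np.ndarray:
    # Same idea as the Norm-Q sketch: linear quantize, then renormalize.
    levels = (1 << bits) - 1
    codes = np.round(p / p.max() * levels)
    return codes / codes.sum()

rng = np.random.default_rng(0)
states = rng.integers(0, 3, size=5000)   # toy fully observed state sequence

# "E-step" (here just counting observed transitions)
counts = np.zeros((3, 3))
for a, b in zip(states[:-1], states[1:]):
    counts[a, b] += 1

# Ordinary M-step, followed by projection onto the 4-bit grid
A_hat = counts / counts.sum(axis=1, keepdims=True)
A_q = np.apply_along_axis(quantize_row, 1, A_hat, 4)
```

Each row of `A_q` is still a valid distribution after projection, so the quantized model can be used directly by standard HMM inference code.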
Hanyuan Gao
Department of Electrical and Computer Engineering, University of Virginia
Xiaoxuan Yang
University of Virginia
In-Memory Computing · Computer-Aided Design · Machine Learning Acceleration