Resource-Efficient and Robust Inference of Deep and Bayesian Neural Networks on Embedded and Analog Computing Platforms

📅 2025-10-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the low inference efficiency, poor robustness, and high computational overhead of Bayesian uncertainty quantification in deep neural networks on resource-constrained platforms—such as embedded systems and analog hardware—this paper proposes an algorithm–hardware co-optimization framework. First, we introduce Galen, an automated layer-wise compression method integrating sensitivity analysis with hardware-aware feedback. Second, we pioneer the use of controllable analog noise as an intrinsic entropy source to enable fast, low-overhead probabilistic inference on photonic hardware. Third, we unify noise-aware training, analytical approximate Bayesian inference, and hybrid digital/analog deployment strategies. Experiments demonstrate substantial reductions in computational energy consumption while simultaneously improving prediction robustness under non-stationary conditions and enhancing uncertainty quantification accuracy. To our knowledge, this work achieves the first co-optimization of high energy efficiency and strong robustness on photonic hardware.
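The second contribution above, using controllable analog noise as an intrinsic entropy source, can be illustrated with a toy sketch. All names, dimensions, and noise levels below are illustrative assumptions, not the paper's implementation: each forward pass perturbs the layer output with zero-mean Gaussian noise standing in for device noise, and repeating the pass yields a predictive mean and variance with no explicit sampler.

```python
import random

def noisy_linear(x, w, b, sigma=0.05, rng=random):
    """One 'analog' linear layer: dot products plus zero-mean Gaussian
    read-out noise standing in for photonic device noise (illustrative)."""
    return [
        sum(wi * xi for wi, xi in zip(row, x)) + bi + rng.gauss(0.0, sigma)
        for row, bi in zip(w, b)
    ]

def mc_predict(x, w, b, n_samples=500, sigma=0.1, seed=0):
    """Monte-Carlo probabilistic inference: the hardware noise itself
    supplies the randomness, so uncertainty estimates come from simply
    repeating the (cheap) analog forward pass."""
    rng = random.Random(seed)
    outs = [noisy_linear(x, w, b, sigma, rng) for _ in range(n_samples)]
    mean = [sum(col) / n_samples for col in zip(*outs)]
    var = [
        sum((v - m) ** 2 for v in col) / n_samples
        for col, m in zip(zip(*outs), mean)
    ]
    return mean, var

w = [[0.5, -0.2], [0.1, 0.4]]
b = [0.0, 0.1]
mean, var = mc_predict([1.0, 2.0], w, b)
```

The recovered variance tracks the injected noise power (here roughly sigma squared), which is the sense in which the analog noise acts as a usable entropy source rather than pure nuisance.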

📝 Abstract
While modern machine learning has transformed numerous application domains, its growing computational demands increasingly constrain scalability and efficiency, particularly on embedded and resource-limited platforms. In practice, neural networks must not only operate efficiently but also provide reliable predictions under distributional shifts or unseen data. Bayesian neural networks offer a principled framework for quantifying uncertainty, yet their computational overhead further compounds these challenges. This work advances resource-efficient and robust inference for both conventional and Bayesian neural networks through the joint pursuit of algorithmic and hardware efficiency. The former reduces computation through model compression and approximate Bayesian inference, while the latter optimizes deployment on digital accelerators and explores analog hardware, bridging algorithmic design and physical realization. The first contribution, Galen, performs automatic layer-specific compression guided by sensitivity analysis and hardware-in-the-loop feedback. Analog accelerators offer efficiency gains at the cost of noise; this work models device imperfections and extends noisy training to nonstationary conditions, improving robustness and stability. A second line of work advances probabilistic inference, developing analytic and ensemble approximations that replace costly sampling, integrate into a compiler stack, and optimize embedded inference. Finally, probabilistic photonic computing introduces a paradigm where controlled analog noise acts as an intrinsic entropy source, enabling fast, energy-efficient probabilistic inference directly in hardware. Together, these studies demonstrate how efficiency and reliability can be advanced jointly through algorithm-hardware co-design, laying the foundation for the next generation of trustworthy, energy-efficient machine-learning systems.
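The noisy-training idea mentioned in the abstract (modeling device imperfections and training through them) can be sketched in miniature. Everything here is a hypothetical toy, not the paper's setup: a one-parameter regression is fit with SGD while zero-mean noise is injected into the weight at every forward pass, so the learned parameter remains accurate when deployed on imperfect analog hardware.

```python
import random

def train_noise_aware(xs, ys, sigma=0.1, lr=0.01, epochs=200, seed=0):
    """Fit y ~ w*x with SGD while injecting zero-mean Gaussian noise
    into the weight at each forward pass, mimicking an imperfect analog
    device. The update is applied to the clean weight (straight-through),
    so training averages out the device noise."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            w_noisy = w + rng.gauss(0.0, sigma)  # simulated device imperfection
            err = w_noisy * x - y                # noisy forward pass
            w -= lr * err * x                    # update the clean weight
    return w

xs = [0.5, 1.0, 1.5, 2.0]
ys = [1.0, 2.0, 3.0, 4.0]  # true slope is 2
w = train_noise_aware(xs, ys)
```

Because the noise is zero-mean, the stochastic updates still contract toward the true slope; the abstract's contribution extends this kind of training to nonstationary noise, where the device statistics drift over time.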
Problem

Research questions and friction points this paper is trying to address.

Enabling efficient neural network inference on embedded platforms
Providing robust predictions under distributional shifts and uncertainty
Reducing computational overhead of Bayesian neural networks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Layer-specific compression guided by sensitivity analysis
Modeling device imperfections for robust analog training
Probabilistic photonic computing using analog noise entropy
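The first innovation above, compression guided by per-layer sensitivity analysis, can be sketched as follows. This is an illustrative stand-in under simplifying assumptions (a toy "network" of scalar layers, a single probe input, and uniform quantization); Galen's actual search is additionally guided by hardware-in-the-loop feedback, which is omitted here.

```python
def quantize(ws, bits):
    """Uniform symmetric quantization of a weight list to `bits` bits."""
    scale = max(abs(w) for w in ws) or 1.0
    levels = 2 ** (bits - 1) - 1
    return [round(w / scale * levels) / levels * scale for w in ws]

def forward(layers, x):
    """Toy 'network': each layer is a weight list applied as a weighted
    sum of a scalar activation (stand-in for real inference)."""
    for ws in layers:
        x = sum(w * x for w in ws)
    return x

def sensitivity_schedule(layers, x, tol=0.02, bit_choices=(8, 6, 4, 2)):
    """Per-layer sensitivity analysis: quantize one layer at a time and
    pick the smallest bit-width whose output deviation stays within
    `tol` relative error. Insensitive layers get aggressive compression,
    sensitive layers keep higher precision."""
    ref = forward(layers, x)
    schedule = []
    for i, ws in enumerate(layers):
        chosen = bit_choices[0]
        for bits in bit_choices:  # descending bit-widths
            trial = layers[:i] + [quantize(ws, bits)] + layers[i + 1:]
            if abs(forward(trial, x) - ref) <= tol * abs(ref):
                chosen = bits  # lower precision still acceptable
        schedule.append(chosen)
    return schedule

# A layer with unequal weights is noticeably more quantization-sensitive
# than one whose weights quantize exactly:
schedule = sensitivity_schedule([[0.6, 0.4], [1.0]], 1.0)
```

The resulting per-layer bit-width schedule is exactly the kind of layer-specific compression decision the sensitivity analysis is meant to automate.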