Quantifying the Energy Floor: Direct Measurement and Replay Buffer Bias in SAC-Based HVAC Control on sbsim

📅 2026-06-01

🤖 AI Summary
This study quantifies the energy floor, the minimum achievable cost under action-space constraints, for Soft Actor-Critic (SAC) control of building HVAC, and diagnoses why learned policies fall short of it. Using the calibrated sbsim building simulator, the authors measure the floor directly through a minimal-action experiment and obtain USD 35.51 per day. Ablation studies show that training SAC from an empty replay buffer, rather than one pre-filled with schedule-policy transitions, lowers operating cost from USD 37.18 to USD 35.57 per day. This closes 96% of the gap to the floor and identifies buffer pre-filling as the dominant source of suboptimality. A coupling in the discount factor is also shown to shrink the effective planning horizon from 8.3 h to 46 min, while widening the supply water temperature range by 10 K saves only a further USD 0.03 per day.
📝 Abstract
We quantify the energy floor, the minimum achievable cost given action space constraints, for Soft Actor-Critic (SAC) HVAC control on the sbsim calibrated building simulator. Through minimum-action experiments, we directly measure this floor at USD 35.51/day, dominated by continuous electrical loads (USD 35.44, 99.8%) with negligible gas consumption. The standard SAC baseline, initialized with schedule-policy replay buffer transitions, converges to USD 37.18/day, 4.7% above the floor. We identify buffer initialization as the dominant source of sub-optimality in this scenario: training from an empty buffer reduces cost to USD 35.57/day, eliminating 96% of the gap. Expanding the supply water temperature range by 10 K yields negligible additional savings (USD 0.03/day), and further expansion triggers physical constraint violations. We additionally uncover a discount factor coupling (gamma_eff = 0.891) shrinking the effective planning horizon from 8.3 h to 46 min, a benchmark-wide issue warranting audit. Systematic ablation across planning horizon, reward weights, and observation enrichment confirms all pre-filled-buffer configurations cluster within 0.7% (USD 37.18 to USD 37.42), demonstrating that equipment minimum power, not algorithmic design, imposes the binding constraint.
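The headline percentages and horizon figures in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' code. The cost figures are taken from the abstract. The 5-minute control step and the nominal discount factor of 0.99 are assumptions: the abstract does not state them, and they are chosen here because, under the common rule of thumb that the effective horizon is roughly 1 / (1 - gamma) steps, they reproduce the quoted 8.3 h and 46 min. All function and constant names are hypothetical.

```python
# Costs quoted in the abstract, in USD per day.
FLOOR = 35.51         # measured energy floor (minimum-action experiment)
BASELINE = 37.18      # SAC with a schedule-policy pre-filled replay buffer
EMPTY_BUFFER = 35.57  # SAC trained from an empty replay buffer

# ASSUMPTIONS (not stated in the abstract): picked because they are
# consistent with the quoted 8.3 h and 46 min horizons.
STEP_MINUTES = 5.0    # assumed control time step
GAMMA_NOMINAL = 0.99  # assumed configured discount factor
GAMMA_EFF = 0.891     # effective discount factor quoted in the abstract


def pct_above_floor(cost, floor=FLOOR):
    """Percentage by which a daily cost exceeds the energy floor."""
    return 100.0 * (cost - floor) / floor


def gap_closed(cost, baseline=BASELINE, floor=FLOOR):
    """Percentage of the baseline-to-floor gap removed by a configuration."""
    return 100.0 * (baseline - cost) / (baseline - floor)


def effective_horizon_minutes(gamma, step_minutes=STEP_MINUTES):
    """Rule-of-thumb effective horizon: 1 / (1 - gamma) steps, in minutes."""
    return step_minutes / (1.0 - gamma)


if __name__ == "__main__":
    print(f"baseline above floor: {pct_above_floor(BASELINE):.1f}%")   # 4.7%
    print(f"gap closed by empty buffer: {gap_closed(EMPTY_BUFFER):.1f}%")  # 96.4%
    nominal_h = effective_horizon_minutes(GAMMA_NOMINAL) / 60.0
    print(f"nominal horizon: {nominal_h:.1f} h")                       # 8.3 h
    print(f"effective horizon: {effective_horizon_minutes(GAMMA_EFF):.0f} min")  # 46 min
```

Under these assumptions the numbers line up with the abstract: the baseline sits about 4.7% above the floor, the empty-buffer run removes about 96% of that gap, and lowering the discount factor from 0.99 to 0.891 cuts the rule-of-thumb horizon from roughly 100 steps to roughly 9.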
Problem

Research questions and friction points this paper is trying to address.

energy floor
SAC
HVAC control
replay buffer bias
minimum achievable cost
Innovation

Methods, ideas, or system contributions that make the work stand out.

energy floor
replay buffer bias
SAC
HVAC control
planning horizon