AI Summary
In dynamic cloud-edge environments, conventional network monitoring approaches struggle to achieve low latency, scalability, and semantic understanding at the same time. This work proposes a Latent Predictive State Estimator (LPSE) that uses topology-adaptive temporal encoding to map telemetry from a variable number of nodes into permutation-invariant, slot-based representations, enabling fixed-overhead, single-pass inference through a semantic codebook. LPSE generalizes to node addition, removal, and reordering without retraining, which substantially improves its adaptability to changing topologies. Experiments on a multi-node Kubernetes cluster show that LPSE attains a semantic prediction accuracy of 82.42% while reducing mean inference latency by approximately 41× and memory footprint by 15× compared with a deployable 4B-parameter large language model endpoint.
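The slot-based mapping summarized above can be illustrated with a minimal Python sketch. It assumes a hypothetical `SlotRouter` that keeps a fixed number of slots and a persistent table from stable node identifiers to slot indices. The summary does not say how LPSE assigns slots, so the release-and-reuse policy, the zero padding, and the mask-based mean pooling (`masked_mean`) are illustrative assumptions, not the authors' implementation. What the sketch shows is that the output shape stays fixed while nodes join, leave, or arrive in a different order.

```python
from typing import Dict, List, Sequence, Tuple


class SlotRouter:
    """Hypothetical fixed-size slot table keyed by stable node identity."""

    def __init__(self, num_slots: int, feature_dim: int) -> None:
        self.num_slots = num_slots
        self.feature_dim = feature_dim
        self.id_to_slot: Dict[str, int] = {}
        self.free: List[int] = list(range(num_slots))

    def route(
        self, telemetry: Dict[str, Sequence[float]]
    ) -> Tuple[List[List[float]], List[bool]]:
        # Node removal: release the slots of nodes absent from this snapshot.
        for node_id in list(self.id_to_slot):
            if node_id not in telemetry:
                self.free.append(self.id_to_slot.pop(node_id))
        self.free.sort()
        # Node addition: give each new node the lowest free slot, visiting
        # new IDs in sorted order so arrival order does not matter.
        for node_id in sorted(telemetry):
            if node_id not in self.id_to_slot:
                if not self.free:
                    raise ValueError("more active nodes than slots")
                self.id_to_slot[node_id] = self.free.pop(0)
        # Fixed-shape output: zero-padded slots plus a validity mask.
        slots = [[0.0] * self.feature_dim for _ in range(self.num_slots)]
        mask = [False] * self.num_slots
        for node_id, slot in self.id_to_slot.items():
            features = list(telemetry[node_id])
            if len(features) != self.feature_dim:
                raise ValueError(f"node {node_id!r} has wrong feature length")
            slots[slot] = features
            mask[slot] = True
        return slots, mask


def masked_mean(slots: List[List[float]], mask: List[bool]) -> List[float]:
    """Permutation-invariant pooling over the active slots only."""
    active = [s for s, m in zip(slots, mask) if m]
    if not active:
        return [0.0] * len(slots[0])
    return [sum(col) / len(active) for col in zip(*active)]
```

For example, routing `{"node-b": ..., "node-a": ...}` and then the same nodes in the opposite order yields identical slot layouts, and when a node disappears its slot is freed and reused by the next newcomer while surviving nodes keep their slots. Keying by identity rather than by position is what keeps downstream weights aligned with the same node across snapshots.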
Abstract
Closed-loop network monitoring and orchestration increasingly require semantic interpretations of live telemetry beyond raw counter collection. However, dynamic cloud-edge environments change both the active node set and the monitoring query at runtime, while control loops demand bounded millisecond-scale responses. We introduce a latent predictive state estimator (LPSE) for dynamic network monitoring and orchestration, built on latent predictive learning over streaming telemetry. The framework converts variable-cardinality node telemetry into topology-adaptive temporal representations, fuses them with monitoring questions, and returns bounded answers from a semantic codebook instead of autoregressive text generation. This design enables fixed-cost, single-pass inference while preserving semantic interpretability. By operating on permutation-invariant, slot-routed node representations keyed by stable identity, the model maintains a fixed input space and generalizes to node addition, removal, and reordering without retraining. Experimental results on a multi-node Kubernetes cluster show a semantic prediction accuracy of 82.42%, with approximately 41× lower mean inference latency and a 15× smaller memory footprint than a deployable 4B-parameter LLM endpoint.
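To illustrate why answering from a semantic codebook gives fixed-cost, single-pass inference, the following minimal sketch fuses a state vector with a question embedding, scores the result against every codebook entry, and returns the best-matching bounded answer. The fusion rule (elementwise sum), the cosine scoring, and the answer labels are hypothetical stand-ins; the abstract does not describe LPSE's actual fusion mechanism or codebook contents.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical codebook: each bounded answer is a label plus an embedding.
Codebook = List[Tuple[str, List[float]]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def fuse(state: Sequence[float], question: Sequence[float]) -> List[float]:
    # Assumed fusion rule for illustration only: elementwise sum.
    if len(state) != len(question):
        raise ValueError("state and question embeddings must match in size")
    return [s + q for s, q in zip(state, question)]


def answer(
    state: Sequence[float], question: Sequence[float], codebook: Codebook
) -> Tuple[str, float]:
    # Single pass: one fusion, then exactly one score per codebook entry.
    fused = fuse(state, question)
    scores = [cosine(fused, emb) for _, emb in codebook]
    best = max(range(len(codebook)), key=scores.__getitem__)
    return codebook[best][0], scores[best]
```

With a three-entry codebook such as `healthy`, `cpu_saturation`, and `link_congestion`, the cost of `answer` is one fusion plus three similarity computations no matter which answer wins, whereas autoregressive generation would pay one decoder step per output token. That bounded, input-independent cost is the property the abstract attributes to codebook answering.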