ECHO: Frequency-aware Hierarchical Encoding for Variable-length Signal

📅 2025-08-20
🤖 AI Summary
Existing subband encoders suffer from fixed input-length constraints and lack of frequency-position awareness, limiting their ability to model variable-length industrial signals (e.g., acoustic or vibration data). To address this, we propose the first time-frequency foundation model supporting arbitrary-length inputs: (1) a band-splitting architecture enables adaptive subband decomposition; (2) relative frequency position encoding explicitly captures spectral structure; and (3) a hierarchical band encoding and feature aggregation mechanism eliminates the need for segmentation or padding. The model is pre-trained at scale via contrastive learning on the SIREN benchmark. It achieves state-of-the-art performance on industrial anomaly detection and fault classification tasks, with significant improvements in cross-sampling-rate generalization and cross-domain robustness.
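The band-splitting idea in point (1) can be pictured with a minimal sketch. It is not the ECHO implementation: the function name `split_bands`, the fixed `band_width`, and the choice to normalize each band center by the highest bin index are assumptions made for illustration only.

```python
def split_bands(num_bins, band_width):
    """Split frequency-bin indices [0, num_bins) into consecutive sub-bands.

    Returns a list of (start, stop, rel_center) tuples, where rel_center is
    the band's center bin divided by the highest bin index, so it lies in
    [0, 1] regardless of how many bins the spectrogram has.
    Hypothetical helper; the real model may split bands differently.
    """
    if num_bins < 2 or band_width < 1:
        raise ValueError("need num_bins >= 2 and band_width >= 1")
    bands = []
    for start in range(0, num_bins, band_width):
        # The last band may be narrower than band_width; nothing is padded.
        stop = min(start + band_width, num_bins)
        center = (start + stop - 1) / 2
        bands.append((start, stop, center / (num_bins - 1)))
    return bands


# Two spectrograms with different numbers of frequency bins still yield
# relative centers on the same [0, 1] scale.
for bins in (10, 257):
    print(bins, [round(rel, 3) for _, _, rel in split_bands(bins, 4)][:3])
```

Because the position is expressed relative to the top of the spectrum rather than as an absolute bin index, the same encoder can in principle consume spectrograms produced under different sampling configurations.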

📝 Abstract
Pre-trained foundation models have demonstrated remarkable success in vision and language, yet their potential for general machine signal modeling (covering acoustic, vibration, and other industrial sensor data) remains under-explored. Existing approaches based on sub-band encoders have achieved competitive results but are limited by fixed input lengths and the absence of explicit frequency positional encoding. In this work, we propose a novel foundation model that integrates an advanced band-split architecture with relative frequency positional embeddings, enabling precise spectral localization across arbitrary sampling configurations. The model supports inputs of arbitrary length without padding or segmentation, producing a concise embedding that retains both temporal and spectral fidelity. We evaluate our method on SIREN (https://github.com/yucongzh/SIREN), a newly introduced large-scale benchmark for machine signal encoding that unifies multiple datasets, including all DCASE Task 2 challenges (2020-2025) and widely used industrial signal corpora. Experimental results demonstrate consistent state-of-the-art performance in anomaly detection and fault identification, confirming the effectiveness and generalization capability of the proposed model. ECHO is open-sourced at https://github.com/yucongzh/ECHO.
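One way to see how an input of any length can map to a fixed-size embedding without padding is the sketch below. It assumes simple mean pooling stands in for the learned encoder, and the names `embed_variable_length`, `window`, and `hop` are hypothetical; the actual ECHO aggregation may differ.

```python
from statistics import fmean


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [fmean(column) for column in zip(*vectors)]


def embed_variable_length(frames, window=4, hop=4):
    """Summarize any number of frame vectors into one fixed-size vector.

    Each window of frames is reduced to a summary (here a plain mean, which
    is a placeholder for a learned encoder), and the window summaries are
    then averaged. The final window may be shorter; it is used as-is.
    """
    if not frames:
        raise ValueError("need at least one frame")
    summaries = []
    for start in range(0, len(frames), hop):
        chunk = frames[start:start + window]  # no padding of the last chunk
        summaries.append(mean_vector(chunk))
    return mean_vector(summaries)


short_clip = [[0.1, 0.2, 0.3]] * 5    # 5 frames, 3 features each
long_clip = [[0.1, 0.2, 0.3]] * 120   # 120 frames, 3 features each
print(len(embed_variable_length(short_clip)), len(embed_variable_length(long_clip)))
```

The output dimension depends only on the per-frame feature size, not on the number of frames, which is the property the abstract describes.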
Problem

Research questions and friction points this paper is trying to address.

Enabling variable-length signal processing without padding or segmentation
Addressing the absence of explicit frequency positional encoding in sub-band encoders
Improving anomaly detection and fault identification in industrial signals
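To make the anomaly detection goal concrete, here is a small sketch of one common way to score anomalies from fixed-size embeddings: distance to the nearest embeddings of known-normal recordings. This k-nearest-neighbor scoring is an assumption for illustration; the source does not state which scoring rule ECHO or SIREN uses, and `knn_anomaly_score` is a hypothetical name.

```python
import math


def knn_anomaly_score(query, reference, k=1):
    """Mean Euclidean distance from `query` to its k nearest reference embeddings.

    `reference` holds embeddings of normal machine recordings; a larger
    score means the query looks less like anything seen as normal.
    """
    if not reference:
        raise ValueError("reference set must not be empty")
    distances = sorted(math.dist(query, ref) for ref in reference)
    k = min(k, len(distances))
    return sum(distances[:k]) / k


normal_embeddings = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(knn_anomaly_score([0.1, 0.1], normal_embeddings))  # close to normal data
print(knn_anomaly_score([3.0, 4.0], normal_embeddings))  # far from normal data
```

A scoring rule like this needs no anomaly labels, only embeddings of normal recordings.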
Innovation

Methods, ideas, or system contributions that make the work stand out.

Band-split architecture with relative frequency embeddings
Arbitrary input lengths without padding or segmentation
Concise embedding retaining temporal and spectral fidelity
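A relative frequency embedding can be sketched as a sinusoidal encoding of a band's normalized position in the spectrum. The function below is illustrative only: the name `relative_freq_embedding`, the geometric frequency schedule, and the `max_scale` parameter are assumptions, not details taken from the paper.

```python
import math


def relative_freq_embedding(rel_pos, dim=8, max_scale=100.0):
    """Sinusoidal embedding of a relative frequency position in [0, 1].

    rel_pos is a band center divided by the Nyquist frequency (or by the
    highest bin index). Sine/cosine pairs use angular rates spaced
    geometrically from 1 to max_scale (hypothetical schedule).
    """
    if dim < 2 or dim % 2:
        raise ValueError("dim must be a positive even number")
    if not 0.0 <= rel_pos <= 1.0:
        raise ValueError("rel_pos must lie in [0, 1]")
    half = dim // 2
    embedding = []
    for i in range(half):
        rate = max_scale ** (i / max(half - 1, 1))
        angle = rel_pos * rate
        embedding.extend([math.sin(angle), math.cos(angle)])
    return embedding


# A band centered at 2 kHz sits at a different relative position under
# 16 kHz and 48 kHz sampling (Nyquist 8 kHz and 24 kHz respectively).
for sample_rate in (16000, 48000):
    rel = 2000 / (sample_rate / 2)
    print(sample_rate, [round(v, 3) for v in relative_freq_embedding(rel, dim=4)])
```

Feeding the encoder a position relative to the Nyquist frequency, rather than a raw band index, is what lets the embedding stay meaningful when the sampling configuration changes.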
Yucong Zhang
Ph.D. Student in CS, Wuhan University
Juan Liu
Wuhan University
Data Mining, Artificial Intelligence in Bioinformatics, Biomedicine
Ming Li
School of Computer Science, Wuhan University, Wuhan, China; Suzhou Municipal Key Laboratory of Multimodal Intelligent Systems, Digital Innovation Research Center, Duke Kunshan University, Kunshan, China