DWTSumm: Discrete Wavelet Transform for Document Summarization

📅 2026-04-22

🤖 AI Summary

This work addresses the challenges of summarizing lengthy domain-specific documents, such as clinical and legal texts, where large language models often suffer from context length limitations, information loss, and hallucination. The study introduces the discrete wavelet transform (DWT) into text summarization for the first time, applying multi-resolution analysis to sentence- or word-level embeddings to capture both global structure and locally salient semantics. This yields compact representations that either directly form summaries or guide large language models (e.g., GPT-4o) in generation. By acting as a semantic denoising step that preserves domain-specific content, the method improves semantic fidelity and grounding relative to a GPT-4o baseline. Experiments on clinical and legal benchmarks show gains of over 2% in BERTScore, more than 4% in semantic fidelity, and notable gains in METEOR, with fidelity reaching up to 97% and improved factual consistency on legal tasks, suggesting reduced hallucination and stronger factual grounding.
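The multi-resolution idea can be illustrated with a minimal sketch. It assumes an orthonormal Haar wavelet applied along the sentence axis of a list of sentence embeddings; the summary above does not specify the wavelet family, decomposition axis, number of levels, or boundary handling, so those choices, the toy 2-d embeddings, and the helper names `haar_step` and `haar_wavedec` are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def haar_step(seq: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
    """One level of the orthonormal Haar DWT along the sequence (sentence) axis.

    Adjacent embeddings are paired; an odd-length sequence is padded by
    repeating its last vector (a padding choice made for this sketch).
    """
    if len(seq) % 2 == 1:
        seq = seq + [seq[-1]]
    s = 1 / math.sqrt(2)
    approx: List[Vector] = []
    detail: List[Vector] = []
    for a, b in zip(seq[0::2], seq[1::2]):
        approx.append([s * (x + y) for x, y in zip(a, b)])  # low-pass: shared (global) content
        detail.append([s * (x - y) for x, y in zip(a, b)])  # high-pass: local change between neighbors
    return approx, detail


def haar_wavedec(seq: List[Vector], levels: int) -> Tuple[List[Vector], List[List[Vector]]]:
    """Multi-level decomposition: repeatedly split the current approximation.

    Returns the coarsest approximation and the detail coefficients,
    finest level first.
    """
    approx = seq
    details: List[List[Vector]] = []
    for _ in range(levels):
        if len(approx) < 2:
            break  # nothing left to pair
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details


# Toy 2-d "sentence embeddings": two topics, then a sentence mixing both.
sentences = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 1.0]]
approx, details = haar_wavedec(sentences, levels=2)
print(len(sentences), "->", len(approx))  # prints: 5 -> 2
```

Each level halves the number of vectors (rounding up), so the coarsest approximation is a compact, global view of the document, while the per-level detail lists record where adjacent sentences diverge. Real code would more likely call an existing library such as PyWavelets (`pywt.wavedec` with `axis=0`) than hand-roll the transform.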

📝 Abstract

Summarizing long, domain-specific documents with large language models (LLMs) remains challenging due to context limitations, information loss, and hallucinations, particularly in clinical and legal settings. We propose a Discrete Wavelet Transform (DWT)-based multi-resolution framework that treats text as a semantic signal and decomposes it into global (approximation) and local (detail) components. Applied to sentence- or word-level embeddings, DWT yields compact representations that preserve overall structure and critical domain-specific details; these representations are used directly as summaries or to guide LLM generation. Experiments on clinical and legal benchmarks show ROUGE-L scores comparable to the baseline. Compared to a GPT-4o baseline, DWT-based summarization consistently improves semantic similarity and grounding, achieving gains of over 2% in BERTScore and more than 4% in Semantic Fidelity, improved factual consistency on legal tasks, and large METEOR improvements indicative of preserved domain-specific semantics. Across multiple embedding models, Fidelity reaches up to 97%, suggesting that DWT acts as a semantic denoising mechanism that reduces hallucinations and strengthens factual grounding. Overall, DWT provides a lightweight, generalizable method for reliable long-document and domain-specific summarization with LLMs.
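The abstract describes DWT as a semantic denoising mechanism but does not say how the coefficients are processed. One standard way a DWT denoises a signal is wavelet shrinkage: soft-threshold the detail coefficients, then invert the transform. The sketch below applies that technique to a sequence of sentence embeddings under stated assumptions (single-level Haar along the sentence axis, an even number of sentences, a hand-picked threshold `t`); the function names `soft_threshold` and `haar_denoise` are hypothetical, and this is not presented as the paper's procedure.

```python
import math
from typing import List

Vector = List[float]


def soft_threshold(v: Vector, t: float) -> Vector:
    """Shrink each coefficient toward zero by t; coefficients with |x| <= t become 0."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]


def haar_denoise(seq: List[Vector], t: float) -> List[Vector]:
    """Single-level Haar DWT along the sentence axis, soft-threshold the detail
    coefficients, then apply the inverse transform.

    Assumes an even number of embeddings (a simplification of this sketch).
    """
    if len(seq) % 2:
        raise ValueError("this sketch expects an even number of embeddings")
    s = 1 / math.sqrt(2)
    out: List[Vector] = []
    for a, b in zip(seq[0::2], seq[1::2]):
        approx = [s * (x + y) for x, y in zip(a, b)]  # global (approximation) coefficient
        detail = [s * (x - y) for x, y in zip(a, b)]  # local (detail) coefficient
        detail = soft_threshold(detail, t)  # suppress small local fluctuations
        # Inverse Haar: recover the two (possibly smoothed) embeddings of the pair.
        out.append([s * (c + d) for c, d in zip(approx, detail)])
        out.append([s * (c - d) for c, d in zip(approx, detail)])
    return out


# Toy 2-d "sentence embeddings": two near-duplicate pairs with slight variation.
doc = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9]]
smoothed = haar_denoise(doc, t=0.1)
```

With `t = 0` the inverse transform reconstructs the input exactly (up to floating-point error); as `t` grows, each pair of neighboring embeddings is pulled toward its shared mean, so small local fluctuations are smoothed away while larger local differences, which may carry domain-specific detail, survive in reduced form.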

Problem

Research questions and friction points this paper is trying to address.

Document Summarization

Large Language Models

Hallucination

Domain-specific Text

Context Limitation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Discrete Wavelet Transform

Document Summarization

Semantic Fidelity