The Belief State Transformer

📅 2024-10-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the inability of conventional forward-only Transformers to model compact belief states in goal-conditioned text generation (e.g., fill-in-the-middle with given prefixes and suffixes). The proposed Belief State Transformer jointly encodes a prefix and a suffix and simultaneously predicts the next token for the prefix and the previous token for the suffix. By learning a compact belief state that captures all information relevant for accurate predictions, it enables more efficient goal-conditioned decoding. Experiments show that it outperforms the Fill-in-the-Middle baseline for story writing with known prefixes and suffixes, improves performance even when the goals are unknown, and yields high-quality text representations on small-scale problems; ablations indicate that each component of the model is essential in scenarios where standard Transformers fall short.

📝 Abstract
We introduce the "Belief State Transformer", a next-token predictor that takes both a prefix and suffix as inputs, with a novel objective of predicting both the next token for the prefix and the previous token for the suffix. The Belief State Transformer effectively learns to solve challenging problems that conventional forward-only transformers struggle with, in a domain-independent fashion. Key to this success is learning a compact belief state that captures all relevant information necessary for accurate predictions. Empirical ablations show that each component of the model is essential in difficult scenarios where standard Transformers fall short. For the task of story writing with known prefixes and suffixes, our approach outperforms the Fill-in-the-Middle method for reaching known goals and demonstrates improved performance even when the goals are unknown. Altogether, the Belief State Transformer enables more efficient goal-conditioned decoding, better test-time inference, and high-quality text representations on small scale problems. Website: https://sites.google.com/view/belief-state-transformer
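The abstract's core objective (conditioning on both a prefix and a suffix, then predicting the next token for the prefix and the previous token for the suffix) can be illustrated by enumerating the conditioning pairs and their two targets. The following is a minimal sketch under assumed indexing, not the authors' implementation:

```python
def belief_state_pairs(tokens):
    """Enumerate training tuples for a bidirectional next/previous-token
    objective (a sketch of the paper's setup; the exact indexing here is
    an assumption, not the authors' code).

    For each split (i, j) with 0 <= i < j <= len(tokens), the model
    conditions on the forward prefix tokens[:i] and the suffix tokens[j:],
    and is trained to predict both:
      - the next token after the prefix, tokens[i], and
      - the previous token before the suffix, tokens[j - 1].
    """
    pairs = []
    n = len(tokens)
    for i in range(n):                 # prefix length
        for j in range(i + 1, n + 1):  # suffix start (empty suffix when j == n)
            pairs.append((tokens[:i], tokens[j:], tokens[i], tokens[j - 1]))
    return pairs

# Example: every conditioning pair for a 3-token sequence.
for prefix, suffix, nxt, prev in belief_state_pairs(list("abc")):
    print(prefix, suffix, "-> next:", nxt, "prev:", prev)
```

When the suffix is empty (j equals the sequence length), the forward target reduces to ordinary next-token prediction, which is consistent with the abstract's framing of the model as a generalized next-token predictor.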
Problem

Research questions and friction points this paper is trying to address.

Forward-only Transformers struggle with goal-conditioned generation
Reaching a known suffix (goal) efficiently at decoding time
Standard models lack a compact belief state capturing all prediction-relevant information
Innovation

Methods, ideas, or system contributions that make the work stand out.

Jointly predicts the next token for the prefix and the previous token for the suffix
Learns a compact belief state capturing all relevant information
Outperforms the Fill-in-the-Middle method on story writing