🤖 AI Summary
Semantic frame induction aims to cluster lexical units that evoke the same semantic frame, yet prior work has not explored causal language models (CLMs) for this task. This paper introduces FrameEOL, the first approach to apply CLMs such as GPT and Llama to frame induction. FrameEOL uses prompt-based learning to obtain semantic frame embeddings: the model is prompted to output a single frame name as a label for the given situation, and in-context learning (ICL) and deep metric learning (DML) are used to make the embeddings more suitable for clustering. Experiments on English and Japanese FrameNet benchmarks show that FrameEOL outperforms existing frame induction methods. Notably, for Japanese, which lacks extensive frame resources, the ICL-based variant with only 5 examples and no parameter fine-tuning matches the performance of an MLM-based method fine-tuned with DML, making FrameEOL a lightweight and practical approach under data-scarce conditions.
📝 Abstract
Semantic frame induction is the task of clustering frame-evoking words according to the semantic frames they evoke. In recent years, leveraging embeddings of frame-evoking words obtained with masked language models (MLMs) such as BERT has led to high-performance semantic frame induction. Although causal language models (CLMs) such as the GPT and Llama series succeed in a wide range of language comprehension tasks and can engage in dialogue as if they understood frames, they have not yet been applied to semantic frame induction. We propose a new method for semantic frame induction based on CLMs. Specifically, we introduce FrameEOL, a prompt-based method for obtaining Frame Embeddings that outputs One frame-name as a Label representing the given situation. To obtain embeddings more suitable for frame induction, we leverage in-context learning (ICL) and deep metric learning (DML). Frame induction is then performed by clustering the resulting embeddings. Experimental results on the English and Japanese FrameNet datasets demonstrate that the proposed method outperforms existing frame induction methods. In particular, for Japanese, which lacks extensive frame resources, the CLM-based method using only 5 ICL examples achieves performance comparable to the MLM-based method fine-tuned with DML.
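The pipeline the abstract describes (prompt a CLM to name one frame for the situation, take the resulting representation as a frame embedding, then cluster) can be sketched roughly as follows. Everything here is an illustrative stand-in, not the paper's actual components: the prompt template is hypothetical, `toy_embed` replaces the CLM's hidden state for the generated frame-name token, and a greedy cosine-similarity grouping replaces the clustering algorithm used in the paper.

```python
# Rough sketch of a FrameEOL-style pipeline. All components are
# illustrative stand-ins, not the paper's actual method.
import numpy as np

def frameeol_prompt(sentence, word):
    # Hypothetical prompt template; the paper's exact wording may differ.
    return (f'Sentence: "{sentence}" '
            f'In this sentence, "{word}" evokes the semantic frame of')

def toy_embed(text):
    # Stand-in for the CLM embedding (e.g., the hidden state at the
    # position where the model outputs the frame name). Here: a
    # deterministic, L2-normalized bag-of-letters vector.
    vec = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-9)

def cluster_by_threshold(embs, threshold=0.9):
    # Greedy single-link grouping by cosine similarity -- a simple
    # proxy for the clustering step of frame induction.
    labels = [-1] * len(embs)
    next_label = 0
    for i, e in enumerate(embs):
        for j in range(i):
            if float(e @ embs[j]) >= threshold:
                labels[i] = labels[j]
                break
        if labels[i] == -1:
            labels[i] = next_label
            next_label += 1
    return labels

examples = [
    ("She bought a new car.", "bought"),
    ("He purchased a house.", "purchased"),
    ("They walked to school.", "walked"),
]
embs = [toy_embed(frameeol_prompt(s, w)) for s, w in examples]
labels = cluster_by_threshold(embs)
```

In the paper's setting, `toy_embed` would be replaced by an actual CLM forward pass (optionally preceded by ICL examples or tuned with DML), and the clustering step would operate on those embeddings instead.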