🤖 AI Summary
Existing evaluation frameworks lack Japanese-specific benchmarks for assessing the controllability of large language models (LLMs), hindering systematic comparison of instruction-following capability between multilingual and Japanese-specific models. Method: We introduce LCTG Bench—the first Japanese benchmark for controllable text generation—featuring a suite of instruction-control tasks adapted to Japanese and a unified, cross-task framework for evaluating controllability. It covers nine state-of-the-art Japanese-specific and multilingual LLMs, including GPT-4, and employs standardized, reproducible quantitative metrics. Contribution/Results: Our experiments provide the first systematic evidence of a significant controllability gap between multilingual and Japanese-specific models in fine-grained instruction following. LCTG Bench fills a critical gap in controllability evaluation for low-resource languages and establishes a rigorous, reliable foundation for Japanese LLM development, selection, and deployment.
📝 Abstract
The rise of large language models (LLMs) has led to more diverse and higher-quality machine-generated text. However, their high expressive power makes it difficult to control outputs according to specific business instructions. In response, benchmarks focusing on the controllability of LLMs have been developed, but several issues remain: (1) they primarily cover major languages like English and Chinese, neglecting low-resource languages like Japanese; (2) current benchmarks employ task-specific evaluation metrics and lack a unified framework for selecting models based on controllability across different use cases. To address these challenges, this research introduces LCTG Bench, the first Japanese benchmark for evaluating the controllability of LLMs. LCTG Bench provides a unified framework for assessing control performance, enabling users to select the most suitable model for their use cases based on controllability. By evaluating nine diverse Japanese-specific and multilingual LLMs, including GPT-4, we highlight the current state and challenges of controllability in Japanese LLMs and reveal the significant gap between multilingual models and Japanese-specific models.