🤖 AI Summary
Existing persona datasets for dialogue AI lack cultural diversity and adaptability, which hinders the development of culture-aware systems. This paper proposes a two-stage culture-aware persona generation framework and introduces KoPersona, the first large-scale Korean cultural persona dataset (200K samples). It is the first work to systematically integrate Hofstede’s cultural dimensions, Korean societal norms, generational differences, and etiquette features, enabling controllable persona synthesis via cultural knowledge injection, rule-based templating, and LLM fine-tuning. The proposed paradigm is cross-culturally extensible, overcoming the cultural homogeneity bottleneck of generic persona datasets. Experiments demonstrate that KoPersona significantly outperforms baselines in cultural consistency, persona diversity, and dialogue adaptability. When integrated into downstream dialogue models, it improves emotional resonance and cultural appropriateness in Korean-language interactions by 23.6%.
📝 Abstract
Incorporating personas into conversational AI models is crucial for achieving authentic and engaging interactions. However, the cultural diversity and adaptability of existing persona datasets are often overlooked, reducing their efficacy in building culturally aware AI systems. To address this issue, we propose a two-step pipeline for generating culture-specific personas and introduce KoPersona, a dataset comprising 200,000 personas designed to capture Korean cultural values, behaviors, and social nuances. A comprehensive evaluation through various metrics validates the quality of KoPersona and its relevance to Korean culture. This work not only contributes to persona-based research but also establishes a scalable approach for creating culturally relevant personas adaptable to various languages and cultural contexts.