EasyCraft: A Robust and Efficient Framework for Automatic Avatar Crafting

📅 2025-03-03
🤖 AI Summary
Existing character customization methods suffer from limited generalizability due to reliance on unimodal input (text-only or image-only) and engine-specific constraints. This work proposes EasyCraft, an end-to-end feedforward framework featuring a dual-modality translator architecture that jointly supports both textual and visual inputs. By leveraging self-supervised feature learning, EasyCraft constructs a cross-domain consistent latent face space, effectively disentangling stylistic representations from engine-specific parameter mappings. The method enables adaptation across heterogeneous game engines (e.g., Unity and Unreal) without retraining the shared encoder. Evaluated on two RPG games, EasyCraft achieves state-of-the-art performance in translating arbitrary-style facial images into high-fidelity, robust crafting parameters. It significantly enhances the universality and practicality of personalized avatar generation, offering broad compatibility with diverse input modalities and rendering backends.

📝 Abstract
Character customization, or 'face crafting,' is a vital feature in role-playing games (RPGs), enhancing player engagement by enabling the creation of personalized avatars. Existing automated methods often struggle with generalizability across diverse game engines due to their reliance on the intermediate constraints of specific image domain and typically support only one type of input, either text or image. To overcome these challenges, we introduce EasyCraft, an innovative end-to-end feedforward framework that automates character crafting by uniquely supporting both text and image inputs. Our approach employs a translator capable of converting facial images of any style into crafting parameters. We first establish a unified feature distribution in the translator's image encoder through self-supervised learning on a large-scale dataset, enabling photos of any style to be embedded into a unified feature representation. Subsequently, we map this unified feature distribution to crafting parameters specific to a game engine, a process that can be easily adapted to most game engines and thus enhances EasyCraft's generalizability. By integrating text-to-image techniques with our translator, EasyCraft also facilitates precise, text-based character crafting. EasyCraft's ability to integrate diverse inputs significantly enhances the versatility and accuracy of avatar creation. Extensive experiments on two RPG games demonstrate the effectiveness of our method, achieving state-of-the-art results and facilitating adaptability across various avatar engines.
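The abstract describes a two-stage translator: a self-supervised image encoder that embeds a photo of any style into a unified feature space, followed by an engine-specific mapping from that feature to crafting parameters. A minimal toy sketch of this structure (all names, dimensions, and the linear/sigmoid head are illustrative assumptions, not the authors' implementation):

```python
# Toy sketch of EasyCraft's two-stage translator (hypothetical names, not
# the paper's code): a shared encoder produces a unified face feature, and
# a small per-engine head maps it to that engine's crafting parameters.
# Adapting to a new engine only means attaching a new head; the encoder
# is reused unchanged.
import math
import random

FEATURE_DIM = 8  # toy dimensionality; the real encoder is far larger


def encode_face(image_pixels):
    """Stand-in for the self-supervised image encoder: any-style photo
    -> unified, style-agnostic feature vector (here a toy normalized
    embedding built from pixel sums)."""
    feats = [sum(image_pixels[i::FEATURE_DIM]) for i in range(FEATURE_DIM)]
    norm = math.sqrt(sum(f * f for f in feats)) or 1.0
    return [f / norm for f in feats]


class EngineHead:
    """Per-engine mapping from the unified feature to crafting parameters."""

    def __init__(self, param_names, seed=0):
        rng = random.Random(seed)
        self.param_names = param_names
        # toy random linear weights; in the paper these are learned per engine
        self.weights = [[rng.uniform(-1, 1) for _ in range(FEATURE_DIM)]
                        for _ in param_names]

    def __call__(self, feature):
        raw = (sum(w * f for w, f in zip(row, feature)) for row in self.weights)
        # squash into (0, 1), the usual range for slider-style parameters
        return {name: 1 / (1 + math.exp(-r))
                for name, r in zip(self.param_names, raw)}


# One shared encoder, two engine-specific heads (e.g. Unity vs. Unreal)
unity_head = EngineHead(["eye_width", "nose_height", "jaw_angle"], seed=1)
unreal_head = EngineHead(["brow_depth", "lip_fullness"], seed=2)

photo = [random.Random(42).random() for _ in range(64)]  # fake 64-pixel image
feature = encode_face(photo)
unity_params = unity_head(feature)
unreal_params = unreal_head(feature)
```

The design choice the abstract emphasizes is that only the head is engine-specific: the encoder's unified feature distribution stays fixed, which is what makes porting to another engine cheap.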
Problem

Research questions and friction points this paper is trying to address.

Existing methods rely on a specific image domain as an intermediate constraint, limiting generalizability across game engines.
Most approaches accept only a single input modality, either text or image.
Versatility and accuracy of automated character customization remain limited.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Supports both text and image inputs
Unified feature distribution via self-supervised learning
Adaptable to most game engines
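The text path follows from the same design: a text-to-image model first renders a face matching the prompt, and the image translator above then converts it to parameters, so no separate text-to-parameter model is needed. A hedged, runnable illustration with toy stand-ins (all function names and the toy models are assumptions for illustration only):

```python
# Hypothetical illustration of EasyCraft's text input path (not the
# authors' code): text -> synthesized face image -> unified feature ->
# engine-specific crafting parameters.

def craft_from_text(prompt, text_to_image, encode_face, engine_head):
    """Route a text prompt through the same image translator used for photos."""
    face_image = text_to_image(prompt)   # e.g. a diffusion model in the paper
    feature = encode_face(face_image)    # shared, style-agnostic encoder
    return engine_head(feature)          # engine-specific parameter mapping


# Toy stand-ins so the pipeline can be exercised end to end
def toy_text_to_image(prompt):
    # deterministic pseudo-image (64 values in [0, 1]) derived from the text
    return [(ord(c) % 17) / 16.0 for c in prompt.ljust(64, ".")[:64]]


def toy_encoder(pixels):
    return [sum(pixels) / len(pixels)]   # 1-D "feature" for the sketch


def toy_head(feature):
    # clamp into [0, 1], mimicking a slider-style crafting parameter
    return {"smile_strength": min(1.0, max(0.0, feature[0]))}


params = craft_from_text("a smiling elf with sharp cheekbones",
                         toy_text_to_image, toy_encoder, toy_head)
```

Because the translator never sees the prompt directly, any text-to-image backend that produces a face image can be swapped in without retraining the rest of the pipeline.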
👥 Authors
Suzhen Wang, Netease Fuxi AI Lab (talking face generation, facial animation, virtual human, multimodal learning & generation)
Weijie Chen, Netease Fuxi AI Lab
Wei Zhang, Netease Fuxi AI Lab
Minda Zhao, Netease Fuxi AI Lab
Lincheng Li, Netease Fuxi AI Lab (computer vision, 3D vision, video synthesis, multi-view stereo)
Rongsheng Zhang, Fuxi AI Lab, NetEase Inc., Hangzhou, China (NLP)
Zhipeng Hu, Netease Fuxi AI Lab
Xin Yu, The University of Queensland