Search2Motion: Training-Free Object-Level Motion Control via Attention-Consensus Search

📅 2026-03-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses object-level motion control in training-free image-to-video generation. The authors propose a fine-tuning-free framework that enables precise object motion editing using only a constructed target frame and first-last-frame motion priors, eliminating the need for complex inputs such as trajectories, masks, or motion fields. Central to the approach is ACE-Seed (Attention Consensus for Early-step Seed selection), a lightweight search strategy that, combined with semantic-guided object insertion and background inpainting, ensures both scene coherence and accurate motion control. The study further introduces object-motion evaluation metrics (FLF2V-obj) that explicitly exclude camera-motion interference, and demonstrates significant gains over existing methods on FLF2V-obj and VBench. Two new benchmarks, S2M-DAVIS and S2M-OMB, are also released to support future research.
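The target-frame construction step summarized above (relocating an object, then repairing the vacated background) can be sketched in miniature. The function below is an illustrative assumption only: it uses a crude mean-color fill and a pixel shift, whereas the paper relies on semantic-guided object insertion and a dedicated inpainting model.

```python
import numpy as np

def build_target_frame(frame, obj_mask, dx, dy):
    """Illustrative target-frame construction: move the masked object by
    (dx, dy) and fill the vacated region with a naive background estimate.
    NOT the paper's method; a minimal stand-in for the real pipeline."""
    h, w = obj_mask.shape
    out = frame.copy()
    # crude "inpainting": replace vacated pixels with the mean background color
    bg_color = frame[~obj_mask].mean(axis=0)
    out[obj_mask] = bg_color
    # paste the object's pixels at the shifted location, clipped to the frame
    ys, xs = np.nonzero(obj_mask)
    ny, nx = ys + dy, xs + dx
    keep = (ny >= 0) & (ny < h) & (nx >= 0) & (nx < w)
    out[ny[keep], nx[keep]] = frame[ys[keep], xs[keep]]
    return out
```

In the real system the pasted object is harmonized with the scene semantically and the hole is filled by a learned inpainter, but the interface is the same: original frame in, edited target frame out.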

📝 Abstract
We present Search2Motion, a training-free framework for object-level motion editing in image-to-video generation. Unlike prior methods requiring trajectories, bounding boxes, masks, or motion fields, Search2Motion adopts target-frame-based control, leveraging first-last-frame motion priors to realize object relocation while preserving scene stability without fine-tuning. Reliable target-frame construction is achieved through semantic-guided object insertion and robust background inpainting. We further show that early-step self-attention maps predict object and camera dynamics, offering interpretable user feedback and motivating ACE-Seed (Attention Consensus for Early-step Seed selection), a lightweight search strategy that improves motion fidelity without look-ahead sampling or external evaluators. Noting that existing benchmarks conflate object and camera motion, we introduce S2M-DAVIS and S2M-OMB for stable-camera, object-only evaluation, alongside FLF2V-obj metrics that isolate object artifacts without requiring ground-truth trajectories. Search2Motion consistently outperforms baselines on FLF2V-obj and VBench.
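The ACE-Seed idea in the abstract, scoring candidate seeds by how well their early-step self-attention maps agree with the desired object placement, can be sketched as follows. The scoring rule and the `run_early_steps` interface are hypothetical illustrations, not the paper's exact formulation; the point is that no look-ahead sampling or external evaluator is needed, only a few early denoising steps per seed.

```python
import numpy as np

def attention_consensus_score(attn_maps, target_mask):
    """Hypothetical consensus score: attention mass inside the desired
    object region minus leakage outside it, averaged over maps."""
    scores = []
    for a in attn_maps:
        a = (a - a.min()) / (a.max() - a.min() + 1e-8)  # normalize to [0, 1]
        inside = a[target_mask].mean()    # attention on the target region
        outside = a[~target_mask].mean()  # attention elsewhere
        scores.append(inside - outside)
    return float(np.mean(scores))

def ace_seed_search(candidate_seeds, run_early_steps, target_mask):
    """Pick the seed whose early-step self-attention best matches the
    target placement. `run_early_steps(seed)` is assumed to run a few
    denoising steps and return a list of 2D self-attention maps."""
    best_seed, best_score = None, -np.inf
    for seed in candidate_seeds:
        attn_maps = run_early_steps(seed)
        s = attention_consensus_score(attn_maps, target_mask)
        if s > best_score:
            best_seed, best_score = seed, s
    return best_seed, best_score
```

Because only early steps are sampled per candidate, the search stays lightweight relative to generating full videos for every seed.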
Problem

Research questions and friction points this paper is trying to address.

object-level motion control
image-to-video generation
training-free
motion editing
video evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

training-free motion control
object-level video generation
attention-consensus search
target-frame-based editing
motion fidelity without fine-tuning