🤖 AI Summary
This work addresses the challenge of accurately modeling soft boundaries, such as hair or defocus blur, in stereo conversion, where foreground-background blending makes depth correspondence ambiguous in ways that conventional single-layer depth estimation cannot capture. To overcome this limitation, the authors propose αDepth, a layered representation that estimates layered color and depth values at soft boundaries. Central to the approach is the Circular Alpha Representation (CAR), which shifts soft-boundary modeling from global target extraction to local boundary decomposition. This enables efficient scene-level inference without manual guidance, even in scenes with multiple targets. According to the authors' evaluations, the method removes background color bleeding and structural distortions at soft boundaries and achieves state-of-the-art stereo conversion performance.
📝 Abstract
Accurately modeling soft boundaries, e.g., hair and defocus blur, is a fundamental challenge in stereo conversion due to the ambiguous blending of foreground and background. Existing depth models primarily predict single-layer depth, leading to ambiguity in depth correspondence at soft boundaries. While matting techniques can capture opacity for layered modeling, they often struggle in complex scenes with multiple targets and usually require user intervention. This paper introduces αDepth, a layered representation that decomposes soft boundaries for high-fidelity stereo conversion. Specifically, we first resolve mixed color and depth ambiguity by estimating layered color and depth values at soft boundaries. Considering complex multi-target scenes, we design a Circular Alpha Representation (CAR) that shifts the paradigm from global target extraction to local boundary decomposition. Unlike prior matting methods restricted to a single foreground/background, CAR enables efficient scene-level inference without manual guidance. Extensive evaluations demonstrate that αDepth achieves state-of-the-art performance in stereo conversion, eliminating background bleeding and structural distortions at soft boundaries.
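To make the soft-boundary ambiguity concrete, the following is a minimal sketch on a single 1D scanline. It is not the authors' method. It assumes the layered quantities (foreground color, background color, and alpha) are already known, uses the standard matting equation I = αF + (1 − α)B, and compares a single-layer warp with a layered warp. All function names, pixel values, and disparities are hypothetical and chosen only for illustration.

```python
def composite(alpha, fg, bg):
    """Standard matting equation per pixel: I = alpha * F + (1 - alpha) * B."""
    return [a * f + (1.0 - a) * b for a, f, b in zip(alpha, fg, bg)]


def forward_shift(row, disparity, fill):
    """Shift a scanline by an integer disparity, padding uncovered pixels with `fill`."""
    out = [fill] * len(row)
    for x, value in enumerate(row):
        tx = x + disparity
        if 0 <= tx < len(row):
            out[tx] = value
    return out


def warp_single_layer(image, alpha, d_fg, d_bg):
    """Warp the already-mixed image using one disparity per pixel.

    Assumption: any pixel with alpha > 0 gets the foreground disparity.
    Background pixels are written first so foreground pixels overwrite them.
    Disoccluded pixels that receive no value stay None.
    """
    out = [None] * len(image)
    for foreground_pass in (False, True):
        for x, value in enumerate(image):
            is_fg = alpha[x] > 0.0
            if is_fg != foreground_pass:
                continue
            tx = x + (d_fg if is_fg else d_bg)
            if 0 <= tx < len(image):
                out[tx] = value
    return out


def warp_layered(alpha, fg, bg, d_fg, d_bg):
    """Warp each layer with its own disparity, then composite in the target view."""
    alpha_t = forward_shift(alpha, d_fg, 0.0)
    fg_t = forward_shift(fg, d_fg, 0.0)
    bg_t = forward_shift(bg, d_bg, 0.0)
    return composite(alpha_t, fg_t, bg_t)


# Hypothetical scene: a bright strand with soft edges over a background
# that is dark on the left and bright on the right.
alpha = [0.0, 0.0, 0.5, 1.0, 0.5, 0.0, 0.0, 0.0]
fg = [1.0] * 8
bg = [0.0, 0.0, 0.0, 0.0, 0.8, 0.8, 0.8, 0.8]
source = composite(alpha, fg, bg)

D_FG, D_BG = 2, 0  # illustrative disparities in pixels
single = warp_single_layer(source, alpha, D_FG, D_BG)
layered = warp_layered(alpha, fg, bg, D_FG, D_BG)

print("single-layer:", single)
print("layered:     ", layered)
```

In this toy setup, the soft-edge pixel that starts over the dark background lands over the bright background after warping. The single-layer warp carries the old dark mixture along with it, which is the background bleeding described above, while the layered warp re-blends the foreground with whatever background is actually behind it in the new view.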