Layered Image Vectorization via Semantic Simplification

Zhenyu Wang¹, Jianxi Huang¹, Zhida Sun¹, Yuanhao Gong¹, Daniel Cohen-Or², Min Lu¹

¹Shenzhen University, ²Tel-Aviv University

Layered vectorization: by generating a sequence of progressively simplified images (top row), our technique reconstructs vectors layer by layer, from macro structures to finer details (middle row). Our approach keeps the vectors compactly aligned within the boundaries of explicit and implicit semantic objects (bottom row).

Abstract

This work presents a progressive image vectorization technique that reconstructs a raster image as layer-wise vectors, from semantically aligned macro structures to finer details. Our approach introduces a new image simplification method that leverages the feature-average effect in the Score Distillation Sampling (SDS) mechanism, achieving effective visual abstraction from detailed to coarse. Guided by the sequence of progressively simplified images, we propose a two-stage vectorization process of structural buildup and visual refinement, constructing the vectors in an organized and manageable manner. The resulting vectors are layered and well aligned with the target image's explicit and implicit semantic structures. Our method performs well across a wide range of images. Comparative analysis with existing vectorization methods highlights our technique's superiority in creating vectors with high visual fidelity and, more importantly, in achieving higher semantic alignment and a more compact layered representation.

Animation of Our Vectorization


How Does it Work

Layered vectorization pipeline: given a target image, a sequence of progressively simplified images is first generated by a diffusion model using Score Distillation Sampling (SDS). Vectors are then reconstructed in two stages: structure construction, which optimizes shapes layer by layer to match segmented masks, and visual refinement, which refines the vectors for high fidelity.
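The control flow of the two stages can be illustrated with a toy sketch. It is not the authors' implementation, and every name in it is hypothetical: images are small grids of integer color labels, `segment` stands in for a segmentation model by grouping pixels with the same label, a "vector shape" is just a pixel set with a fill color, and `visual_refinement` only refits fill colors rather than optimizing path geometry. The sequence of simplified images is assumed to be given, ordered from coarsest to finest.

```python
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]          # (x, y)
Image = List[List[int]]          # grid of integer color labels
Shape = Tuple[Set[Pixel], int]   # (covered pixels, fill color): toy "vector"


def segment(image: Image) -> Dict[int, Set[Pixel]]:
    """Stand-in for a segmentation model: one mask per color label."""
    masks: Dict[int, Set[Pixel]] = {}
    for y, row in enumerate(image):
        for x, color in enumerate(row):
            masks.setdefault(color, set()).add((x, y))
    return masks


def flatten(layers: List[List[Shape]]) -> List[Shape]:
    """Concatenate layers in order, coarsest first."""
    return [shape for layer in layers for shape in layer]


def render(shapes: List[Shape], height: int, width: int, background: int = -1) -> Image:
    """Paint shapes in order, so later (finer) shapes sit on top."""
    canvas = [[background] * width for _ in range(height)]
    for pixels, color in shapes:
        for x, y in pixels:
            canvas[y][x] = color
    return canvas


def structural_buildup(simplified: List[Image]) -> List[List[Shape]]:
    """Stage 1 (toy): add one layer per simplification level, coarse to fine."""
    height, width = len(simplified[0]), len(simplified[0][0])
    layers: List[List[Shape]] = []
    for level in simplified:
        current = render(flatten(layers), height, width)
        new_layer: List[Shape] = []
        for color, mask in sorted(segment(level).items()):
            # Add a shape only where the existing vectors miss this mask.
            missing = {(x, y) for x, y in mask if current[y][x] != color}
            if missing:
                new_layer.append((missing, color))
        layers.append(new_layer)
    return layers


def visual_refinement(layers: List[List[Shape]], target: Image) -> List[List[Shape]]:
    """Stage 2 (toy): keep the shapes, refit each fill color to the target."""
    refined = [list(layer) for layer in layers]
    covered: Set[Pixel] = set()
    order = [(i, j) for i, layer in enumerate(layers) for j in range(len(layer))]
    for i, j in reversed(order):  # topmost shape first
        pixels, color = layers[i][j]
        visible = pixels - covered  # part not hidden by shapes above
        if visible:
            colors = [target[y][x] for x, y in visible]
            # Majority color; ties resolve to the smallest label.
            color = max(sorted(set(colors)), key=colors.count)
        refined[i][j] = (pixels, color)
        covered |= pixels
    return refined


if __name__ == "__main__":
    coarse = [[5, 5, 5, 5], [5, 5, 5, 5]]   # most simplified image
    mid = [[5, 5, 7, 7], [5, 5, 7, 7]]      # less simplified image
    target = [[4, 4, 7, 7], [4, 4, 7, 8]]   # original raster image
    layers = structural_buildup([coarse, mid])
    refined = visual_refinement(layers, target)
    for row in render(flatten(refined), 2, 4):
        print(row)
```

In this sketch the structure is fixed by the masks of the simplified images, and the target image is consulted only in the second stage, which mirrors the separation between structural buildup and visual refinement described above.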

Results

Vectors with levels of detail generated with our method: from left to right, vector primitives from macro to finer details are added layer by layer.
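Because the layers are ordered from macro to fine, a coarser level of detail can be obtained by rendering only a prefix of the layer list. The following is a minimal sketch of that idea under toy assumptions: shapes are pixel sets filled with a single character, and `render_prefix` and `rect` are hypothetical helpers, not part of any released code.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]          # (x, y)
Shape = Tuple[Set[Pixel], str]   # (covered pixels, fill character)


def rect(x0: int, y0: int, x1: int, y1: int) -> Set[Pixel]:
    """Pixels of the half-open rectangle [x0, x1) x [y0, y1)."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


def render_prefix(layers: List[List[Shape]], k: int, height: int, width: int,
                  background: str = ".") -> List[str]:
    """Render only the first k layers; later layers paint over earlier ones."""
    canvas = [[background] * width for _ in range(height)]
    for layer in layers[:k]:
        for pixels, fill in layer:
            for x, y in pixels:
                canvas[y][x] = fill
    return ["".join(row) for row in canvas]


# Toy layer stack, ordered from macro structure to fine detail.
LAYERS: List[List[Shape]] = [
    [(rect(0, 0, 6, 3), "a")],   # macro structure
    [(rect(1, 0, 4, 2), "b")],   # mid-level part
    [(rect(2, 1, 3, 2), "c")],   # fine detail
]

if __name__ == "__main__":
    for k in range(1, len(LAYERS) + 1):
        print(f"first {k} layer(s):")
        print("\n".join(render_prefix(LAYERS, k, 3, 6)))
```

Dropping trailing layers never changes the shapes beneath them, so each prefix is itself a complete, coarser vector image.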

Comparisons to Prior Work: Visual Quality

Qualitative reconstruction comparison: both examples are vectorized with 128 vector primitives. Our method reconstructs vectors that are more faithful, cleaner, and better aligned with semantics.

Comparisons to Prior Work: Layer-wise Representation

Vector layers generated by LIVE, DiffVG, O&R, SGLIVE, and our method.

Semantic Alignment

Captioning of macro structural vectors generated by our vectorization method: for each example, the caption of the coarse image is generated by the Florence-2 model.