Layered Image Vectorization via Semantic Simplification

1Shenzhen University, 2Tel-Aviv University

Layered vectorization: by generating a sequence of semantically simplified images (top row), our technique constructs vectors with progressively finer levels of detail (second row from the top). This process generates visual primitives layer by layer (third row), keeping the primitives compactly within the boundaries of semantic groups (bottom row).


This work presents a novel progressive image vectorization technique aimed at generating layered vectors that represent the original image from coarse to fine detail levels. Our approach introduces semantic simplification, which combines Score Distillation Sampling and semantic segmentation to iteratively simplify the input image. Subsequently, our method optimizes the vector layers for each of the progressively simplified images. Our method provides robust optimization, which avoids local minima and enables adjustable detail levels in the final output. The layered, compact vector representation enhances usability for further editing and modification. Comparative analysis with conventional vectorization methods demonstrates our technique’s superiority in producing vectors with high visual fidelity, and more importantly, maintaining vector compactness and manageability.

Animation of Our Vectorization


How Does It Work

Our framework consists of two modules that work sequentially to achieve layered vectorization. The first module, Progressive Image Simplification, employs Score Distillation Sampling combined with semantic segmentation to generate a sequence of images with varying levels of semantic simplicity. The second module, Layered Vectorization, uses that sequence of simplified images as guidance and generates vectors layer by layer, from coarse to fine levels of detail.
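The control flow of the two modules can be sketched as follows. This is only a minimal illustration of the coarse-to-fine loop, not the actual implementation: `block_average` is a toy stand-in for the simplification step (it does not use Score Distillation Sampling or semantic segmentation), `toy_fit_layer` is a placeholder for the per-layer vector optimization, and all function names, the `num_levels` parameter, and the dictionary used for a primitive are assumptions made for this example.

```python
from typing import Callable, List

Image = List[List[float]]   # grayscale raster as rows of pixel values
Primitive = dict            # hypothetical stand-in for one vector shape


def block_average(image: Image, level: int) -> Image:
    """Toy simplifier (assumption): average over 2**level square blocks."""
    size = 2 ** level
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for top in range(0, h, size):
        for left in range(0, w, size):
            cells = [(r, c)
                     for r in range(top, min(top + size, h))
                     for c in range(left, min(left + size, w))]
            mean = sum(image[r][c] for r, c in cells) / len(cells)
            for r, c in cells:
                out[r][c] = mean
    return out


def toy_fit_layer(target: Image, existing: List[Primitive]) -> List[Primitive]:
    """Toy layer fitter (assumption): emits one placeholder primitive."""
    flat = [v for row in target for v in row]
    return [{"mean": sum(flat) / len(flat), "below": len(existing)}]


def layered_vectorize(
    image: Image,
    simplify: Callable[[Image, int], Image],
    fit_layer: Callable[[Image, List[Primitive]], List[Primitive]],
    num_levels: int = 3,
) -> List[List[Primitive]]:
    # Module 1: build the guidance sequence, most simplified image first,
    # original image last.
    sequence = [simplify(image, lvl) for lvl in range(num_levels, 0, -1)]
    sequence.append(image)

    # Module 2: fit one layer per guidance image, coarse to fine. Each new
    # layer sees the primitives already placed beneath it.
    layers: List[List[Primitive]] = []
    for target in sequence:
        existing = [p for layer in layers for p in layer]
        layers.append(fit_layer(target, existing))
    return layers


image = [[float((r + c) % 4) for c in range(4)] for r in range(4)]
layers = layered_vectorize(image, block_average, toy_fit_layer, num_levels=2)
```

Passing the simplifier and the layer fitter in as callables keeps the loop independent of how either module is realized, so the toy stand-ins could in principle be swapped for real components without changing the surrounding structure.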


For each example in each row, our method takes the original image (the rightmost one) as input and generates the vectors layer by layer, from coarse to fine details (from left to right).

Comparisons to Prior Work

Gallery of examples generated by three image vectorization techniques: LIVE, O&R, and ours. Primitives produced by LIVE and O&R tend to scatter and to intersect across regions. In contrast, our method retains nearly half of the visual primitives within their designated semantic structures. Our method also achieves higher visual fidelity on average (measured by pixel-level MSE loss).
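For reference, the pixel-level MSE mentioned above is the mean of squared per-pixel differences between the rendered vector output and the original image, with lower values indicating higher fidelity. A minimal sketch, assuming both images are same-sized grayscale rasters stored as nested lists (the function name `pixel_mse` and this data layout are assumptions for illustration, not the evaluation code used for the comparison):

```python
from typing import List

Image = List[List[float]]  # grayscale raster as rows of pixel values


def pixel_mse(rendered: Image, reference: Image) -> float:
    """Mean squared error over all pixels of two same-sized images."""
    same_rows = len(rendered) == len(reference)
    if not same_rows or any(
        len(a) != len(b) for a, b in zip(rendered, reference)
    ):
        raise ValueError("images must have identical dimensions")
    squared = [
        (x - y) ** 2
        for row_a, row_b in zip(rendered, reference)
        for x, y in zip(row_a, row_b)
    ]
    return sum(squared) / len(squared)


reference = [[0.0, 1.0], [1.0, 0.0]]
rendered = [[0.0, 1.0], [1.0, 1.0]]   # one of four pixels is off by 1
error = pixel_mse(rendered, reference)
```

For color images the same average would typically be taken over all channels as well as all pixels.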

More Results