We are excited to share that our paper “LOTS of Fashion! Multi-Conditioning for Image Generation via Sketch-Text Pairing” has been accepted as an Oral Presentation at ICCV 2025!
Congratulations to Federico Girella, Davide Talon, Ziyue Liu, Zanxi Ruan, Yiming Wang, and Marco Cristani on this outstanding achievement! 👏
LOTS, short for LOcalized Text and Sketch, introduces a new framework for controllable fashion image generation. The method enables users to generate complete fashion outfits by combining a global description with multiple localized sketch-text pairs, allowing each garment item to be controlled individually in terms of both spatial layout and visual appearance.

Fashion design naturally relies on both visual and textual expressions: sketches define the shape, structure, and layout of garments, while language descriptions specify materials, textures, patterns, and stylistic details. LOTS brings these two forms of design communication together, giving designers fine-grained, per-garment control over the generated image. 👗
A key challenge in multi-condition generation is attribute confusion, where attributes intended for one garment incorrectly appear on another. For example, a floral pattern meant for a top may leak into shorts or jackets. LOTS addresses this issue through a Modularized Pair-Centric Representation, which independently encodes each localized sketch-text pair, and a Diffusion Pair Guidance strategy, which injects these localized conditions into the diffusion model during the multi-step denoising process.

Instead of merging all conditions into a single global representation, LOTS defers the fusion of multiple sketch-text pairs to the diffusion process itself. This design allows the model to better preserve localized garment attributes, reduce information leakage across items, and generate more coherent fashion images with accurate layout and appearance control.
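To make the idea of deferred fusion concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation: the names `SketchTextPair`, `encode_pair`, `attend`, and `toy_denoise` are hypothetical, the "encoder" is a hand-written stand-in for learned networks, and the "denoising" loop is a simple interpolation rather than a real diffusion model. It only illustrates the two points described above: each sketch-text pair is encoded on its own, and the pairs are combined only inside the iterative generation loop, here via softmax attention.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class SketchTextPair:
    """Hypothetical container for one garment: a toy sketch plus its own text."""
    sketch: List[float]  # stand-in for a localized garment sketch
    text: str            # description of this garment only


def encode_pair(pair: SketchTextPair, dim: int = 4) -> List[float]:
    """Toy encoder: builds one vector per pair without looking at any other pair."""
    text_feat = [0.0] * dim
    for i, ch in enumerate(pair.text):
        text_feat[i % dim] += ord(ch) / 1000.0
    sketch_mean = sum(pair.sketch) / len(pair.sketch)
    return [t + sketch_mean for t in text_feat]


def attend(query: List[float], keys: List[List[float]]) -> List[float]:
    """Softmax attention of the current latent over the separate pair embeddings."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * key[d] for w, key in zip(weights, keys))
            for d in range(len(query))]


def toy_denoise(latent: List[float], pair_embeddings: List[List[float]],
                steps: int = 3, rate: float = 0.5) -> List[float]:
    """Fusion happens here, step by step, not in a single pre-merged vector."""
    for _ in range(steps):
        guidance = attend(latent, pair_embeddings)
        latent = [(1 - rate) * x + rate * g for x, g in zip(latent, guidance)]
    return latent


pairs = [
    SketchTextPair(sketch=[0.1, 0.9, 0.4], text="floral top"),
    SketchTextPair(sketch=[0.7, 0.2], text="denim shorts"),
]
embeddings = [encode_pair(p) for p in pairs]  # one embedding per garment
result = toy_denoise([0.0] * 4, embeddings)
```

Because the pair embeddings stay separate until the loop consumes them, the description of the top never gets mixed into the embedding of the shorts ahead of time. That is the intuition behind reducing attribute leakage, shown here in a deliberately simplified form.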
To support this new task, the authors introduce Sketchy, a new dataset built on Fashionpedia. Sketchy provides fashion images with multiple garment-level sketch-text pairs, enabling training and evaluation for localized sketch-text-based image generation. This dataset establishes a new benchmark for fine-grained controllable fashion generation.
Extensive experiments show that LOTS achieves state-of-the-art performance in both global image quality and localized attribute alignment. Quantitative results, qualitative comparisons, and human evaluation demonstrate that LOTS better preserves garment-specific details, improves attribute localization, and mitigates attribute confusion compared with existing sketch-to-image and multi-conditioning methods.
This work represents an exciting step forward for controllable image generation, fashion design AI, and multimodal generative modeling, providing a powerful tool for transforming localized creative ideas into coherent visual outputs.
🔗 Project Page: https://intelligolabs.github.io/lots
Great work by the team! 👏✨

