qantrell
10 hours ago
Our new architecture analyzes images and decomposes them into a code-like intermediate representation called layouts: an internal visual language that captures the composition and structure of any image.
Our intermediate image representation is designed for transparency and control. Rather than hiding the model's understanding behind a black box, we expose its internal representation, or code, so that visual elements can be manipulated directly. Users can now move, resize, add, remove, or replace individual objects.
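The post does not describe the actual layout format. The sketch below is only a hypothetical illustration of the general idea: an image described as named elements with bounding boxes, edited through explicit operations rather than pixels, and rendered back as code-like text. `Element`, `Layout`, the field names, the `to_code` output syntax, and the normalized [0, 1] coordinates are all assumptions made for illustration, not the model's real representation or API.

```python
import dataclasses
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Element:
    """Hypothetical layout element: what it depicts plus a bounding box.

    Coordinates are assumed to be normalized to [0, 1] relative to the image.
    """
    label: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


@dataclass
class Layout:
    """Hypothetical layout: named elements that can be edited like code."""
    elements: dict[str, Element] = field(default_factory=dict)

    def add(self, name: str, element: Element) -> None:
        if name in self.elements:
            raise KeyError(f"element {name!r} already exists")
        self.elements[name] = element

    def remove(self, name: str) -> None:
        del self.elements[name]

    def move(self, name: str, dx: float, dy: float) -> None:
        # Shift the bounding box; size and label are unchanged.
        e = self.elements[name]
        self.elements[name] = dataclasses.replace(e, x=e.x + dx, y=e.y + dy)

    def resize(self, name: str, scale: float) -> None:
        # Scale around the element's center so it stays in place.
        e = self.elements[name]
        cx, cy = e.x + e.w / 2, e.y + e.h / 2
        w, h = e.w * scale, e.h * scale
        self.elements[name] = dataclasses.replace(
            e, x=cx - w / 2, y=cy - h / 2, w=w, h=h
        )

    def replace_object(self, name: str, label: str) -> None:
        # Swap what the element depicts while keeping its position and size.
        self.elements[name] = dataclasses.replace(self.elements[name], label=label)

    def to_code(self) -> str:
        """Render the layout as readable, code-like text (hypothetical syntax)."""
        return "\n".join(
            f"{name} = {e.label}(x={e.x:.2f}, y={e.y:.2f}, w={e.w:.2f}, h={e.h:.2f})"
            for name, e in self.elements.items()
        )


if __name__ == "__main__":
    layout = Layout()
    layout.add("background", Element("sofa", 0.00, 0.40, 1.00, 0.60))
    layout.add("subject", Element("cat", 0.10, 0.50, 0.30, 0.30))
    layout.move("subject", 0.40, 0.0)
    layout.replace_object("subject", "dog")
    print(layout.to_code())
```

Keeping edits as named operations on a structured description, rather than on pixels, is what would make each change inspectable and reversible. A real system would then need a separate step to render the edited layout back into an image, which this sketch does not attempt.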