FLUX.2 [Klein]: Towards Interactive Visual Intelligence

227 points, posted 22 days ago

58 Comments

vunderba

21 days ago

I haven’t gotten around to adding Klein to my GenAI Showdown site yet, but if it’s anything like Z-Image Turbo, it should perform extremely well.

For reference, Z-Image Turbo scored 4 out of 15 points on GenAI Showdown. I’m aware that doesn’t sound like much, but given that one of the largest models, Flux.2 (32b), only managed to outscore ZiT (a 6b model) by a single point and is significantly heavier-weight, that’s still damn impressive.

Local model comparisons only:

https://genai-showdown.specr.net/?models=fd,hd,kd,qi,f2d,zt

BoredPositron

21 days ago

I think it shows problems with your tests tbh. The bigger models are way more capable than you make them out to be. They are also better in training and understanding of CGI render outputs as reference like normal maps or id-masks. Your testing suite is the perfect example that structured data implies false confidence. Pure t2i is not a good benchmark anymore.

vunderba

21 days ago

Thanks for the feedback.

> The bigger models are way more capable than you make them out to be.

No test suite is ever going to be perfect. GenAI Showdown was started with the goal of focusing on a very narrow spectrum of testing (prompt adherence) because, as a creator, that's the aspect of most interest to me.

> Pure t2i is not a good benchmark anymore

Just FYI Image Editing is already a separate benchmark (see the navbar at the top).

> Your testing suite is the perfect example that structured data implies false confidence

Again - the headline is "Specific prompts and challenges with a strong emphasis placed on adherence". If I tried to capture every possible aspect of GenAI models (multimodal, texture maps, periodic motion, tiling, etc) - I'd be at it until the heat death of the universe.

Incidentally - which model (specifically) do you think is ranked unfairly? While Flux.2 [dev] did only score a single point above ZiT, its weighted score is much higher (1442 points vs 911 points).

Bombthecat

21 days ago

Can you fix the information bubble on mobile please? When pressing one, it vanishes instantly...

vunderba

21 days ago

Hey Bombthecat, sorry about that! I can't repro this issue on any of the devices I have (Android Pixel 7, an iPad, etc).

If you get a chance, could you list your mobile device specs? That way I can at least try it on Browserstack and see if I can figure out a fix.

Bombthecat

21 days ago

Samsung, brave browser

Update: Huh, now it's working

kennyadam

21 days ago

Yeah works fine for me on a Pixel 9.

codezero

22 days ago

I am amazed, though not entirely surprised, that these models keep getting smaller while the quality and effectiveness increase. Z-Image Turbo is wild, and I'm looking forward to trying this one out.

An older thread on this has a lot of comments: https://news.ycombinator.com/item?id=46046916

roenxi

22 days ago

There are probably some more subtle tipping points that small models hit too. One of the challenges of a 100GB model is that there is non-trivial difficulty in downloading and running the thing that a 4GB model doesn't face. At 4GB I think it might be reasonable to assume that most devs can just try it and see what it does.

AuryGlenz

21 days ago

Quality is increasing, but these small models have very little knowledge compared to their big brothers (Qwen Image/Full size Flux 2). As in characters, artists, specific items, etc.

vunderba

21 days ago

Agreed - given what Tongyi-MAI Lab was able to accomplish with a 6b model - I would love to see what they could do with something larger. Somewhere in the range of 15-20b, between these smaller models (ZiT, Klein) and the significantly larger models (Flux.2 dev).

efskap

21 days ago

I smell the bias-variance tradeoff. By underfitting more, they get closer to the degenerate case of a model that only knows one perfect photo.

littlestymaar

21 days ago

That's what LoRAs are for.

And small models are also much easier to fine tune than large ones.

AuryGlenz

20 days ago

I hate that excuse. I want the model to know who the Paw Patrol is without either finding a lora (which probably won't exist because they're mostly porn) or needing to make a dataset, tag it, and then train it myself.

aitchnyu

21 days ago

Is there a theoretical minimum number of params for a given output? I saw news about GPT 3.5, then Deepseek training models at a fraction of that cost, then laptops running a model that beats 3.5. When does it stop?

psubocz

22 days ago

> FLUX.2 [klein] 4B The fastest variant in the Klein family. Built for interactive applications, real-time previews, and latency-critical production use cases.

I wonder what kind of use cases could be "latency-critical production use cases"?

satvikpendem

22 days ago

Local models. I'm not gonna wait 10 min for one image on my computer like I did back in the Stable Diffusion days. And image editing in particular.

drellybochelly

22 days ago

Maybe fast image editing, since it supports that.

pajtai

22 days ago

It cannot create an image of a pogo stick.

I was trying to get it to create an image of a tiger jumping on a pogo stick, which is way beyond its capabilities, but it cannot create an image of a pogo stick in isolation.

CamperBob2

21 days ago

Those are both good benchmark prompts. Z-Image Turbo doesn't like them either:

Tiger on pogo stick: https://i.imgur.com/lnGfbjy.jpeg

Dunno what this is, but it's not a pogo stick: https://i.imgur.com/OmMiLzQ.jpeg

Nano Banana Pro FTW: https://i.imgur.com/6B7VBR9.jpeg

nomel

21 days ago

When given an image of an empty wine glass, it can't fill it to the brim with wine. The pogo stick drawers and wine glass fillers can enjoy their job security for months to come!

downboots

21 days ago

You can still taste wine in the metaverse with the mouth adapter and can get a buzz by gently electrifying your neuralink (time travel required)

vunderba

21 days ago

It's a tough test for local models - (gpt-image and NB had zero problems) - the only one that came reasonably close was Qwen-Image

Z-Image / Flux 2 / Hidream / Omnigen2 / Qwen Samples:

https://imgur.com/a/tB6YUSu

This is where smaller models are just going to be more constrained and will require additional prompting to coax out the physical description of a "pogo stick". I had similar issues when generating Alexander the Great leading a charge on a hippity-hop / space hopper.

mhl47

21 days ago

You are right. I just tried, and even with reference images it can't do it for me. Maybe with some good prompting.

Because in theory I would say that knowledge is something that does not have to be baked in the model but could be added using reference images if the model is capable enough to reason about them.

pavelstoev

22 days ago

Think of GenAI models as a compression implementation. Generally, text compresses extremely well. Images and video do not. Yet state-of-the-art text-to-image and text-to-video models are often much smaller (in parameter count) than large language models like Llama-3. Maybe vision models are small because we’re not actually compressing very much of the visual world. The training data covers a narrow, human-biased manifold of common scenes, objects, and styles. The combinatorial space of visual reality remains largely unexplored. I am looking towards what else is out there outside of the human-biased manifold.

CamperBob2

22 days ago

Images and video compress vastly better than text. You're lucky to get 4:1 to 6:1 compression of text (1), while the best perceptual codecs for static images are typically visually lossless at 10:1 and still look great at 20:1 or higher. Video compression is much better still due to temporal coherence.

1: Although it looks like the current Hutter competition leader is closer to 9:1, which I didn't realize. Pretty awesome by historical standards.
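
For anyone who wants to check text ratios like these on their own corpus, here is a minimal sketch using Python's standard-library compressors. The `compression_ratio` helper and the sample text are illustrative assumptions, not anything from the Hutter competition, and a repetitive sample like this one compresses far better than real prose would, so swap in a real corpus before reading anything into the numbers.

```python
import bz2
import lzma
import zlib


def compression_ratio(data: bytes, compress) -> float:
    """Return original size divided by compressed size for one codec."""
    return len(data) / len(compress(data))


# Hypothetical sample text; replace with a real corpus for meaningful ratios.
sample = ("The quick brown fox jumps over the lazy dog. " * 200).encode("utf-8")

for name, codec in (
    ("zlib", zlib.compress),
    ("bz2", bz2.compress),
    ("lzma", lzma.compress),
):
    print(f"{name}: {compression_ratio(sample, codec):.1f}:1")
```

General-purpose codecs like these should be expected to land well below the specialized entries that lead the Hutter competition.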

murderfs

22 days ago

> Generally, text compresses extremely well. Images and video do not.

Is that actually true? I'm not sure it's fair to compare lossless compression ratios of text (abstract, noiseless) to images and video that innately have random sampling noise. If you look at humanly indistinguishable compression, I'd expect that you'd see far better compression ratios for lossy image and video compression than lossless text.

regularfry

21 days ago

The comparison makes sense in what I am charitably assuming is the case the GP is referring to: we know how to build a tight embedding space from a text corpus, and get outputs out of it that are tolerably similar to the inputs for the purposes they're put to. That is lossy compression, just not in the sense anyone talking about conventional lossless text compression algorithms would use the words. I'm not sure we can say the same of image embeddings.

stkdump

21 days ago

I find it likely that we are still missing a few major efficiency tricks with LLMs. But I would also not underestimate the amount of implicit knowledge and skill an LLM is expected to carry on a meta level.

SV_BubbleTime

22 days ago

Flux2 Klein isn’t some generation leap or anything. It’s good, but let’s be honest, this is an ad.

What will be really interesting to me is the release of Z-image, if that goes the way it’s looking, it’ll be natural language SDXL 2.0, which seems to be what people really want.

Releasing the Turbo/Distilled/Finetune months ago was a genius move really. It hurt Flux and Qwen releases on a possible future implication alone.

If this was intentional, I can’t think of the last time I saw such shrewd marketing.

user34283

21 days ago

The team behind Z-Image Turbo has told us multiple times in their paper that the output quality of the Turbo model is superior to the larger base model.

I think that information still did not get through to most users.

"Notably, the resulting distilled model not only matches the original multi-step teacher but even surpasses it in terms of photorealism and visual impact."

"It achieves 8-step inference that is not only indistinguishable from the 100-step teacher but frequently surpasses it in perceived quality and aesthetic appeal"

https://arxiv.org/abs/2511.22699

BoredPositron

21 days ago

It's important for finetuning, Lora training and as a refiner...

user34283

21 days ago

I heard the same: that it would mainly be useful for training, with the resulting Lora then applied to the distilled Turbo model.

However, I wonder what has been the source of the delay with its release and if there were problems with that approach.

refulgentis

22 days ago

I’m a bit confused, both you and another commenter mention something called Z-Image, presumably another Flux model?

Your frame of it is speculative, i.e. it is forthcoming. Theirs is present tense. Could I trouble you to give us plebes some more context? :)

e.g. Parsed as is, and avoiding the general confusion if you’re unfamiliar, it is unclear how one can observe “the way it is looking”, especially if Turbo was released months ago and there is some other model that is unreleased. I chose to bother you because the other comment was less focused on lab-on-lab strategy.

ollin

22 days ago

Z-Image is another open-weight image-generation model by Alibaba [1]. Z-Image Turbo was released around the same time as (non-Klein) FLUX.2 and received generally warmer community response [2] since Z-image Turbo was faster, also high-quality, and reportedly better at generating NSFW material. The base (non-Turbo) version of Z-Image is not yet released.

[1] https://tongyi-mai.github.io/Z-Image-blog/

[2] https://www.reddit.com/r/StableDiffusion/comments/1p9uu69/no...

AuryGlenz

21 days ago

Z-Image is roughly as censored as Flux 2, from my very limited testing. It got popular because Flux 2 is just really big and slow. Flux 2 is, however, great at editing, has an amazing breadth of built-in knowledge, and has great prompt adherence.

Z Image got popular because the people stuck with 12GB video cards could still use it, and hell - probably train on it, at least once the base version comes out. I think most people disparaging Flux 2 never tried it as they wouldn't want to deal with how slowly it would work on their system, if they even realize that they could run it.

refulgentis

22 days ago

Ahh I see, and Klein is basically a response to Z-Image Turbo, i.e. another 4-8B sized model that fits comfortably on a consumer GPU.

It’ll be interesting to see how the NSFW catering plays out for the Chinese labs. I was joking a couple months ago to someone that Seedream 4’s talents at undressing was an attempt to sow discord and it was interesting it flew under the radar.

Post-Grok going full gooner pedo, I wonder if Grok will take the heat alone moving forward.

CamperBob2

22 days ago

They are underselling Z-Image Turbo somewhat. It's arguably the best overall model for local image generation for several reasons including prompt adherence, overall output quality and realism, and freedom from censorship, even though it's also one of the smallest at 6B parameters.

ZIT is not far short of revolutionary. It is kind of surreal to contemplate how much high-quality imagery can be extracted from a model that fits on a single DVD and runs extremely quickly on consumer-grade GPUs.
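
As a rough check on the DVD comparison, here is a back-of-the-envelope sketch of how large 6B parameters are at different precisions. It assumes the weights of the 6B model alone (no text encoder or VAE) and nominal DVD capacities of 4.7 GB and 8.5 GB; the helper names are made up for illustration.

```python
# Nominal DVD capacities in decimal gigabytes.
DVD_SINGLE_LAYER_GB = 4.7
DVD_DUAL_LAYER_GB = 8.5


def weight_size_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    # Billions of parameters times bytes per parameter gives gigabytes.
    return params_billions * bits_per_weight / 8


def fits_on(size_gb: float) -> str:
    """Name the smallest DVD format that could hold a file of this size."""
    if size_gb <= DVD_SINGLE_LAYER_GB:
        return "single-layer DVD"
    if size_gb <= DVD_DUAL_LAYER_GB:
        return "dual-layer DVD"
    return "no DVD"


for bits in (16, 8, 4):
    size = weight_size_gb(6, bits)
    print(f"6B params at {bits}-bit: {size:.1f} GB -> {fits_on(size)}")
```

Under these assumptions the claim holds for 8-bit weights on a dual-layer disc or 4-bit weights on a single-layer disc, but not for 16-bit weights.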

AuryGlenz

21 days ago

Hold on now. Z-Image Turbo has gotten a lot of hype, but it's worse than Qwen Image and Flux 2 (the full-sized version) at all of those things, other than perhaps looking like it was shot on a cell phone camera. Once you get away from photographic portraits of people it quickly shows just how little it can do.

It is, however, small and quick.

CamperBob2

21 days ago

Not in my experience. Flux 2 is much larger and heavily censored, and Qwen-Image is just plain not as good. You can fool me into thinking that Z-Image Turbo output isn't AI, while that's rarely the case with Qwen.

Look at the images I posted elsewhere in this section. They are crappy excuses for pogo sticks, but they absolutely do NOT look like they came from a cell phone.

Also see vunderba's page at https://genai-showdown.specr.net/ . Even when Z-Image Turbo fails a test, it still looks great most of the time.

Edit re: your other comment -- don't make the mistake of confusing censorship with lack of training data. Z-Image will try to render whatever you ask for, but at the end of the day it's a very small model that will fail once you start asking for things it simply wasn't trained on. They didn't train it with much NSFW material, so it has some rather... unorthodox anatomical ideas.

SV_BubbleTime

22 days ago

Everything you said is exactly the truth.

However.. I’m already expecting the blowback when a Z-Image release doesn’t wow people like the Turbo finetune does. SDXL hasn’t been out two years yet, seems like a decade.

We’ll see. I’m hopeful that Z works as expected and sets the new high-water mark. I just am not sure it does it right out of the gate.

SV_BubbleTime

22 days ago

>Post-Grok going full gooner pedo

Almost afraid to ask, but anytime grok or x or musk comes up I am never sure if there is some reality based thing, or some “I just need to hate this” thing. Sometimes they’re the same thing, other times they aren’t.

I can guess here that, because Grok likely uses WAN, someone wrote some gross prompts and then pretended this is an issue unique to Grok for effect?

wmf

22 days ago

A few days ago people were replying to every image on Twitter saying "Grok, put him/her/it in a bikini" and Grok would just do it. It was minimum effort, maximum damage trolling and people loved it.

SV_BubbleTime

22 days ago

Ah. So, see, this is exactly why I need to check apparently.

Personally, I go between “I don’t care at all” and “well it’s not ideal” on AI generations. It’s already too late, but the barrier of entry is a lot lower than it was.

But I’m applying a good faith argument where GP does not seem to have intended one.

refulgentis

22 days ago

Reducing it to some people put people in bikinis for a couple days for the lulz is...not quite what happened.

You may note I am no shrinking violet, nor do I lack perspective, as evidenced by my notes on Seedream. And fortuitously, I only mentioned it before being dismissed as bad faith: I could not have foreseen needing to call it out as credentials until now.

I don't think it's kind to accuse others of bad faith, as evidenced by me not passing judgement on the person you are replying to's description.

I do admit it made my stomach churn a little bit to see how quickly people will other. Not on you, I'm sure I've done this too. It's stark when you're on the other side of it.

refulgentis

22 days ago

Nah it's been happening for months and involved kids, over and over, albeit for the same reasoning, lulz & totally based. I am a bit surprised that you thought this was just a PG-rated stunt on X for a couple days, it's been in the news for weeks, including on HN.

SV_BubbleTime

21 days ago

I see absolutely no citations. Can you point to anything that shows a specific Grok issue vs generally people doing icky things with photo generation software?

Because, as I remember you said “post-pedo Grok”.

refulgentis

21 days ago

You can Google whatever you need yourself at this point, you told the world I was operating in bad faith based off one sentence from a stranger. You ignored my reply to you. And now you are engaging with me on another reply as if my claim was Grok is uniquely capable of this, when I in fact said the opposite, and the interesting part of the discussion was me pointing out all can do this. Have a good day!

SV_BubbleTime

21 days ago

“Post-pedo grok”

Just admit you’re very accustomed to shitting on x, grok, whatever Musk is associated with as a reinforcement to your political ideology.

Your comments weren’t about AI, they were about Grok, and then you were incapable of defending that claim.

refulgentis

21 days ago

I am of no party or clique, why would Elon be doing moderation anyway? He has better things to do. If anything, it sounded understaffed and thus taken advantage of by ne’er-do-wells - you can check if I’m pivoting by noting I noted in my original post every model can do this and Grok being focused on was a strange aberration.

I feel pathetic defending myself to someone who keeps reading my mind in the blandest way possible, then accuses me of wrongthought I must have had, based on things I never said. Hard to believe you’re living up to your ideals in this moment if you’re a fellow advocate for truth seekers and great men. I respect interlocution, but not repeated personal attacks based on thoughts projected and things unsaid. That’s not truth seeking behavior.

wolvoleo

20 days ago

Ohh I thought this was about the tool that makes your screen more orange at night. No longer necessary as every desktop has that built in. But it was also called f.lux

Yash16

21 days ago

Has anyone tried Flux 2 Klein? Lately, I’ve stopped chasing new models. I’m genuinely happy with Nano Banana Pro—the results are solid, so I’m building the entire app around just this one model.

https://picxstudio.com

Mashimo

21 days ago

Neat, I really enjoyed flux 1. Currently use z image turbo for messing around.

I will wait for invoke to add flux2 klein.

Good competition breeds innovation.