mishu2
a month ago
Having the ability to do real-time video generation on a single workstation GPU is mind-blowing.
I'm currently hosting a video generation website, also on a single GPU (with a queue), which is also something I didn't even think possible a few years ago (my Show HN from earlier today, coincidentally: https://news.ycombinator.com/item?id=46388819). Interesting times.
iberator
a month ago
Computer games have been doing it for decades already.
echelon
a month ago
I think video-based world models like Genie 2 will happen and that they'll be shrunken down for consumer hardware (the only place they're practical).
They'll have player input controls, obviously, but they'll also be fed ControlNets for things like level layout, enemy placement, and game loop events. This will make them highly controllable and persistent.
When that happens, and when it gets good, it'll take over as the dominant type of game "engine".
qingcharles
a month ago
I don't know how much they can be shrunk down for consumer hardware right now (though I'm hopeful), but in the near-term it'll probably all be done in the cloud and streamed as it is now. People are playing streamed video games and eating the lag, so they'll probably do it for this too, for now.
ragequittah
a month ago
This is also the VR killer app.
cess11
a month ago
Are you sure it's not just polish on the porn that is already the "VR killer app"?
arghwhat
a month ago
A very, very different mechanism that "just" displays the scene as the author explicitly and manually drew it, and yet has to pull off an ungodly number of hacks to make that viable and fast enough, resulting in a far from realistic rendition...
This on the other hand happily pretends to match any kind of realism requested like a skilled painter would, with the tradeoff mainly being control and artistic errors.
echelon
a month ago
> with the tradeoff mainly being control and artistic errors.
For now. We're not even a decade in with this tech, and look how far we've come in the last year alone with Veo 3, Sora 2, Kling 4x, and Kling O1. Not to mention the editing models like Qwen Edit and Nano Banana!
This is going to be serious tech soon.
I think vision is easier than "intelligence". In essence, we solved it in closed form sixty years ago.
We have many formulations of algorithms and pipelines. Not just for the real physics, but also tons of different hacks to account for hardware limitations.
We understand optics in a way we don't understand intelligence.
Furthermore, evolution keeps evolving vision over and over. It's fast and highly detailed. It must be correspondingly simple.
We're going to optimize the shit out of this. In a decade we'll probably have perfectly consistent Holodecks.
justinclift
a month ago
Hmmm, future videos might just "compress" down to a common AI model and a bunch of prompts + metadata about scene order. ;)
arghwhat
a month ago
I feel like this misses the point. Also, vision and image generation are entirely different things, even in humans: some people cannot create images in their head despite having perfectly good vision.
Understanding optics instead of intelligence speaks to the traditional render workflow, a pure simulation of input data with no "creative processes". Either the massive hack that is traditional game render pipelines, or proper light simulation. We'll probably eventually get to the point where we can have full-scene, real-time ray-tracing.
The AI image generation approach is the "intelligence" approach where you throw all optics, physics and render knowledge up in the air and let the model "paint" according to how it imagines the scene, like handing a pencil to a cartoon/anime artist. Zero simulation, zero physics, zero rules, just the imagination of a black box.
No light, physics or existing render pipeline tricks are relevant. If that's what you want, you're looking for entirely new tricks: Tricks to ensure object permanence, attention to detail (no variable finger counts), and inference performance. Even if we have it running in real-time, giving up your control and definition of consistency is part of the deal when you hand off the role of artist to the box.
If you want AI in the simulation approach you'll be taking an entirely different path, skipping any involvement in rendering/image creation and instead just letting the model puppeteer the scene within some physics constraints. Makes for cool games, but completely unrelated to the technology being discussed.
user
a month ago
nkmnz
a month ago
Bob Ross did it, too.
pwython
a month ago
1 frame of Bob Ross = 1,800s
ash_091
a month ago
So with 108,000 (60 × 1,800) Bob Ross PPUs (parallel painting units) we should be able to achieve a stable 60FPS!
mishu2
a month ago
Once you set up a pipeline, sure. They'd need a lot of bandwidth to ensure the combined output makes any kind of sense, not unlike the GPU I guess.
Otherwise it's similar to the way nine women can make a baby in a month. :)
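The arithmetic in the joke holds up, and it doubles as a neat illustration of the throughput-versus-latency point: adding parallel workers raises frames per second, but no single frame finishes any faster. A minimal back-of-the-envelope sketch, assuming 1,800 s per painting (from the thread) and fully independent painters; the function names are hypothetical.

```python
# Hypothetical model: each Bob Ross "PPU" paints one frame in a fixed time,
# and all PPUs work fully in parallel with no coordination overhead.
SECONDS_PER_FRAME = 1_800  # 30 minutes per painting, as stated in the thread
TARGET_FPS = 60


def painters_needed(target_fps: int, seconds_per_frame: int) -> int:
    # One painter delivers 1 / seconds_per_frame frames per second,
    # so sustaining target_fps needs target_fps * seconds_per_frame painters.
    return target_fps * seconds_per_frame


def first_frame_latency(seconds_per_frame: int) -> int:
    # Parallelism only raises throughput: the very first frame
    # still takes the full painting time to appear.
    return seconds_per_frame


print(painters_needed(TARGET_FPS, SECONDS_PER_FRAME))
print(first_frame_latency(SECONDS_PER_FRAME) / 60, "minutes until the first frame")
```

This is the nine-women-one-baby problem in numbers: 108,000 painters give 60 frames per second of throughput, yet the first frame still takes 30 minutes.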
justinclift
a month ago
The food/housing/etc. bill for 108k Bob Ross er... PPUs seems like it would be fairly substantial too.