UniK3D: Universal Camera Monocular 3D Estimation

46 points, posted 10 months ago
by rbanffy

12 Comments

echelon

10 months ago

There's enough crazy stuff in image and video happening now that would have turned heads at SIGGRAPH just a few years ago. It feels like we'll have fully programmable and deformable visual worlds soon.

ttoinou

10 months ago

Can't wait for that. But there are so many new methods and papers coming out every week that I don't see how tooling can become stable enough to have a unified workflow that doesn't need a clean-slate takeover every 6 months.

echelon

10 months ago

> I don't see how tooling can become stable enough to have a unified workflow that doesn't need a clean-slate takeover every 6 months

That's a good thing. There's so much opportunity to disrupt incumbents now.

Build for what we have, but don't overfit. Be flexible for what's coming next.

Calwestjobs

10 months ago

ETH has been insane the last few years. Last year it was literally dirt-cheap, literally lossless seasonal energy storage: https://ethz.ch/en/news-and-events/eth-news/news/2024/08/iro...

Now they have a better 3D perception algo than Tesla, after 10 years of Musk's fraudulent AI claims...

MrLeap

10 months ago

There are quite a few monocular depth estimation models out there, and have been for years. This one looks pretty good. That said, the temporal stability seems pretty wobbly; I don't think I'd use it for a self-driving car.

The most impressive example was the point cloud they generated from the extreme fisheye lens. That was nice.

Predicting that the background on Cloud City was a flat matte painting is also impressive in a way. It does seem to collapse all far-field objects into a single plane. That's a decent compromise for many things.

Calwestjobs

10 months ago

Tesla cannot see a paper box in my garage.

jteppinette

10 months ago

Would it be possible to "auto" layer the outputs from multiple images?

michaelt

10 months ago

I've seen tools for Generative AI art which do something a lot like that [1]

Generate an image of subject A, apply a depth estimation ML model to figure out which pixels belong to the foreground and cut them out.

Then generate an image of background B with a different prompt, and paste the subject from the first step on top. A sort of automatic collage making process.

[1] https://github.com/Extraltodeus/multi-subject-render
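The cut-and-paste step described above can be sketched roughly as follows. This is a minimal illustration, not how the linked tool works internally: it assumes the depth map is metric-like (smaller value means closer), that a single hypothetical threshold `max_depth` separates subject from background, and that images are plain nested lists of RGB tuples to keep the example dependency-free. The function names are made up for this sketch.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
DepthMap = List[List[float]]
Mask = List[List[bool]]


def foreground_mask(depth: DepthMap, max_depth: float) -> Mask:
    """Mark pixels closer than max_depth as foreground.

    Assumes smaller depth means closer; a model that outputs
    disparity (inverse depth) would need the comparison flipped.
    """
    return [[d < max_depth for d in row] for row in depth]


def composite(subject: Image, background: Image, mask: Mask) -> Image:
    """Paste masked subject pixels on top of the background image."""
    if not (len(subject) == len(background) == len(mask)):
        raise ValueError("subject, background and mask must have the same height")
    out: Image = []
    for s_row, b_row, m_row in zip(subject, background, mask):
        if not (len(s_row) == len(b_row) == len(m_row)):
            raise ValueError("subject, background and mask must have the same width")
        # Keep the subject pixel where the mask is set, else the background.
        out.append([s if m else b for s, b, m in zip(s_row, b_row, m_row)])
    return out


# Tiny 2x2 example: red subject pixels, blue background pixels.
RED, BLUE = (255, 0, 0), (0, 0, 255)
subject_img = [[RED, RED], [RED, RED]]
background_img = [[BLUE, BLUE], [BLUE, BLUE]]
subject_depth = [[1.0, 9.0], [9.0, 1.5]]  # two near pixels, two far pixels

mask = foreground_mask(subject_depth, max_depth=2.0)
collage = composite(subject_img, background_img, mask)
```

A hard threshold gives jagged edges; real tools would likely feather the mask or use a segmentation model, which this sketch does not attempt.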

Calwestjobs

10 months ago

When it is 3D, it can be trivially converted to a point cloud (if it is not exportable as a point cloud already), and then you can scale, overlay, splice, and merge anything with anything trivially. Game over!

The EU has plans and is taking steps to get its main highway infrastructure ready for autonomous vehicles. They have been changing colors on road signs for years (vertical and horizontal) so cars have a higher probability of decoding them, introducing C-ITS, ... This can be another great addition to get us there safely, and not by listening to AI nonsense bros. Europe is trying to build stuff while being attacked by Russians in more ways than one (https://www.theguardian.com/world/2024/dec/04/up-to-100-susp...), while US "elites" are doing nothing, just talking about impending doom and buying land on Hawaiian islands or in New Zealand for apocalyptic shelters....
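As a rough sketch of that conversion: assuming a plain pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy` are placeholders here, and a pinhole model will not hold for fisheye inputs), each depth pixel can be back-projected to a 3D point, after which scaling and merging are simple list operations. The helper names are made up for illustration and are not part of any particular library.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def depth_to_points(depth: Sequence[Sequence[float]],
                    fx: float, fy: float, cx: float, cy: float) -> List[Point]:
    """Back-project a depth map to 3D points under a pinhole camera model."""
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # treat non-positive depth as "no measurement"
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def transform(points: Sequence[Point], scale: float = 1.0,
              offset: Point = (0.0, 0.0, 0.0)) -> List[Point]:
    """Uniformly scale a cloud, then translate it by offset."""
    ox, oy, oz = offset
    return [(x * scale + ox, y * scale + oy, z * scale + oz) for x, y, z in points]


def merge(*clouds: Sequence[Point]) -> List[Point]:
    """Merge clouds by concatenation (no registration or deduplication)."""
    return [p for cloud in clouds for p in cloud]


# Tiny 2x2 depth map with one invalid pixel.
cloud_a = depth_to_points([[2.0, 2.0], [0.0, 4.0]], fx=2.0, fy=2.0, cx=0.0, cy=0.0)
cloud_b = transform(cloud_a, scale=2.0, offset=(1.0, 0.0, 0.0))
combined = merge(cloud_a, cloud_b)
```

Merging clouds from different images only looks right if they share a coordinate frame and scale, so in practice an alignment step would still be needed before the "trivial" part.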

ashoeafoot

10 months ago

Point clouds do not load on Firefox mobile.