Omnivision-968M: Vision Language Model with 9x Tokens Reduction for Edge Devices

68 points, posted 3 days ago
by BUFU

12 Comments

jsjohnst

3 days ago

Need to try this directly before passing judgement, but this could unlock a few project ideas I have if the quality lives up to the examples with resource requirements this low.

gizajob

3 days ago

Its description of the art piece is so awful.

ImageXav

2 days ago

I thought the same, but the description of the cat picture is pretty spot on. I wonder if this is a dataset issue. Cat pictures are far more prevalent than abstract art on the internet, so they might well be overrepresented. Can Vision LLMs deal with a long tail of underrepresented objects when small? Or can they only do so at scale?

throwaway314155

3 days ago

Can GitHub please acquire all these model-hub companies like fal, replicate, ollama, hf, and (checks notes) "nexa.ai"? That way we can get past the inevitable fragmentation and ultimate breaking of everyone's workflow w.r.t. ML-oriented devops.

gessha

2 days ago

When faced with a diversity of implementations, why is the go-to “let’s have a corporate entity acquire them all” instead of “let’s come up with a good runtime standard”? The company is going to do the same thing anyway, except with the additional risk of messing up the API and throwing away the hard work of so many people.

croes

3 days ago

You want everything under the control of Microsoft?