Hacker News
Text or pixels? On the token efficiency of visual text inputs in multimodal LLMs (arxiv.org)
2 points, posted 3 months ago by hhs
No comments yet