Hacker News
Text or pixels? On the token efficiency of visual text inputs in multimodal LLMs
2 points
posted 19 hours ago
by hhs
(arxiv.org)
No comments yet