tty456
7 hours ago
I don't get the comments trashing this. If it slightly beats or even matches Opus 4.6, it means Meta is capable of building a model competitive with the leading AI company. Sure, they spent a lot of money and will have ongoing costs. But how much more work would it take to turn that into a coding agent people are willing to try (and pay for) alongside their usual collection of agents (Claude, Codex, etc.)? It also means Meta doesn't have to pay another company to use a SOTA model across all their products (including IG, WhatsApp, and VR), which will matter to their balance sheet long term (despite the constant R&D spend).
prodigycorp
7 hours ago
The comments trashing this are rightly skeptical; people remember the benchmaxxing of Llama 4. This model was out in the wild as early as a couple of months ago, but they didn't release it because it was at Gemini 2.5 Pro levels.
zozbot234
7 hours ago
The Llama 4 series was one of the earliest large MoEs to be made publicly available. People just ignored it because they were focused on running smaller, denser models at the time; we should know better these days.
dilap
6 hours ago
DeepSeek R1 was a publicly available MoE model that was getting a ton of attention before Llama 4. Llama 4 didn't get much attention because it wasn't good.
jychang
an hour ago
Also, Gemini 2.5 Pro launched a week before Llama 4.
It was Gemini 2.5 Pro that redeemed Google in the eyes of most people as a valid competitor to OpenAI instead of as a joke, so Meta dropping the ball with Llama 4 was extra bad.
prodigycorp
6 hours ago
the models were objectively horrible
NitpickLawyer
6 hours ago
They really weren't horrible. They were roughly GPT-4o level, with the added benefit that you could run them on premises. Just "regular" models, non-"thinking". Inefficient architecture (the ratio of active to total parameters), but otherwise "decent" models. They got trashed online by bots and Chinese shills (I was online that weekend when it happened; it's something to behold). Just because they were non-thinking when thinking was clearly the future doesn't make them horrible. Not SOTA by any means, but still.
refulgentis
6 hours ago
Wrote longer comment steel-manning this, posted it to a reply, then realized you might like to know they had a reasoning model on deck ready for release in the next 2-4 weeks.
Got shitcanned due to bad PR and Zuck's God-King terraforming of the org, so there'd be a year's delay to the next release.
Real tragi-comedy, and you have no idea how happy it makes me to see someone in the wild saying this. It sounds so bizarre to people given the conventional wisdom, but, it's what happened.
prodigycorp
6 hours ago
Nah, I remember how disgusted I felt trying Llama 4 Maverick and Scout. They were both DOA; they couldn't even beat much smaller local models.
pixel_popping
3 hours ago
Failing non-stop at tool calls, on top of that.
refulgentis
6 hours ago
I'll cosign what you said. Simultaneously, your interlocutor's point is also well-founded, and it depresses me that it's not better known and sounds so... off... due to conventional wisdom and God-King Zuck misunderstanding his own company and overreacting.
They beat Gemini 2.5 Flash and Pro handily on my benchmark suite. (tl;dr: tool calling and agentic coding).
Llama 4 on Groq was ~GPT-4.1 on the benchmark at ~50% of the cost.
They shouldn't have released it on a Saturday.
They should have spent a month with it in private prerelease, working with providers.[1]
The rushed launch and ensuing quality issues got rolled into the hypebeast narrative of "DeepSeek will take over the world"
I bet it was super fucking annoying to talk to due to LMArena maxxing.
[1] My understanding is the longest heads-up was single-digit days, if any. Most modelers have arrived at 2+ weeks now; there's a lot between spitting out logits and parsing and delivering a response.
alex1138
4 hours ago
Your comments seem to imply the engineers made a great product but Zuck intervened so now it's shit
refulgentis
2 hours ago
I don't know how Zuck intervening could change float32s in a trained model, so I don't think I think that, but maybe I'm parsing your words incorrectly.
canes123456
30 minutes ago
Why go into coding agents? Both Anthropic and OpenAI are going all-in on that. The opportunity is customer-facing AI now.
OpenAI has the mindshare, but they're going to have to decide whether to allocate their limited compute to free users or go all-in trying to keep up with Anthropic in enterprise.
modeless
5 hours ago
It's a decent model if the benchmarks are to be believed, but it won't be close to Opus in usefulness for programming. None of these benchmarks completely capture what makes a model useful for day-to-day coding tasks, unfortunately. It will take time for them to catch up, and Opus will keep improving in the meantime. But it's good to have more competition.
ai5iq
14 minutes ago
Benchmarks miss the thing that actually matters for agentic use: how does behavior change over a multi-day horizon? A model that scores well on one-shot coding tasks can still make terrible decisions when it has persistent state and resource constraints. That's where you see the real gaps between models.
ChipopLeMoral
6 hours ago
> I don't get the comments trashing this.
People like to hate on Meta regardless of whether it's justified. Not saying it isn't justified here, just that it's many people's default bias.
redox99
7 hours ago
> If it slightly beats or even matches Opus 4.6
It doesn't though
ryeguy_24
6 hours ago
Curious on why you think this. Any data points that led you to this?
howdareme
6 hours ago
The benchmarks they released
johnfn
4 hours ago
What do you mean? In most cases, the benchmarks show a larger number for Muse and a smaller number for Opus.
spprashant
4 hours ago
In multimodal, yes, but Opus is definitely edging it out in the text/reasoning and agentic benchmarks.
I think the general skepticism is because they are late to the race, and they are releasing an Opus-4.6-equivalent model now, when Anthropic is teasing Mythos.
blazespin
3 hours ago
Because of bots, trillion-dollar IPOs, and even bigger stakes. People need to better appreciate the level of manipulation going on. Social media has an outsized impact. Bots, and even people, are getting paid to post and to upvote/downvote narratives.
asdfman123
2 hours ago
> people are getting paid to post and upvote/downvote narratives
This problem will be solved shortly with better AI (if it hasn't essentially been solved already).
No more humans in the loop, much lower costs for social media manipulation. Welcome to the future!