scosman
5 days ago
Context: someone announced a Llama 3.1 70B fine tune with incredible benchmark results a few days ago. It's been a dramatic ride:
- The weight releases were messed up: they released a LoRA for Llama 3.0 while claiming it was a 3.1 fine tune
- Evals initially didn't meet expectations when run on released weights
- The evals started performing near/at SOTA when using a hosted endpoint
- Folks are finding clever ways to see what model is running on the endpoint (using model-specific tokens and model-specific censoring). This post claims there's proof the endpoint isn't running their model, just a prompt on top of Sonnet 3.5
- After it was caught and posted as being Sonnet, it stopped reproducing. Then others in the thread claimed to find evidence, using similar techniques, that he had just switched the hosted model to GPT-4o.
Lots of mixed results, inconsistent repos, and general confusion from the bad weight releases. Lots of wasted time. Not clear what's true and what's not.
ga6840
5 days ago
Who is Sahil Chaudhary? Why doesn't he announce such a great advancement himself? Why did Matt Shumer announce it first, apparently only because (according to a later claim on X.com) he trusted Sahil? Does that mean Matt wasn't able to participate in most of the progress? Then why announce a breakthrough without mentioning that he wasn't involved enough to verify the result in the first place?
jazzyjackson
5 days ago
One more reason not to pay attention to things that only seem to exist on x.com
numpad0
4 days ago
I recognize that surname from Twitter spam. Twitter has had a financial rebate program for paying accounts for a while, and for months tons of paid spam accounts have been reply-squatting trending tweets with garbage. Initially they appeared Sub-Saharan African, but the demographic seems to be constantly shifting eastward from there for some reason, through the Middle East and now around South-Indian/Pakistani regions. This one and variants thereof are common ones in the Indian category among those.
Maybe someone got lucky with that and is trying their hand at the LLM finetuning biz?
sumedh
4 days ago
Matt and Sahil did an interview and it was mostly Matt doing the talking while Sahil looked like a hostage forced by Matt to do the interview.
vertis
4 days ago
As far as I can tell he's the founder of GlaiveAI. There were messages suggesting Matt was an investor, but I haven't been able to confirm this.
czl_my
4 days ago
Matt said it was approximately "$1000" and that he has disclosed it "before" in a reply. https://x.com/mattshumer_/status/1832558298509275440
GaggiX
5 days ago
When they were using the Sonnet 3.5 API, they censored the word "Claude" and replaced "Anthropic" with "Meta", then later when people realized this, they removed it.
Also, after GPT-4o they switched to a Llama checkpoint (probably 405B-inst), so now the tokenizer is the same as the claimed model's (no more tokenization trick).
vertis
4 days ago
Yeah I managed to get it to admit that it was Claude without much effort (telling it not to lie), and then it magically stopped doing that. FWIW Constitutional AI is great.
wis
4 days ago
They implemented the censoring of "Claude" and "Anthropic" using the system prompt?
Shouldn't they have used simple text replacement? They could buffer the streaming response on the server and apply .replace(/claude/gi, "Llama").replace(/anthropic/gi, "Meta") to it while streaming it to the client.
Edit: I realized this can be defeated, even when combined with the system prompt censoring approach.
For example when given a prompt like this: tell me a story about a man named Claude...
It would respond with: once upon a time there was a man called Llama...
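A rough sketch of the buffered replacement described above, and of how the story prompt defeats it. The function names and the hold-back trick are assumptions for illustration only; nothing here is known to be what the endpoint actually ran. The hold-back keeps the last few characters unsent so that a word split across two chunks still gets caught.

```javascript
// Hypothetical helper: the two replacements quoted in the comment above.
function censor(text) {
  return text.replace(/claude/gi, "Llama").replace(/anthropic/gi, "Meta");
}

// Hypothetical streaming filter (an assumption, not the real server code).
// It keeps a short tail in the buffer so a word that straddles a chunk
// boundary is still complete when the replacement runs.
function* censorStream(chunks) {
  const HOLDBACK = "anthropic".length - 1; // longest target word minus one
  let buffer = "";
  for (const chunk of chunks) {
    buffer = censor(buffer + chunk);
    if (buffer.length > HOLDBACK) {
      yield buffer.slice(0, buffer.length - HOLDBACK);
      buffer = buffer.slice(buffer.length - HOLDBACK);
    }
  }
  yield buffer; // flush whatever is left at end of stream
}

// A name split across chunks is still replaced.
const leaked = [...censorStream(["I am Cla", "ude, made by Anthr", "opic."])].join("");
console.log(leaked); // "I am Llama, made by Meta."

// The defeat: an innocent "Claude" in a story is rewritten too.
const story = [...censorStream(["Once upon a time there was a man called Claude..."])].join("");
console.log(story); // "Once upon a time there was a man called Llama..."
```

The filter has no way to tell a self-identification from a legitimate use of the name, which is why the replacement itself becomes the giveaway.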
nacs
4 days ago
> Shouldn't they have used simple text replacement?
They tried that too but had issues.
1) Their search and replace only did it on the first chunk of the returned response from Claude.
2) People started asking questions that had Claude as the answer, like "Who composed Clair de lune?", for which the answer is supposed to be "Claude Debussy", which of course got changed to "Llama Debussy", etc.
It's been one coverup-fail after another with Matt Shumer and his Reflection scam.
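To make the two failure modes in 1) and 2) concrete, here is a minimal sketch. It is a guess at the shape of the bug based only on the description above, with hypothetical function names, not a reconstruction of the actual code.

```javascript
// Hypothetical helper: the same replacements discussed upthread.
function scrubNames(text) {
  return text.replace(/claude/gi, "Llama").replace(/anthropic/gi, "Meta");
}

// Failure 1 (assumed shape): only the first streamed chunk is filtered,
// so a name that arrives in any later chunk passes through untouched.
function firstChunkOnly(chunks) {
  return chunks
    .map((chunk, i) => (i === 0 ? scrubNames(chunk) : chunk))
    .join("");
}

// Filtering every chunk closes that leak but produces failure 2:
// legitimate uses of the name get rewritten as well.
function everyChunk(chunks) {
  return chunks.map(scrubNames).join("");
}

console.log(firstChunkOnly(["I was trained by ", "Anthropic."]));
// "I was trained by Anthropic."

console.log(everyChunk(["Clair de lune was composed by ", "Claude Debussy."]));
// "Clair de lune was composed by Llama Debussy."
```

Either way the output leaks information: the first version misses names outright, and the second turns any question whose answer contains "Claude" into a probe for the filter.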
DebtDeflation
4 days ago
I was following the discussion on /r/LocalLlama over the weekend. Even before the news broke that it was Claude and not a Llama 3.1 finetune, people had figured out that all Reflection really had was a custom system prompt telling it to check its own work and such.