AMD Unveils Its First Small Language Model AMD-135M

237 points, posted 14 hours ago
by figomore

66 Comments

diggan

13 hours ago

> The training code, dataset and weights for this model are open sourced so that developers can reproduce the model and help train other SLMs and LLMs.

Wow, an actual open source language model (first of its kind [from a larger company] maybe even?); it includes everything you need to recreate it from scratch. Thanks AMD!

Available under this funky GitHub organization it seems: https://github.com/AMD-AIG-AIMA/AMD-LLM

wrs

13 hours ago

We (developers and tech managers) really need to hold the line on this terminology. This is an actual, fully open source LLM. The usual “open inference” model is not.

boulos

13 hours ago

I assume by "open inference" you mostly mean "weights available"?

wrs

13 hours ago

Usually “open source” for an LLM means you get the weights and the inference code, which I’ve started calling “open inference”. It’s certainly good and useful, but it’s not the actual source of the model.

I find people get into silly arguments about the terminology because they’re focused on whether the “source” is “open” and not on what the “source” is actually the source of.

“Weights available” indicates even the weights aren’t “open” in the usual software meaning of the term, as they typically come with restrictive licenses (more restrictive than copyleft or attribution).

nickpsecurity

10 hours ago

I call them open weights, or just freeware, like when we only got the EXEs on Windows.

amelius

9 hours ago

Open source is what you would get if an academic institution released it.

zdragnar

4 hours ago

Aren't academic institutions more likely to claim ownership of anything produced than they are to totally open source something?

wmf

11 hours ago

You're not wrong, but if you come up with a definition that no one is willing to meet you're just making that definition irrelevant.

ok_dad

9 hours ago

Plenty of people publish actual open source software, the definition isn’t the problem, it’s the people who misuse it that are the problem.

wmf

8 hours ago

There's a huge difference between software and AI models. We can debate why that happens, but it's a fact. Companies are willing to release open weights, but virtually no one is willing to create open source models. Shaming and well-actually-ing have achieved nothing so far.

wrs

6 hours ago

And I'm not arguing that they should release open source models. There's no shame in releasing an open-inference model. But I think I'm fair in saying they should use an accurate term for what they do release.

yazzku

7 hours ago

There is nothing "source" about the "open source models" that companies typically release. The use of the term "open source" is deliberate marketing BS. If you want to argue there's a difference between software and a model, then don't use software terms that are already well-defined to refer to some property of the model.

https://opensource.org/osd

cassianoleal

17 minutes ago

It’s a lot worse than marketing BS. It’s deliberate misdirection. Essentially a con.

GeekyBear

12 hours ago

> Wow, an actual open source language model (first of its kind

Apple research has previously released another example of a model with open training code, data, and weights, but their model was sized for running inference workloads on mobile devices.

However, Apple has a mobile device line of business and AMD has an enterprise AI accelerator line of business, so they are both doing work relevant to their bottom line.

jerrygenser

13 hours ago

This would be another example of open source. Not from such a large company but a good reference including code, data, weights, etc.

https://allenai.org/olmo

brianjking

9 hours ago

Molmo even more so! The 7b is wild.

kypro

12 hours ago

Smart move from AMD. It helps develop an ecosystem around their tech and their GPUs.

jeff_carr

7 hours ago

Has anyone tried it? I mean, I would, but as far as I can tell I need 4 boxes with 4 GPUs each, plus an interconnect. I could put in an order for my homelab, but at around $80k per box and maybe $20k for the right switches and some other gear, my wife will probably frown at me ordering a $340,000 rig to try this code that I don't know what to do with if it works.

Maybe some other heavy hitter out there can explain what all this whatchamacallit newfangled synergy producing matrix algebra does after you have it running?

Shadowmist

5 hours ago

> that I don't know what to do with if it works.

After you get it up and running you can just ask it what to do with it.

bubaumba

13 hours ago

No, it's not open source till someone can actually reproduce it. That's the hardest part. For now it's open weights and open dataset, which is not the same.

diggan

13 hours ago

That's... not how open source works? The "binary" (model weights) is open source and the "software" (training scripts + the data used for training) is open source; this release is a real open source release. Independent reproduction is not needed to call something open source.

Can't believe this is the second time today I've ended up in the very same argument on HN about what open source is.

bubaumba

13 hours ago

You are missing key points here. "Reproduce" means producing the same thing, not just training a similar model.

I can simplify the task: can you convincingly explain how the same model can be produced from this dataset? We can start simple: how can you possibly get the same weights after even the first iteration, i.e. the same weights the original model got? Pay attention to randomness, data selection, and initial model state.

OK, if you can't do that, can you explain in a believable way how to prove that a given model was trained on a given dataset? I'm not asking you to actually do any of these things, which could be expensive, only to explain how it could be done.

Strict 'open source' includes not only open weights and open data; it also includes the word "reproducible". Not "reproduced", only "reproducible". And even that is not the case here.
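
To illustrate (just a sketch of the usual knobs, not a claim that pinning them is sufficient): even fixing every seed and forcing deterministic kernels in PyTorch only gets you bit-identical runs on the same hardware, driver and library versions, with the exact same data order — none of which is captured by "here are the scripts and the dataset":

    import os, random
    import numpy as np
    import torch

    # This pins a *single* machine's run; it says nothing about
    # reproducing the original run on someone else's cluster.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"  # required for deterministic cuBLAS
    random.seed(0)
    np.random.seed(0)
    torch.manual_seed(0)
    torch.cuda.manual_seed_all(0)
    torch.use_deterministic_algorithms(True)  # error out on nondeterministic ops
    torch.backends.cudnn.benchmark = False

    # Data order matters too: a shuffled DataLoader needs a seeded generator
    # and a fixed worker count to replay the same sample sequence.
    g = torch.Generator()
    g.manual_seed(0)
    # loader = torch.utils.data.DataLoader(dataset, shuffle=True, generator=g, num_workers=0)

And even then, a different GPU count or FSDP sharding layout changes the floating-point reduction order, so the weights drift apart anyway.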

Zamiel_Snawley

9 hours ago

If they provide the training code and data set, how is that not enough to reproduce functionally equivalent weights? I don’t have any experience in the AI field; what else would they need to provide/define?

As others have mentioned, reproducible builds can be quite difficult to achieve even with regular software.

Compiler versions, build system versions, system library versions, time stamps, file paths, and more often contribute to getting non-identical yet functionally equivalent binaries, but the software is still open source.

Sayrus

12 hours ago

Reproducible builds are not a requirement for open source software, why is it one for open source models?

wrs

11 hours ago

I would say that functionally reproducible builds are sort of inherent in the concept of “source”. When builds are “not reproducible” that typically just means they’re not bit-for-bit identical, not that they don’t produce the same output for a given input.

prophesi

8 hours ago

Once neural networks enter the scene, I don't think giving the same output for a given input is possible in the field currently. I believe this is as open as language models can be, and what people mean when they say it's a "fully open source" model.

worewood

12 hours ago

How often do people expect to compile open-source code and get _exactly_ the same binary as the distributed one? I've seen this kind of restriction only on decompilation projects e.g. the SM64 decompilation -- where they deliberately compare the hashes of original vs. compiled binaries, as a way to verify the decompilation is correct.

It's an unreasonable request for ordinary code, even more so for ML, where very few people have access to the necessary hardware and where, in practice, training is not deterministic.

e12e

11 hours ago

I expect that if I compile your 3D renderer and feed it the same scene file you did, I get the same image?

TylerE

6 hours ago

Why would you expect that? 3D renderers are not generally deterministic. Many will incorporate, for instance, noise algorithms. They will frequently not produce byte-identical renders on the same hardware using the same binary.

wrs

13 hours ago

The interesting part of the product we’re talking about (that is, the equivalent of the executable binary of an ordinary software product) is the weights. The “source” is not sufficient to “recompile” the product (i.e., recreate the weights). Therefore, while the source you got is open, you didn’t get all the source to the thing that was supposedly “open source”.

It’s like if I said I open-sourced the Matrix trilogy and only gave you the DVD image and the source to the DVD decoder.

(Edit: Sorry, I replied to the wrong comment. I’m talking primarily about the typical sort of release we see, not this one which is a lot closer to actually open.)

littlestymaar

11 hours ago

> The “source” is not sufficient to “recompile” the product (i.e., recreate the weights). Therefore, while the source you got is open, you didn’t get all the source to the thing that was supposedly “open source”.

What's missing?

wrs

11 hours ago

Well, I’m not experienced in training full-sized LLMs, and it’s conceivable that in this particular case the training process is simple enough that nothing is missing. That would be a rarity, though. But see my edit above — I’m not actually reacting to this release when I say that.

littlestymaar

2 hours ago

OK, so you just like to be a contrarian…

dboreham

13 hours ago

But wouldn't failure to achieve independent reproduction falsify the open claim?

Similar to publishing the source for Oracle (the database) when nobody can build a binary from it because it needs magic compilers or test suites that aren't open source?

Heck when the browser was open-sourced, there was an explicit test where the source was given to some dude who didn't work for Netscape to verify that he could actually make a working binary. It's a scene in the movie "Code Rush".

Jabrov

13 hours ago

What’s the difference?

avaldez_

11 hours ago

Reproducibility? I mean, what's the point of an open technology if nobody knows whether it works or not?

n_ary

13 hours ago

Now this is the beginning of real innovation in AI. With AMD coming in (albeit late and slowly) and Meta improving Llama, we will soon see some real adoption and development in the next few thousand days. At this moment, I see OAI as the Yahoo of the pre-Google era.

imjonse

4 hours ago

"next few thousand days"

can we stick to years as a unit of measure and not spread Sam Altman's phrase :)

washadjeffmad

4 hours ago

Twenty two thousand days

Twenty two thousand days

It's not a lot, it's all we got

Twenty two thousand days

- Sam Altman?

highfrequency

12 hours ago

Looks like they are using sixteen $13k GPUs [1] (around $210k hardware) for 6 days of training.

Anyone know the recommended cloud provider and equivalent rental price?

[1] https://www.wiredzone.com/shop/product/10025451-supermicro-g...

layoric

10 hours ago

MI250s definitely aren’t a common card to rent, so the only option I can find is Runpod at $2.10 per hour each. That works out to a training cost of about $4,838, plus about $3,225 for fine-tuning. However, this doesn’t include the 11 TB of storage or the time taken to get the setup actually running the tasks, so you likely wouldn’t see much change from $10k USD, if any.

- https://www.runpod.io/gpu/mi250
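
Rough back-of-the-envelope behind those numbers (my assumptions: 16 GPUs, 6 days of pretraining plus roughly 4 days of code fine-tuning, all at Runpod's $2.10/hr rate):

    gpus = 16
    rate = 2.10               # USD per MI250 per hour on Runpod
    pretrain_hours = 6 * 24
    finetune_hours = 4 * 24   # assumed; this is what gives the ~$3,225 figure

    pretrain_cost = gpus * rate * pretrain_hours   # ~$4,838
    finetune_cost = gpus * rate * finetune_hours   # ~$3,226
    print(pretrain_cost, finetune_cost, pretrain_cost + finetune_cost)  # ~$8,064 before storage/setup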

lhl

10 hours ago

Runpod.io rents the next-gen MI300Xs for $4/hr, although since they also rent H100s for $3/hr (which are easier to work with and faster for training) it might be more of a novelty.

highfrequency

9 hours ago

I thought the whole selling point of AMD GPUs was that they were a lot cheaper than Nvidia GPUs?

dagmx

5 hours ago

Cheaper for the cloud company. But that doesn’t always translate to cheaper for the end user. Maybe they cost more to run or maybe there’s fewer of them so they’re more expensive to book?

knotimpressed

4 hours ago

At least a couple of years ago, a big advantage of Nvidia cards was how much cheaper they were to run power-wise; often the dies that made it into cloud-level cards would be binned consumer dies.

Not sure if that’s still the case, but I’d say it’s plausible.

lostmsu

4 hours ago

Impossible. Power costs for H100-like cards are dwarfed by the cost of the cards themselves. H100 at full load will consume ~$3500 (rough estimate) of power in 5 years at $0.12/kWh.
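
Back of the envelope, assuming roughly 700 W board power running flat out the whole time:

    watts = 700                 # assumed full-load draw for an H100 SXM
    hours = 24 * 365 * 5        # 5 years, 24/7
    kwh = watts / 1000 * hours  # ~30,660 kWh
    print(kwh * 0.12)           # ~$3,679 at $0.12/kWh, a small fraction of the card's price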

wmf

11 hours ago

Hot Aisle seems to be the (only?) place to rent AMD. (Ryan, please don't spam this thread. It's not a good look.)

benterix

13 hours ago

I'm happy to see a truly open source model.

Actually, AMD has excellent reasons to make this kind of development and I hope they continue.

luyu_wu

12 hours ago

The section on speculative decoding is interesting: "This approach allows each forward pass to generate multiple tokens without compromising performance, thereby significantly reducing memory access consumption, and enabling several orders of magnitude speed improvements."

Does anyone know if the "several orders of magnitude speed improvement" is accurate? I'm doubtful.

Very interesting though! I'll be playing around with this on the weekend!

lhl

10 hours ago

Orders of magnitude seems a bit ambitious. The implementation from the DeepMind paper achieved a 2-2.5x speedup (https://arxiv.org/pdf/2302.01318), and most of the tests I've seen [1][2] have been similar, but there are different variations (Medusa, Ouroboros, etc.) that can do better or be combined. Recently Together.ai published SpecExec, an SD variant that claimed 10-18x speedups: https://www.together.ai/blog/specexec

[1] https://www.reddit.com/r/LocalLLaMA/comments/17h4rqz/specula...

[2] https://arxiv.org/pdf/2402.01528v3

lhl

35 minutes ago

BTW, I got a chance to read through the model card and there's a section that shows their SD gains: https://huggingface.co/amd/AMD-Llama-135m#speculative-decodi...

- 1.75x-2.80x on MI250

- 2.83x-2.98x on NPU

- 3.57x-3.88x on CPU

Note they were testing with AMD-Llama-135m-code as the draft model for CodeLlama-7b, both of which do similarly badly on HumanEval Pass@1 (~30%), so if they used a similarly trained 135M to draft for, say, Qwen2.5-Coder (88.4% on HumanEval), the perf gains would probably be much worse.
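
If anyone wants to try that pairing themselves, this draft/target setup is exposed in HF transformers as assisted generation. A minimal sketch, assuming the model IDs from the model card (it only works because AMD-Llama-135m-code shares the Llama tokenizer with CodeLlama-7b):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")
    target = AutoModelForCausalLM.from_pretrained(
        "codellama/CodeLlama-7b-hf", torch_dtype=torch.float16, device_map="auto")
    draft = AutoModelForCausalLM.from_pretrained(
        "amd/AMD-Llama-135m-code", torch_dtype=torch.float16, device_map="auto")

    inputs = tok("def quick_sort(arr):", return_tensors="pt").to(target.device)
    # assistant_model enables speculative decoding: the 135M model drafts
    # tokens and the 7B model verifies them in batched forward passes.
    out = target.generate(**inputs, assistant_model=draft, max_new_tokens=128)
    print(tok.decode(out[0], skip_special_tokens=True))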

craftkiller

13 hours ago

I see multiple mentions of NPU on this page, but it's still not clear to me: is this something that can finally use the NPU on my processor?

lhl

10 hours ago

There actually seems to be a bunch of stuff now:

* https://github.com/amd/RyzenAI-SW - has a list of demos and how to use it directly (including apparently w/ PyTorch and LLMs)

* https://github.com/huggingface/optimum-amd - can use RyzenAI to use the NPU for HF transformers

There's even a Linux driver now (https://github.com/amd/xdna-driver), although it looks like enough of a PITA that I haven't bothered to try it (my 7940HS only has ~10 TOPS anyway, so not much point even if it worked perfectly).

loufe

13 hours ago

It's always encouraging to see wider hardware platform competition for AI inference and training. Access to affordable and capable hardware for consumers will only benefit (I imagine) from increasing competition.

Decabytes

12 hours ago

Since most people can’t run these LLMs locally, I wonder what a setup would look like where we have hyper-tuned models for specific purposes, i.e. a model for code, a model for prose, etc. You have a director model that decides which downstream model should be used and then runs it. That way you can run the models locally without needing beefy GPUs. It’s a trade-off of using more disk space vs. needing more VRAM.
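
A toy sketch of what I mean by the director (model names are entirely hypothetical, and the routing here is just keyword matching where a real version would use a small classifier):

    # Hypothetical "director" that picks a specialized local model per prompt.
    SPECIALISTS = {
        "code":  "local/tiny-code-model",   # placeholder model names
        "prose": "local/tiny-prose-model",
        "chat":  "local/tiny-chat-model",
    }

    def route(prompt: str) -> str:
        """Crude keyword router; a real director would be a small classifier model."""
        p = prompt.lower()
        if any(k in p for k in ("def ", "class ", "bug", "compile", "function")):
            return SPECIALISTS["code"]
        if any(k in p for k in ("essay", "story", "rewrite", "tone")):
            return SPECIALISTS["prose"]
        return SPECIALISTS["chat"]

    def answer(prompt: str) -> str:
        model_id = route(prompt)
        # Only the chosen specialist gets loaded into VRAM; you pay in disk space instead.
        return f"[{model_id}] would handle: {prompt}"

    print(answer("Why does this function segfault?"))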

Philpax

10 hours ago

You're essentially describing Apple Intelligence :-)

https://machinelearning.apple.com/research/introducing-apple... (see Model Adaptation)

fennecbutt

8 hours ago

A rip-off of LLMs and LoRAs. Wrapping it in a shiny-sounding name for the normies doesn't mean they contributed anything to the space.

Philpax

39 minutes ago

They're not hiding anything; they've very clearly described what they've done and how they've done it.

They've branded their specific architecture and integration, which allows me to easily refer to it as an example.

I understand that it's easy to be cynical about Apple's approach to product development, but it seems unwarranted in this case.

rkharsan64

10 hours ago

If you're using a JetBrains IDE, the AI-based autocompletions are powered by super tiny LLMs, each trained on a single language. This allows them to run locally and still produce decent results.

For example, the C++ model is really good at writing both OpenGL+GLFW and Raylib.

Havoc

9 hours ago

> i.e. a model for code

That's already very much a thing: Codestral, Phind, StarCoder, etc.

Fine-tuning models on whatever you want is quite accessible if you have a good dataset and a hundred bucks of budget.

bjt12345

10 hours ago

> [1] The training code for AMD-135M is based on TinyLlama, utilizing multi-node distributed training with PyTorch FSDP.

I thought PyTorch didn't work well with AMD architecture, and read of many people using JAX instead?