hackernews client

AMD Unveils Its First Small Language Model AMD-135M

311 pointsposted 9 months ago

(community.amd.com)

101 Comments

diggan

9 months ago

> The training code, dataset and weights for this model are open sourced so that developers can reproduce the model and help train other SLMs and LLMs.

Wow, an actual open source language model (first of its kind [from a larger company] maybe even?), includes all you need to be able to recreate it from scratch. Thanks AMD!

Available under this funky GitHub organization it seems: https://github.com/AMD-AIG-AIMA/AMD-LLM

wrs

9 months ago

We (developers and tech managers) really need to hold the line on this terminology. This is a full actual open source LLM. The usual “open inference” model is not.

boulos

9 months ago

I assume by "open inference" you mostly mean "weights available"?

Usually “open source” for an LLM means you get the weights and the inference code, which I’ve started calling “open inference”. It’s certainly good and useful, but it’s not the actual source of the model.

I find people get into silly arguments about the terminology because they’re focused on whether the “source” is “open” and not on what the “source” is actually the source of.

“Weights available” indicates even the weights aren’t “open” in the usual software meaning of the term, as they typically come with restrictive licenses (more restrictive than copyleft or attribution).

throwawaymaths

9 months ago

By "source" do you mean training data?

nickpsecurity

9 months ago

I call them open weights or just freeware like when we got only the EXE’s on Windows.

amelius

9 months ago

Open source is what you would get if an academic institution would release it.

zdragnar

9 months ago

Aren't academic institutions more likely to claim ownership of anything produced than they are to totally open source something?

riedel

9 months ago

In a space where copyrighted material and personal data is an issue, actually academic research can actually use many exceptions (at least under legislation in Europe) . This unfortunately also means that one cannot pass on the data sources under truly licencing beyond research use/ reproducibility.

amelius

9 months ago

I don't think so. Do you have examples?

zdragnar

9 months ago

My perspective mostly comes from the fact that universities will claim parents from research. For something like an AI model, I guess it could go either way.

Personally, I am all in favor of increasing commercialisation of academic research, if and only if the money earned goes first and foremost to reducing tuition for students.

Alas, tuition seems to ever inflate upwards. I remember when my university built a football stadium- all sorts of talking points were floated about, like how the sports programs pay for themselves, it's good for recruiting, we wouldn't get increased tuition, blah blah blah. Well, once construction finished, guess what? Students got a $400 stadium "fee" added onto enrollment each semester. Sure, it wasn't technically part of tuition, but it also wasn't optional.

That alone pretty much guaranteed that I ignored every attempt from the alumni association that came begging for donations after I graduated. They run a well oiled machine, and yet somehow no matter how much money the school had, students kept paying more each year.

So, yeah, I do know many (most?) research schools have commercializing offices dedicated to making money off of their research, and they have every incentive to keep doing so, whether via parents or keeping things secret to commercial partners. They just don't do anything publicly visible with the money.

Edit: sorry for the stream of consciousnesses, I'm running on an hour of sleep and just realized I don't have the energy for a coherent, concise response

ProllyInfamous

9 months ago

I have worked part-time (hardware technician) at two US-based companies, entirely open-source (including firmware), that are still profitable (one for decade+).

My limited understanding (finance side) is that most customers prefer to buy from (even open-source) companies, for several reasons:

1) They don't have time/desire to assemble hundreds of components, prefering drop-in solution

2) Our manufacturing facility has experience / protools, produces products within tighter tolerances

3) Many people understand their own manufacturing limitations and would prefer warranted solutions, without their understood dangers of DIY

I personally dropped out of US grad school because it was the antithesis of open-source licensing.

Disclosure: I am an AMD shareholder, excited about this recent announcement

wmf

9 months ago

You're not wrong, but if you come up with a definition that no one is willing to meet you're just making that definition irrelevant.

ok_dad

9 months ago

Plenty of people publish actual open source software, the definition isn’t the problem, it’s the people who misuse it that are the problem.

wmf

9 months ago

There's a huge difference between software and AI models. We can debate why that happens but it's a fact. Companies are willing to release open weights but virtually no one is willing to create open source models. Shaming and well actuallying has achieved nothing so far.

wrs

9 months ago

And I'm not arguing that they should release open source models. There's no shame in releasing an open-inference model. But I think I'm fair in saying they should use an accurate term for what they do release.

yazzku

9 months ago

There is nothing "source" about the "open source models" that companies typically release. The use of the term "open source" is deliberate marketing BS. If you want to argue there's a difference between software and a model, then don't use software terms that are already well-defined to refer to some property of the model.

https://opensource.org/osd

cassianoleal

9 months ago

It’s a lot worse than marketing BS. It’s deliberate misdirection. Essentially a con.

user

9 months ago

[deleted]

GeekyBear

9 months ago

> Wow, an actual open source language model (first of its kind

Apple research has previously released another example of a model with open training code, data, and weights, but their model was sized for running inference workloads on mobile devices.

However, Apple has a mobile device line of business and AMD has an enterprise AI accelerator line of business, so they are both doing work relevant to their bottom line.

diggan

9 months ago

Thanks, seems you're talking about the OpenELM family of models: https://github.com/apple/corenet/tree/main/projects/openelm

jerrygenser

9 months ago

This would be another example of open source. Not from such a large company but a good reference including code, data, weights, etc.

https://allenai.org/olmo

brianjking

9 months ago

Molmo even more so! The 7b is wild.

kypro

9 months ago

Smart move from AMD. Helps develop an ecosystem around their tech and for their GPUs.

jeff_carr

9 months ago

Has anyone tried it? I mean, I would, but as far as I can tell understand I need 4 boxes with 4 GPU's. Plus an interconnect. I mean, I could put in an order for my homelab but at around $80k per box and maybe $20k for the right switches and some other gear, my wife will probably frown at me ordering a $340,000 rig to try this code that I don't know what to do with it if it works.

Maybe some other heavy hitter out there can explain what all this whatchamacallit newfangled synergy producing matrix algebra does after you have it running?

Shadowmist

9 months ago

> that I don't know what to do with it if it works.

After you get it up and running you can just ask it what to do with it.

Rinzler89

9 months ago

Hope it tells you to buy more AMD HW because that would be so funny.

kypro

9 months ago

Seems kinda obvious this isn't targeted at hobbyists, but more towards SMEs and AI startups that want to build their own language models from scratch or experiment with the tech.

Something like this would help small teams build an initial POC and do some experimentation.

You have similar issues with robotics projects. It's very expensive for a hobbyist because of the hardware costs, but there's large number of small companies who benefit from open source tech to get their projects started.

NitpickLawyer

9 months ago

> Wow, an actual open source language model

I find it funny that the AI field has somehow normalised the goalpost moving from capabilities all the way to definitions about open source. And people seem really tribal about it...

There absolutely are open source LLMs already. Phi3.5 (MIT), various Mistral models (Apache2.0), various Qwen2 models (Apache2.0) and so on. LLamas are not open source, nor are Gemmas. But to say this is "an actual open source model" is weird nitpicking for the sake of nitpicking, IMO.

Requiring the methods and datasets that someone used to create some piece of IP is in no way a requirement for open sourcing said IP. It never has been!

Imagine this analogy:

A dev comes up with a way to generate source code that solves a real problem. This dev uses a secret seed, that only they know. The dev also uses thousands of hours of compute, and an algorithm that they created. At the end of the exercise they release the results on github, as follows:

- here is a project that takes in a piece of text in english, and translates it into french.

- the resulting source code is massive. 10 billions LOC. The lines of code are just if statements, all the way down, with some hardcoded integer values.

- source code licensed under Apache 2.0, written in let's say python.

- users can see the source code

- users can run the source code

- users can modify the source code and re-release the code

Now, would anyone pre LLMs say "this isn't true open source" because it's too complicated? Because no one can reasonably understand the source code? Because it uses hard coded int values? Because it's 10b LOC? Because the dev never shared how they got those values?

Of course not. The resulting code would have been open source because Apache 2.0 is open source.

It's the same with model weights. Just because they're not source code, and just because you don't know how they were created, it does not mean the weights are not open source.

You can see the weights. You can change the weights. You can re-distribute the weights. It's open source. The definition of something being open source does not cover you understanding why the weights are like they are. Nor do they require you having access to the methods of creating those weights. Or datasets. Or whatever the devs had for breakfast.

Nab443

9 months ago

Great, with that definition we can call all binaries opensource !

NitpickLawyer

9 months ago

This is the greatest misconception in this field. Weights are not a binary form! In fact you can't "run" the weights as they are. They only represent some fixed values.

Whenever you use an LLM you "load" the weights, using (usually open source) code and you run inference with that code. The weights are not binary and the analogy to the binary form of distributing software is not valid, IMO.

That is why I used the analogy of a python code with ifs all the way, based on hardcoded values. That is what you are arguing is not open source. The weights are just "hardcoded values".

Open source never had the requirement of the author explaining what, why or how they got a hardcoded value in their shared code. Why it suddenly does for LLMs is what I find funny.

tga_d

9 months ago

By that argument, all bytecode is open source, because it has to be run in some other environment, and you can technically modify it if you want to. Open source is supposed to refer to the human-interpretable elements of the code. E.g., kernel modules that are technically formatted as C code but contain non-human readable firmware as values are still considered "binary blobs" and not part of the free/open source kernels some distros ship.

aloknnikhil

9 months ago

I completely disagree with you. The fundamental problem with your concept of open source is it goes against what open source really is. The ability for you to completely change what a piece of software can do. IMO, even with LLMs, models are "executables" and weights are "configuration". Yes, of course you can tune the weights by changing the values, but that's the most I can do. Can I actually add "features" to the model? Perhaps you "open-sourced" an LLM model trained on the United States Constitution. Can I change the model to then be a specialist in real estate law? Not with weights. I need it to learn case histories to extend its "feature-set". Without data and the mechanism to reproduce the model, how is this "open-source"?

NitpickLawyer

9 months ago

> Can I actually add "features" to the model?

Yes. You can use a number of libraries to add, mix, merge, etc. layers [1]

> Not with weights. I need it to learn case histories to extend its "feature-set".

Again, yes. You can add attention heads, other features, heck you can even add classification if you want [2]. Because you are working with an open architecture! What you think of weights are not binary blobs. That is a common missconception.

[1] - https://github.com/arcee-ai/mergekit

[2] - https://github.com/center-for-humans-and-machines/transforme...

aloknnikhil

9 months ago

At first glance, that just seems like a bunch of libraries linked together to form a binary. That is not open-source. I completely agree with you that there is just not enough clarity out there. For my education, following up with my earlier example, can I remove the layers that have references to all chapters / laws in the constitution except for the ones meant for real-estate? How would I do that with the approaches you mentioned here?

Fundamentally, if I have to "reverse-engineer" something, then it's not open-source.

ErikBjare

9 months ago

You would have to do the same fine-tuning as if you had the training data.

diggan

9 months ago

> that the AI field has somehow normalised the goalpost moving from capabilities all the way to definitions about open source

The problem is that Facebook and others are trying to move the goalpost, while others like me would like the goalpost to remain where it is, namely we call projects "Open source" when the required parts to build it on our own machines, is sufficiently accessible.

As I probably wouldn't be a developer in the first place if it wasn't for FOSS, and I spend literally all day long contributing to others FOSS projects and working on my own, it's kind of scary seeing these large companies trying to change what FOSS means.

I think you're forgetting about the intent and purpose of open source. The goal is that people can run software for whatever purpose they want, and they can modify it for whatever purpose. This is the intent behind the licenses we use when we "create FOSS".

This means, in practice, that the source code has to be accessible somehow, so the compiler I have on my computer, can build a similar binary to the one the project itself offers (if it does). The source code has to be accessible so I can build the project, but also modify it for myself.

Taking this idea that mostly only applied to software before (FOSS) but applying it to ML instead, it's clear to see what we need in order to 1) be able to use it as we want and 2) be able to modify it as we want.

> You can see the weights. You can change the weights. You can re-distribute the weights. It's open source.

Right. If I upload a binary to some website, you can see the binary, you can change the binary and you can re-distribute it. Would you say the binary is open source?

The weights are the binary in ML contexts. It's OK for projects to publish those weights, but it's not OK to suddenly change the definition and meaning of open source because companies want to look like they're doing FOSS, when in reality they're publishing binaries without any ways of building those binaries with your own changes.

Imagine if the Linux kernel was just a big binary blob. Yes, you can change it, re-distribute and what not, but only in a binary-blob shape. You'd be kind of out there if you insist on calling this binary-blob kernel FOSS. I'm sure you'd be able to convince some Facebook engineers about it, seems they're rolling with that idea already, but the rest of us who exist in the FOSS ecosystem? We'd still have the same goalpost in the exact same spot it's been for at least two decades I've been involved.

NitpickLawyer

9 months ago

> Would you say the binary is open source?

Great question. Is the assembly code in a git, with an open source license? Then yes! It's open source!

Think about it this way: just because someone wrote hello world in c and then a compiler translated that into assembly, doesn't invalidate the quality of that assembly code being open source! That's the point. Something is open source or not if the resulting stuff is published under an open source license. Can you see the assembly code? Can you change it? Can you re-publish it? If all of these are yes, then it's open source!

> Imagine if the Linux kernel ...

That is semantics. The linux kernel is published in c because it's easier for people to reason in that abstracted language, but it would not suddenly become "closed source" if it were written in asm, assuming it would still be published under an open source license.

In other words, you having access to the "dataset" would not make the weights any easier to work with. They would still be in a "blob" as you call it.

yencabulator

9 months ago

> Think about it this way: just because someone wrote hello world in c and then a compiler translated that into assembly, doesn't invalidate the quality of that assembly code being open source!

Meanwhile:

> The source code must be the preferred form in which a programmer would modify the program.

https://opensource.org/osd

NitpickLawyer

9 months ago

Then, given the fact that both you and Mistral LLC modify the program in the exact same way, that portion still holds.

People view weights as an intended obfuscation by the party releasing it. It is not! In fact, it is equally as hard for them to "understand" why a certain value at a certain index is what it is, as it is for you! Just ask Anthropic. They are also doing poke this weight, see what pops with their own models.

Again, that is why I used the analogy above. You are arguing that if someone uses a hardcoded value in their code, and won't share how they derived that value, it somehow isn't open source. That, IMO, is wrong.

diggan

9 months ago

> Again, that is why I used the analogy above. You are arguing that if someone uses a hardcoded value in their code, and won't share how they derived that value, it somehow isn't open source. That, IMO, is wrong.

It feels like you deliberately ignore the source part of "open source". If you have X that produces Y, then X is the source, Y is the output. You cannot "open source" Y as Y isn't a source to anything, it's the output from the source. The only part you can "open source" is the source part, which is X in this case.

diggan

9 months ago

Interesting points, regardless of how much I disagree with them, so thank you for sharing your views :)

> Think about it this way: just because someone wrote hello world in c and then a compiler translated that into assembly

I understand your point, since it's technically assembly, you could license that assembly under a FOSS license and now the thing you distributed is "open". I agree you could do this, but you shouldn't use "open source" to describe what you're doing there, unless the actual source for building that asm is open too. The binary might be available, but "open source" is something that applies to source code, not to what we distribute. If your source is C and your output is assembly, but you only try to apply a FOSS license to the output, not the source, it'd be a hard sell to call the source is open and available.

The closest I've come to finding some sort of backing to this view I hold is what OSI echos here:

> What if I do not want to distribute my program in source code form? Or what if I don’t want to distribute it in either source or binary form?

> If you don’t distribute source code, then what you are distributing cannot meaningfully be called “Open Source”. And if you don’t distribute at all, then by definition you’re not distributing source code, so you’re not distributing anything Open Source. [...] Open Source licenses are always applied to the source code — so if you’re not distributing the source, then you’re not distributing the thing to which an Open Source license applies

https://opensource.org/faq#non-distribution

Similarly, I wouldn't call a song I release as "open source" (not that it makes much sense in this case) unless the actual "source" of how it was produced is public under a FOSS license, even if you can technically read the sound data however you want, and modify it by patching the audio file. Instead, some other liberal license is more suitable that allows using/modifying/redistributing the output however you want (Creative Commons is common for those use cases), but not a license that is specifically about source code.

> That is semantics. The linux kernel is published in c because it's easier for people to reason in that abstracted language, but it would not suddenly become "closed source" if it were written in asm, assuming it would still be published under an open source license.

I agree with this too, if suddenly the kernel was written in asm, and it's being distributed as asm, then you can license that asm as "open source" and that'd be OK. What wouldn't be "open source", would be if it's written in C, but that C code isn't licensed "open source", but the authors tries to argue that the compiled asm output is "open source". It's output, not source, so you cannot license the output as "open source" as it's missing that last part, the "source".

> In other words, you having access to the "dataset" would not make the weights any easier to work with. They would still be in a "blob" as you call it.

Precisely. So the requirements end up something like: Can I build this thing from scratch myself, granted I have the required equipment + knowledge + time?

For LLM models, at least the training script + the dataset has to be available without restrictions for that to be possible. If they're not available, or available but under restrictions (usage or otherwise), then it's not open source.

NitpickLawyer

9 months ago

Haha, having lengthy discussions, especially when we disagree, is healthy IMO. That's how we get to experience other viewpoints, and hopefully become better for the effort.

> Can I build this thing from scratch myself

You absolutely can. Everything you need is in the model config (layers, stuff) and there are training scripts all over the net. Now, granted, you will not necessarily get the same results, but then again neither is Mistral or Meta.

> but the authors tries to argue that the compiled asm output is "open source". It's output, not source, so you cannot license the output as "open source" as it's missing that last part, the "source".

Replying here because I can't in the other subthread. I think you are using a misconception on what is source code, and what is a weight. In the LLM world, you already have the source code for inferencing. This would be either pytorch or c code or whatever. You also have the architecture code. You can see what the model looks like, what layers it has, what ops it does to reach a result. That is also open! So you get the source to run inference. You get the source to "load" the model (i.e. the architecture, layers, etc). And you get a bunch of hardcoded values.

What you don't get is the why behind the question "why is this value x and not y". And for the most part, no one knows.

> If they're not available, or available but under restrictions (usage or otherwise), then it's not open source.

Let's take another (famous) example. Quake is famous for having a hardcoded value somewhere in the source code, that speeds up some geometry computations. Now, you can change that value, but things will be messed up in the engine. Collisions will happen weirdly, things will look bad. Now, is quake any less of "open source" if you or I don't understand why the original coder chose that value? Of course not! Well, now just multiply that with 1B hardcoded values. It's the exact same thing. You could change any of the values, but the game would look wonky as you do so. But, at the end of the day, it would not be any less open source.

I guess what I'm trying to say is that weights are not binary blobs. Weights are not an obfuscation attempt. Weights are distributed exactly how they are intended to be used, and how they are being used by the creators as well. You can change the architecture of a model (see above for details). You can add layers, you can remove layers. You can perform "abliterations", or you can do fine-tuning. Everything is exactly done as the "creators" intended. The only thing you don't have is "how they got those exact same numbers". But you don't need that. And it's funny that somehow for LLMs that's a bridge too far. It never used to be for any other project.

bubaumba

9 months ago

No, it's not open source till someone can actually reproduce it. That's the hardest part. For now it's open weights open dataset. Which is not the same.

diggan

9 months ago

That's... Not how open source works? The "binary" (model weights) is open source and the "software" (training scripts + data used for training) is open source, this release is a real open source release. Independent reproduction is not needed to call something open source.

Can't believe it's the second time I end up with the very same argument about what open source is today on HN.

dboreham

9 months ago

But wouldn't failure to achieve independent reproduction falsify the open claim?

Similar to you publish the source for Oracle (the database), but nobody can build a binary from it because it needs magic compliers or test suites that aren't open source?

Heck when the browser was open-sourced, there was an explicit test where the source was given to some dude who didn't work for Netscape to verify that he could actually make a working binary. It's a scene in the movie "Code Rush".

wrs

9 months ago

The interesting part of the product we’re taking about (that is, the equivalent of the executable binary of an ordinary software product) is the weights. The “source” is not sufficient to “recompile” the product (i.e., recreate the weights). Therefore, while the source you got is open, you didn’t get all the source to the thing that was supposedly “open source”.

It’s like if I said I open-sourced the Matrix trilogy and only gave you the DVD image and the source to the DVD decoder.

(Edit: Sorry, I replied to the wrong comment. I’m talking primarily about the typical sort of release we see, not this one which is a lot closer to actually open.)

littlestymaar

9 months ago

> The “source” is not sufficient to “recompile” the product (i.e., recreate the weights). Therefore, while the source you got is open, you didn’t get all the source to the thing that was supposedly “open source”.

What's missing?

wrs

9 months ago

Well, I’m not experienced in training full-sized LLMs, and it’s conceivable that in this particular case the training process is simple enough that nothing is missing. That would be a rarity, though. But see my edit above — I’m not actually reacting to this release when I say that.

littlestymaar

9 months ago

OK, so you just like to be a contrarian…

bubaumba

9 months ago

You are missing key points here. "reproduce" means produce the same. Not just train similar model.

I can simplify the task, can you convincingly explain how the same model can be produced from this dataset? We can start simple, how you can possibly get the same weights after the first single iteration? I.e. the same as original model got. Pay attention to randomness, data selection, initial model state.

Ok, if you can't do that. Can you explain in believable way how to prove that given model was trained on give dataset? I'm not asking you for actually doing all these things, that could be expensive, only to explain how it can be done.

Strict 'open source' includes not only open weights, open data. It also includes the word "reproducible". It's not "reproduced", only "reproducible". And even this is not the case here.

Sayrus

9 months ago

Reproducible builds are not a requirement for open source software, why is it one for open source models?

wrs

9 months ago

I would say that functionally reproducible builds are sort of inherent in the concept of “source”. When builds are “not reproducible” that typically just means they’re not bit-for-bit identical, not that they don’t produce the same output for a given input.

prophesi

9 months ago

Once neural networks enter the scene, I don't think giving the same output for a given input is possible in the field currently. I believe this is as open as language models can be, and what people mean when they say it's a "fully open source" model.

Zamiel_Snawley

9 months ago

If they provide the training code, and data set, how is that not enough to reproduce functionally equivalent weights? I don’t have any experience in the AI field, what else would they need to provide/define?

As others have mentioned, reproducible builds can be quite difficult to achieve even with regular software.

Compiler versions, build system versions, system library versions, time stamps, file paths, and more often contribute to getting non-identical yet functionally equivalent binaries, but the software is still open source.

worewood

9 months ago

How often do people expect to compile open-source code and get _exactly_ the same binary as the distributed one? I've seen this kind of restriction only on decompilation projects e.g. the SM64 decompilation -- where they deliberately compare the hashes of original vs. compiled binaries, as a way to verify the decompilation is correct.

It's an unreasonable request with ordinary code, even more with ML where very few ones have access to the necessary hardware, and where in practice, it is not deterministic.

bubaumba

9 months ago

> How often do people expect to compile open-source code and get _exactly_ the same binary

_Always_, with the right options. And that's the key point. If distributed code is different it means it may be infected or altered in other way. In other words it cannot be trusted.

The same with models, if they are not reproducible or verifiable they cannot be trusted. Trust is the main feature of open source. Calling black box with attached data 'open source', even 'the first' is a bit of a stretch. It's not reproducible and not verifiable. And it's definitely not the first model with open data.

To be correct you should add 'untrusted' if you want to call this thing 'open source'. Like with Meta's models who knows what it holds.

PS: finally I'm negative, fanboys don't like it ;-)

e12e

9 months ago

I expect that if I compile your 3d renderer, and feed it the same scene file you did - I get the same image?

TylerE

9 months ago

Why would you expect that? 3D renderers are not generally deterministic. Many will incorporate, for instance, noise algorithms. They will frequently not produce byte-identical renders on the same hardware using the same binary.

e12e

9 months ago

Same recognizable image? Like if you look at the povray benchmark image on Linux and Windows?

For an llm that should (ideally) translate to similar answers?

bavell

9 months ago

I think you are erroneously conflating open source with deterministic builds.

Yes, there is a random element when "producing the binary" but that doesn't mean it isn't open source.

Jabrov

9 months ago

What’s the difference?

frontalier

9 months ago

the goal posts moved?!

avaldez_

9 months ago

Reproducibility? I mean what's the point of an open technology nobody knows if it works or not.

n_ary

9 months ago

Now this here is the beginning on real innovation of AI. With AMD coming in(albeit late and slowly), meta with LLama improving, we will soon see some real adaptation and development in next few thousand days. At this moment, I see OAI as the yahoo of the pre-Google era.

imjonse

9 months ago

"next few thousand days"

can we stick to years as a unit of measure and not spread Sam Altman's phrase :)

washadjeffmad

9 months ago

Twenty two thousand days

It's not a lot, it's all we got

Twenty two thousand days

- Sam Altman?

raffraffraff

9 months ago

I was just listening to that song in the last hour

highfrequency

9 months ago

Looks like they are using sixteen $13k GPUs [1] (around $210k hardware) for 6 days of training.

Anyone know the recommended cloud provider and equivalent rental price?

[1] https://www.wiredzone.com/shop/product/10025451-supermicro-g...

layoric

9 months ago

MI250s definitely aren’t a common card to rent so only can find Runpod at $2.10 per hour each. This results in a training cost of $4838 + fine tuning of $3225. However this doesn’t include the 11TB of storage or time taken to get the setup actually running the tasks. So likely you wouldn’t see much change from $10k usd if any.

- https://www.runpod.io/gpu/mi250

lhl

9 months ago

Runpod.io rents the next-gen MI300X's for $4/hr, although since they also rent H100's for $3/hr (that are easier to work with/faster for training) it might be more of a novelty.

highfrequency

9 months ago

I thought the whole selling point of AMD GPUs was that they were a lot cheaper than Nvidia GPUs?

user

9 months ago

[deleted]

dagmx

9 months ago

Cheaper for the cloud company. But that doesn’t always translate to cheaper for the end user. Maybe they cost more to run or maybe there’s fewer of them so they’re more expensive to book?

knotimpressed

9 months ago

At least a couple years ago, a big advantage of Nvidea cards was how much cheaper they were to run power wise-often the dies that made it into cloud level cards would be binned consumer dies.

Not sure if that’s still the case, but I’d say it’s plausible.

lostmsu

9 months ago

Impossible. Power costs for H100-like cards are dwarfed by the cost of the cards themselves. H100 at full load will consume ~$3500 (rough estimate) of power in 5 years at $0.12/kWh.

yencabulator

9 months ago

Data centers are more constrained by availability of power, and the matching cooling, than the actual bulk cost of it.

For example, I've been in situations where we had to deploy fewer hard drives per server unit than we otherwise would have, just because we knew we couldn't power & cool the racks if we fully stocked them.

lostmsu

9 months ago

How much do you think it would cost to install cooling for 5kW power output (1 8xH100 node)?

yencabulator

9 months ago

It's more that a room/wing/building/whatever is built with a certain capacity, while power and heat density keeps increasing. Most buildings can't "just add cooling for 5kW". Newer data centers should already be built with that lesson in mind, for example with a lot of extra air moving capacity etc planned in, but that's been the struggle for a while now.

wmf

9 months ago

Hot Aisle seems to the (only?) place to rent AMD. (Ryan, please don't spam this thread. It's not a good look.)

benterix

9 months ago

I'm happy to see a truly open source model.

Actually, AMD has excellent reasons to make this kind of development and I hope they continue.

luyu_wu

9 months ago

The section on speculative execution is interesting. "This approach allows each forward pass to generate multiple tokens without compromising performance, thereby significantly reducing memory access consumption, and enabling several orders of magnitude speed improvements."

Does anyone know if the "several orders of magnitude speed improvement" is accurate? I'm doubtful.

Very interesting though! I'll be playing around with this on the weekend!

lhl

9 months ago

Orders of magnitude seems a bit ambitious. The implementation from the DeepMind paper achieved a 2-2.5X https://arxiv.org/pdf/2302.01318 and most of the tests I've seen [1][2] have been similar, but there are different variations (Medusa, Ouroboros, etc) that can do better/be combined. Recently Together.ai published SpecExec, a SD variant which did claim to get a 10-18X speedups: https://www.together.ai/blog/specexec

[1] https://www.reddit.com/r/LocalLLaMA/comments/17h4rqz/specula...

[2] https://arxiv.org/pdf/2402.01528v3

lhl

9 months ago

BTW, I got a chance to read through the model card and there's a section that shows their SD gains: https://huggingface.co/amd/AMD-Llama-135m#speculative-decodi...

- 1.75x-2.80x on MI250

- 2.83x-2.98x on NPU

- 3.57x-3.88x on CPU

Note they were testing on AMD-Llama-135m-code as draft model for CodeLlama-7b, both of which do similarly badly on Humaneval Pass@1 (~30%), so it's likely if they were using a similarly trained 135m to SD for say, Qwen2.5-Coder (88.4% on HumanEval), the perf gains would probably be much worse.

Decabytes

9 months ago

Since most people can’t run these LLMs locally, I wonder what a model would look like where we have hyper tuned models for specific purposes, IE a model for code, a model for prose, etc. you have a director model that interprets what downstream model should be used and then it runs that. That way you can run the model locally, without needing beefy GPUs. It’s a trade off of using more disk space vs needing more vram

wmf

9 months ago

The whole point of this model is that it's so tiny that even a weak RPi could run it. Apple has also done some interesting work with a common <4B base model that is customized with different LoRAs for different purposes.

kristianp

9 months ago

About 3b if you're referring to this: https://machinelearning.apple.com/research/introducing-apple...

rkharsan64

9 months ago

If you're using a JetBrains IDE, the AI based autocompletions are powered by super tiny LLMs, each trained on a single language. This allows them to run locally and still product decent results.

For example, the C++ model is really good at writing both OpenGL+GLFW and Raylib.

Philpax

9 months ago

You're essentially describing Apple Intelligence :-)

https://machinelearning.apple.com/research/introducing-apple... (see Model Adaptation)

fennecbutt

9 months ago

A rip off of LLMs and loras. Wrapping it in a shiny sounding name for the normies doesn't mean they contributed anything to the space.

Philpax

9 months ago

They're not hiding anything; they've very clearly described what they've done and how they've done it.

They've branded their specific architecture and integration, which allows me to easily refer to it as an example.

I understand that it's easy to be cynical about Apple's approach to product development, but it seems unwarranted in this case.

Havoc

9 months ago

>IE a model for code

That's already very much a thing. Codestral, Phind, Starcoder etc.

Fine tuning models on whatever you want is quite accessible if you have a good dataset and a 100 bucks of budget

craftkiller

9 months ago

I see multiple mentions of NPU on this page, but its still not clear to me: is this something that can finally use the NPU on my processor?

lhl

9 months ago

There's actually seems to be a bunch of stuff now:

* https://github.com/amd/RyzenAI-SW - has a list of demos and how to use it directly (including apparently w/ PyTorch and LLMs)

* https://github.com/huggingface/optimum-amd - can use RyzenAI to use the NPU for HF transformers

There's now a Linux driver even https://github.com/amd/xdna-driver although it looks like a sufficiently PITA that I haven't even bothered to try it (my 7940HS only has like 10 TOPS anyway, so not much point even if it worked perfectly).

loufe

9 months ago

It's always encouraging to see wider hardware platform competition for AI inference and training. Access to affordable and capable hardware for consumers will only benefit (I imagine) from increasing competition.

bjt12345

9 months ago

> [1] The training code for AMD-135M is based on TinyLlama, utilizing multi-node distributed training with PyTorch FSDP.

I thought PyTorch didn't work well with AMD architecture, and read of many people using JAX instead?

user

9 months ago

[deleted]

user

9 months ago

[deleted]