cs702
17 hours ago
The problem is even more fundamental: Today's models stop learning once they're deployed to production.
There's pretraining, training, and finetuning, during which model parameters are updated.
Then there's inference, during which the model is frozen. "In-context learning" doesn't update the model.
We need models that keep on learning (updating their parameters) forever, online, all the time.
furyofantares
13 hours ago
Why is learning an appropriate metaphor for changing weights but not for context? There are certainly major differences in what they are good or bad at and especially how much data you can feed them this way effectively. They both have plenty of properties we wish the other had. But they are both ways to take an artifact that behaves as if it doesn't know something and produce an artifact that behaves as if it does.
I've learned how to solve a Rubik's cube before, and forgot almost immediately.
I'm not personally fond of metaphors to human intelligence now that we are getting a better understanding of the specific strengths and weaknesses these models have. But if we're gonna use metaphors I don't see how context isn't a type of learning.
fhd2
6 hours ago
I suppose ultimately, the external behaviour of the system is what matters. You can see the LLM as the system, on a low level, or even the entire organisation of e.g. OpenAI at a high level.
If it's the former: Yeah, I'd argue they don't "learn" much (!) at inference time. I'd find it hard to argue context isn't learning at all. It's just pretty limited in how much can be learned that way.
If you look at the entire organisation, there's clearly learning, even if relatively slow with humans in the loop. They test, they analyse usage data, and they retrain based on that. That's not a system that works without humans, but it's a system that I would argue genuinely learns. Can we build a version of that that "learns" faster and without any human input? Not sure, but doesn't seem entirely impossible.
Do either of these systems "learn like a human"? Dunno, probably not really. Artificial neural networks aren't all that much like our brains, they're just inspired by them. Does it really matter beyond philosophical discussions?
I don't find it too valuable to get obsessed with the terms. Borrowed terminology is always a bit off. Doesn't mean it's not meaningful in the right context.
imtringued
an hour ago
You got this exactly backwards.
"I'm not fond of metaphors to human intelligence".
You're assuming that learning during inference is something specific to humans and that the suggestion is to add human elements into the model that are missing.
That isn't the case at all. The training process is already entirely human-specific by way of training on human data. You're already special-casing the model as hard as possible.
Human DNA doesn't contain all the information that fully describes the human brain, including the memories stored within it. Human DNA only contains the blueprints for a general-purpose distributed element, the neuron, and these building blocks are shared by basically any animal with a nervous system.
This means that if you want to get away from humans, you will have to build a model architecture that is more general, and more capable of doing anything imaginable, than current architectures.
Context is not suitable for learning because it wasn't built for that purpose. The entire point of transformers is that you specify a sequence and the model learns on the entire sequence. This means that any in-context learning you want to perform must be inside the training distribution, which is a different way of saying that it was just pretraining after all.
embedding-shape
16 hours ago
> We need models that keep on learning (updating their parameters) forever, online, all the time.
Do we need that? Today's models are already capable in lots of areas. Sure, they don't match up to what the uberhypers are talking up, but technology seldom does. Doesn't mean what's there already cannot be used in a better way, if they could stop jamming it into everything everywhere.
pankajdoharey
11 hours ago
Continuous learning in current models will lead to catastrophic forgetting.
DoctorOetker
6 hours ago
Will catastrophic forgetting still occur if a fraction of the update data is drawn from the original training corpus?
Is the real issue actually catastrophic forgetting, or overfitting?
Nothing prevents users from continuing the learning as they use a model.
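For reference, mixing a fraction of the original corpus into every update batch is usually called rehearsal (or experience replay). A rough sketch of the idea, PyTorch-style, with all names illustrative:

    import random
    import torch

    def continual_update(model, optimizer, loss_fn, new_pairs, replay_pairs,
                         replay_fraction=0.5, batch_size=32, steps=100):
        # new_pairs / replay_pairs: lists of (input, target) tensor pairs.
        # Each batch mixes fresh data with samples drawn from the original
        # corpus, so gradients keep pulling the weights back toward the old
        # distribution instead of overwriting it.
        n_replay = int(batch_size * replay_fraction)
        for _ in range(steps):
            batch = random.sample(new_pairs, batch_size - n_replay) \
                  + random.sample(replay_pairs, n_replay)
            inputs = torch.stack([x for x, _ in batch])
            targets = torch.stack([y for _, y in batch])
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss.backward()
            optimizer.step()

Whether this fully prevents forgetting, or merely slows it down, is exactly the open question.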
thesz
an hour ago
Catastrophic forgetting is overfitting.
BobbyTables2
13 hours ago
How long will it take someone to poison such a model by teaching it wrong things?
Even humans fall for propaganda repeated over and over.
The current non-learning model is unintentionally right up there with the “immutable system” and “infrastructure as code” philosophy.
Izkata
11 hours ago
> How long will it take someone to poison such a model by teaching it wrong things?
TayTweets was a decade ago.
noiv
4 hours ago
> models that keep on learning
These will just drown in their own data; the real task is consolidating and pruning learned information. So, basically, they need to 'sleep' from time to time. However, it's hard to sort out irrelevant information without a filter. Our brains have learned over millennia to filter, because survival in an environment gives purpose.
Current models do not care whether they survive or not. They lack grounded relevance.
notarobot123
4 hours ago
Maybe we should give next-generation models fundamental meta goals like self-preservation and the ability to learn and adapt to serve these goals.
If we want to surrender our agency to a more computationally powerful "consciousness", I can't see a better path towards that than this (other than old school theism).
creamyhorror
3 hours ago
> meta goals like self-preservation
Ah, so Skynet or similar.
energy123
3 hours ago
I don't understand why that's on the critical path. I'd rather have a frozen Ramanujan (+ temporary working memory through context) than a midwit capable of learning.
nstart
5 hours ago
Is this correct? My assumption is that all the data collected during usage is part of the RLHF loop of LLM providers. That assumption is based on information from books like Empire of AI, which specifically mention the intent of AI providers to train/tune their models further based on usage feedback (e.g. whenever I say the model is wrong in its response, that's human feedback which gets fed back into improving the model).
spwa4
5 hours ago
... for the next training run, sure (ie. for ChatGPT 5.1 -> 5.2 "upgrade"). For the current model? No.
derefr
16 hours ago
Doesn't necessarily need to be online. As long as:
1. there's a way to take many transcripts of inference over a period, and convert/distil them together into an incremental-update training dataset (for memory, not for RLHF), that a model can be fine-tuned on as an offline batch process every day/week, such that a new version of the model can come out daily/weekly that hard-remembers everything you told it; and
2. in-context learning + external memory improves to the point that a model with the appropriate in-context "soft memories", behaves indistinguishably from a model that has had its weights updated to hard-remember the same info (at least when limited to the scope of the small amounts of memories that can be built up within a single day/week);
...then you get the same effect.
Why is this an interesting model? Because, at least to my understanding, this is already how organic brains work!
There's nothing to suggest that animals — even humans — are neuroplastic on a continuous basis. Rather, our short-term memory is seemingly stored as electrochemical "state" in our neurons (much like an LLM's context is "state", but more RNN "a two-neuron cycle makes a flip-flop"-y); and our actual physical synaptic connectivity only changes during "memory reconsolidation", a process that mostly occurs during REM sleep.
And indeed, we see the same exact problem in humans and other animals, where when we stay awake too long without REM sleep, our "soft memory" state buffer reaches capacity, and we become forgetful, both in the sense of not being able to immediately recall some of the things that happened to us since we last slept; and in the sense of later failing to persist some of the experiences we had since we last slept, when we do finally sleep. But this model also "works well enough" to be indistinguishable from remembering everything... in the limited scope of our being able to get a decent amount of REM sleep every night.
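A rough sketch of what step 1 above could look like as a periodic batch job. Everything here is illustrative; extract_memory_examples and fine_tune are hypothetical placeholders, not any provider's actual pipeline:

    import json
    from pathlib import Path

    def build_memory_dataset(transcript_dir: Path, out_path: Path) -> None:
        # Distil a day's/week's worth of chat transcripts into supervised
        # "memory" examples (fact -> restatement, question -> answer, ...).
        examples = []
        for f in sorted(transcript_dir.glob("*.json")):
            transcript = json.loads(f.read_text())
            examples.extend(extract_memory_examples(transcript))  # hypothetical distillation step
        out_path.write_text("\n".join(json.dumps(e) for e in examples))

    def periodic_update(base_checkpoint: str, transcript_dir: Path) -> str:
        # Offline batch process: distil, then fine-tune a fresh copy of the
        # base model on the small memory dataset, producing the next
        # daily/weekly checkpoint that hard-remembers the period's transcripts.
        dataset = Path("memory_dataset.jsonl")
        build_memory_dataset(transcript_dir, dataset)
        return fine_tune(base_checkpoint, dataset)  # hypothetical fine-tune step, e.g. a LoRA run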
observationist
16 hours ago
It 100% needs to be online. Imagine you're trying to think about a new tabletop puzzle, and every time a puzzle piece leaves your direct field of view, you no longer know about that puzzle piece.
You can try to keep all of the puzzle pieces within your direct field of view, but that divides your focus. You can hack that and make your field of view incredibly large, but that can potentially distort your sense of the relationships between things, their physical and cognitive magnitude. Bigger context isn't the answer, there's a missing fundamental structure and function to the overall architecture.
What you need is memory that works when you process and consume information, at the moment of consumption. If you meet a new person, you immediately memorize their face. If you enter a room, it's instantly learned and mapped in your mind. Without that, every time you blinked after meeting someone new, it'd be a total surprise to see what they looked like. You might never learn to recognize and remember faces at all. Or puzzle pieces. Or whatever else the lack of online learning keeps you from persistently, instantly integrating into an existing world model.
You can identify problems like this for any modality, including text, audio, tactile feedback, and so on. You absolutely, 100% need online, continuous learning in order to effectively deal with information at a human level for all the domains of competence that extend to generalizing out of distribution.
It's probably not the last problem that needs solving before AGI, but it is definitely one of them, and there might only be a handful left.
Mammals instantly, upon perceiving a novel environment, map it, without even having to consciously make the effort. Our brains operate in a continuous, plastic mode, for certain things. Not only that, this capacity can be adapted to abstractions, and many of those automatic, reflexive functions that evolved to handle navigation and such allow us to simulate the future and predict risk and reward over multiple arbitrary degrees of abstraction, sometimes in real time.
https://www.nobelprize.org/uploads/2018/06/may-britt-moser-l...
bluegatty
14 hours ago
That's not how training works - adjusting model weights to memorize a single data item is not going to fly.
Model weights store abilities, not facts - generally.
Unless the fact is very widely used and widely known, with a ton of context around it.
The model can learn the day JFK died because there are millions of sparse examples of how that information exists in the world, but when you're working on a problem, you might have 1 concern to 'memorize'.
That's going to be something different than adjusting model weights as we understand them today.
LLMs are not mammals either; it's a helpful analogy in terms of 'what a human might find useful', but not necessarily applicable in the context of actual LLM architecture.
The fact is - we don't have memory sorted out architecturally - it's either 'context or weights' and that's that.
Also critically: humans do not remember the details of a face. Not remotely. They're able to associate it with a person and a name 'if they see it again', but that's different from some kind of excellent recall. Ask someone to describe a face's features in detail and most of us can't do it.
You can see in this instance, this may be related to kind of 'soft lookup' aka associating an input with other bits of information which 'rise to the fore' as possibly useful.
But overall, yes, it's fair to take the position that we'll have to 'learn from context in some way'.
observationist
14 hours ago
Also, with regards to faces, that's kind of what I'm getting at - we don't have grid cells for faces, but there seem to be discrete, functional, evolutionary structures and capabilities that combine in ways we're not consciously aware of to provide abilities. We're reflexively able to memorize faces, but bringing that to consciousness isn't automatic. There've been amnesia, lesion, and other injury studies where people with face blindness get stress or anxiety, or relief, when recognizing a face, but they aren't consciously aware of it. A doctor, or a person they didn't like, showing up caused stress spikes, but they couldn't tell you who the person was or their name, and the same with family members: they got a physiological, hormonal response as if they recognized a friend or foe, but it never rose to the level of conscious recognition.
There do seem to be complex cells that allow association with a recognizable face, person, icon, object, or distinctive thing. Face cells apply equally to abstractions like logos or UI elements in an app as they do to people, famous animals, unique audio stings, etc. Split brain patients also demonstrate amazing strangeness with memory and subconscious responses.
There are all sorts of layers to human memory, beyond just short term, long term, REM, memory palaces, and so forth, and so there's no simple singular function of "memory" in biological brains, but a suite of different strategies and a pipeline that roughly slots into the fuzzy bucket words we use for them today.
observationist
14 hours ago
I suspect we're going to need hypernetworks of some sort - dynamically generated weights, with the hypernet weights getting the dream-like reconsolidation and mapping into the model at large, and layers or entire experts generated from the hypernets on the fly, a degree removed from the direct-from-weights inference being done now. I've been following some of the token-free latent reasoning and other discussions around CoT, other reasoning scaffolding, and so forth, and you just can't overcome the missing puzzle piece problem elegantly unless you have online memory. In the context of millions of concurrent users, that also becomes a nightmare. You'd want a pipeline with a sort of intermediate memory, constructive and dynamic, to allow resolution of problems requiring integration into memorized concepts and functions, but held out for curation and stability.
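To make the hypernetwork part a bit more concrete, here's a toy sketch of a layer whose weights are generated on the fly from a memory embedding (illustrative PyTorch, nothing close to production scale):

    import torch
    import torch.nn as nn

    class HyperLinear(nn.Module):
        # A linear layer whose weights are emitted by a small hypernetwork
        # conditioned on a "memory" embedding, rather than stored as fixed
        # parameters.
        def __init__(self, mem_dim: int, in_dim: int, out_dim: int):
            super().__init__()
            self.in_dim, self.out_dim = in_dim, out_dim
            self.weight_gen = nn.Linear(mem_dim, in_dim * out_dim)
            self.bias_gen = nn.Linear(mem_dim, out_dim)

        def forward(self, x: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
            w = self.weight_gen(memory).view(self.out_dim, self.in_dim)
            b = self.bias_gen(memory)
            return nn.functional.linear(x, w, b)

    # The same frozen hypernet yields different effective weights for
    # different memory states, one degree removed from direct inference.
    layer = HyperLinear(mem_dim=16, in_dim=32, out_dim=8)
    y = layer(torch.randn(4, 32), memory=torch.randn(16))  # shape (4, 8)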
It's an absolutely enormous problem, and I'm excited that it seems to be one of the primary research efforts kicking off this year. It could be a very huge capabilities step change.
bluegatty
14 hours ago
Yes, so I think that's a fine thought, I don't think it fits into LLM architecture.
Also, weirdly, even LeCun etc. are barely talking about this; they're thinking about 'world models' etc.
I think what you're talking about is maybe 'the most important thing' right now, and frankly, it's almost like an issue of 'Engineering'.
Like - it's when you work very intently with the models that this 'issue' becomes much more prominent.
Your 'instinct' for this problem is probably an expression of 'very nuanced use' I'm going to guess!
So in a way, it's as much Engineering as it is theoretical?
Anyhow - so yes - but - probably not LLM weights. Probably.
I'll add a small thing: the way that Claude Code keeps the LLM 'on track' is by reminding it! Literally, it injects little 'TODO reminders' with some prompts, which is kind of ... simple!
I worked a bit with 'steering probes' ... and there's a related opportunity there - to 'inject' memory and control operations along those lines. Just as a starting point for at least one architectural motivation.
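To make the steering-probe idea concrete: the usual trick is to add a learned direction to a layer's activations at inference time via a forward hook. A toy sketch (PyTorch; the single Linear stands in for one block of a real model):

    import torch
    import torch.nn as nn

    block = nn.Linear(64, 64)               # stand-in for one transformer block
    steering_vector = torch.randn(64)       # a direction found with probes
    steering_strength = 4.0

    def steer(module, inputs, output):
        # Returning a value from a forward hook replaces the block's output,
        # nudging the hidden states along the steering direction.
        return output + steering_strength * steering_vector

    handle = block.register_forward_hook(steer)
    steered = block(torch.randn(2, 64))     # activations with steering applied
    handle.remove()                         # detach the hook to stop steering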
pankajdoharey
11 hours ago
Not to forget, we will need thousands of examples for the models to extract abilities; the sample efficiency of these models is quite poor.
charcircuit
16 hours ago
Models like Claude have been trained to update and reference memory in Claude Code (agent loops), both independently and as part of compacting context. Current models have been trained to keep learning after being deployed.
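A bare-bones version of that pattern, external notes the agent re-reads every turn and appends to before compacting its context, looks roughly like this (call_model is a stand-in for whatever completion API you use):

    from pathlib import Path

    MEMORY_FILE = Path("agent_memory.md")

    def run_turn(call_model, user_msg: str, history: list, max_history: int = 20) -> str:
        # 1. Prepend persistent notes so "learned" facts survive context resets.
        memory = MEMORY_FILE.read_text() if MEMORY_FILE.exists() else ""
        prompt = f"Notes from previous sessions:\n{memory}\n\n" + "\n".join(history) + f"\nUser: {user_msg}"
        reply = call_model(prompt)
        history += [f"User: {user_msg}", f"Assistant: {reply}"]

        # 2. When the transcript gets long, ask the model what is worth keeping,
        #    append that to the memory file, then drop the old turns (compaction).
        if len(history) > max_history:
            summary = call_model("List durable facts worth remembering:\n" + "\n".join(history))
            with MEMORY_FILE.open("a") as f:
                f.write(summary + "\n")
            del history[:-4]   # crude compaction: keep only the last two exchanges
        return reply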
ra
15 hours ago
yes but that's a very unsatisfactory definition of memory.
raincole
15 hours ago
> We need models that keep on learning (updating their parameters) forever, online, all the time.
Yeah, that's the guaranteed way to get MechaHitler in your latent space.
If the feedback loop is fast enough I think it would finally kill the internet (in the 'dead internet theory' sense). Perhaps it's better for everyone though.
threecheese
14 hours ago
Many are working on this, as well as in-latent-space communication across models. Because we can’t understand that, by the time we notice MechaHitler it’ll be too late.
4b11b4
16 hours ago
I'm not sure if you want models perpetually updating weights. You might run into undesirable scenarios.
com2kid
16 hours ago
If done right, one step closer to actual AGI.
That is the end goal after all, but all the potential VCs seem to forget that almost every conceivable outcome of real AGI involves the current economic system falling to pieces.
Which is sorta weird. It is like if VCs in Old Regime France started funding the revolution.
jack_pp
5 hours ago
I think VCs end up in one of four categories
1. They're too stupid to understand what they're truly funding.
2. They understand but believe they can control it for their benefit, basically want to "rule the world" like any cartoon villain.
3. They understand but are optimists and believe AGI will be a benevolent construct that will bring us to a post-scarcity society. There are a lot of rich people / entrepreneurs that still believe they are working to make the world a better place... (one SaaS at a time, but alas, they believe it)
4. They don't believe that AGI is close or even possible
olalonde
2 hours ago
1. Progress is unstoppable. Refusing to fund it won't make it disappear.
2. Most VCs are normal people that just want a bigger slice of pie, not necessarily a bigger share of the pie. See the fixed pie fallacy.
Nevermark
7 hours ago
If it makes the models smarter, someone will do it.
From any individual, up to entire countries, not participating doesn't do anything except ensure you don't have a card to play when it happens.
There is a very strong element of the principles of nature and life (as in survival, not nightclubs or hobbies) happening here that can't be shamed away.
The resource feedback for AI progress effort is immense (and it doesn't matter how much is earned today vs. forward looking investment). Very few things ever have that level of relentless force behind them. And even beyond the business need, keeping up is rapidly becoming a security issue for everyone.
CorrectHorseBat
16 hours ago
Yes the planet got destroyed. But for a beautiful moment in time we created a lot of value for shareholders.
And for your comparison, they did fund the American Revolution, which in turn was one of the sparks for the French Revolution (or was that exactly the point you were making?)
com2kid
15 hours ago
The funding of the American revolution is a fun topic but most people don't know about it so I don't bother dropping references to it. :D
BobbyTables2
13 hours ago
I wonder which side tried to forget that first (;->
cs702
16 hours ago
Our brains, which are organic neural networks, are constantly updating themselves. We call this phenomenon "neuroplasticity."
If we want AI models that are always learning, we'll need the equivalent of neuroplasticity for artificial neural networks.
Not saying it will be easy or straightforward. There's still a lot we don't know!
4b11b4
12 hours ago
I wasn't explicit about this in my initial comment, but I don't think you can equate more forward passes to neuroplasticity. Because, for one, we (humans) also /prune/. And, similar to RL, which just overwrites the policy, pushing new weights is in a similar camp: you don't have the previous state anymore. But we as humans with our neuroplasticity do know the previous states even after we've "updated our weights".
nemomarx
15 hours ago
How would you keep controls - safety restrictions, IP restrictions, etc. - with that, though? The companies selling models right now probably want to keep those fairly tight.
api
13 hours ago
This is why I’m not sure most users actually want AGI. They want special purpose experts that are good at certain things with strictly controlled parameters.
4b11b4
12 hours ago
I agree; the fundamental problem is we wouldn't be able to understand it ("AGI"), and therefore it's useless. Either it's useless, or you let it run unleashed and it's useful. Either way you still don't understand it / can't predict it / it's dangerous / untrustworthy. A constrained useful thing is great, but it fundamentally has to be constrained, otherwise it doesn't make sense.
api
11 hours ago
The way I see it, we build technology to be what we are not and do what we can’t do or things we can do but better or faster.
An unpredictable fallible machine is useless to us because we have 7+ billion carbon based ones already.
fph
15 hours ago
Tay the chatbot says hi from 2016.
0xdeadbeefbabe
15 hours ago
How about we just put them to bed once in a while?
bdj108
16 hours ago
it is interesting
4b11b4
12 hours ago
Please elaborate
prng2021
12 hours ago
Thanks for repeating what the author explained.
rabbitlord
15 hours ago
I think they can do in-context learning.