fennecbutt
a day ago
I find this topic very interesting because it's something I've run into and mitigated ever since GPT-3 was available.
Plenty of long, loooong, and complex role plays, world building, and tests to see whether I could integrate dozens of different local models into a game project or something similar.
All of the same issues there apply to "agents" as well.
You very quickly learn that even current models are like distracted puppies. Larger models seem able to brute-force their way through some of these problems, but I wouldn't call that sustainable.
mberlove
7 hours ago
Sometimes it seems you can "remind" the more established models, and that brings the context back into focus (just from personal experience), but why that works, I can only guess.
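Purely as an illustration of what that "reminding" might look like mechanically: a minimal sketch that re-injects a short summary of the key context every few turns, so it sits near the end of the prompt where models tend to attend most strongly. It assumes an OpenAI-compatible chat endpoint (e.g. a local server); the model name, URL, and reminder text are placeholders, not anyone's actual setup.

    # Minimal sketch: periodically re-inject a "reminder" of the scene/context
    # as a fresh message, rather than hoping the model keeps attending to
    # something buried thousands of tokens earlier.
    # Assumes an OpenAI-compatible chat endpoint; names/URLs are placeholders.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

    REMINDER = (
        "Reminder of the current scene: you are the innkeeper Mira, "
        "the party owes you 30 gold, and the storm outside has not let up."
    )

    def chat_with_reminder(history, user_msg, every_n_turns=6):
        """Append the user message, re-inserting the reminder every few turns."""
        history.append({"role": "user", "content": user_msg})
        # Count user turns so far; re-anchor the context periodically.
        user_turns = sum(1 for m in history if m["role"] == "user")
        if user_turns % every_n_turns == 0:
            history.append({"role": "system", "content": REMINDER})
        reply = client.chat.completions.create(
            model="local-model",  # placeholder model name
            messages=history,
        )
        assistant_msg = reply.choices[0].message.content
        history.append({"role": "assistant", "content": assistant_msg})
        return assistant_msg

Whether the reminder goes in as a system message or a user message, and how often, is a judgment call; the point is just that recency seems to matter more than it should.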
What methods have you found to brute-force through the problem?