hackernews client

Extending the context length to 1M tokens

116 points, posted a year ago

(qwenlm.github.io)

110 Comments

aliljet

a year ago

This is fantastic news. I've been using Qwen2.5-Coder-32B-Instruct with Ollama locally and it's honestly such a breath of fresh air. I wonder if any of you have had a moment to try this newer context length locally?

BTW, I can't run this effectively on my 2080 Ti, so I've just loaded up the machine with regular RAM. It's not going to win any races, but as they say, it's not the speed that matters, it's the quality of the effort.

lukev

a year ago

I ran a couple of needle-in-a-haystack-type queries with just a 32k context length and was very much not impressed. It often failed to find facts buried in the middle of the prompt that were stated almost identically to the question being asked.

It's cool that these models are getting such long contexts, but performance definitely degrades the longer the context gets and I haven't seen this characterized or quantified very well anywhere.
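A minimal sketch of how a needle-in-a-haystack check like this could be made repeatable: bury one fact at several relative depths in filler text of several lengths, ask the same question each time, and record whether the answer contains the expected string. Everything named here is an assumption for illustration: `ask_model` is a hypothetical callable wrapping whatever local model is under test, and the filler, needle, and substring scoring rule are placeholders rather than any published benchmark.

```python
def build_haystack(needle: str, filler: list[str], n_sentences: int, depth: float) -> str:
    """Place `needle` at a relative depth (0.0 = start, 1.0 = end) inside filler text."""
    body = [filler[i % len(filler)] for i in range(n_sentences)]
    body.insert(round(depth * n_sentences), needle)
    return " ".join(body)


def score(answer: str, expected: str) -> bool:
    """Crude scoring: the expected string must appear somewhere in the answer."""
    return expected.lower() in answer.lower()


def sweep(ask_model, needle, question, expected, filler, lengths, depths):
    """Run every (length, depth) combination and map it to pass/fail."""
    results = {}
    for n in lengths:
        for d in depths:
            prompt = build_haystack(needle, filler, n, d) + "\n\n" + question
            results[(n, d)] = score(ask_model(prompt), expected)
    return results


if __name__ == "__main__":
    # Stand-in "model" that just echoes the prompt; replace with a real call.
    echo_model = lambda prompt: prompt
    table = sweep(
        echo_model,
        needle="The access code is 4217.",
        question="What is the access code?",
        expected="4217",
        filler=["The sky was grey that morning.", "Nothing else happened."],
        lengths=[100, 1000],
        depths=[0.0, 0.5, 1.0],
    )
    for (n, d), ok in sorted(table.items()):
        print(f"sentences={n:5d} depth={d:.1f} found={ok}")
```

Sweeping depth as well as length is what would expose a "lost in the middle" pattern, since a model can pass at depth 0.0 and 1.0 while failing at 0.5.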

zackangelo

a year ago

Would you care to share your prompts?

They posted a haystack benchmark in the blog post that seems too good to be true.

lukev

a year ago

I wasn't scientific about it, unfortunately. My searches were natural language, not token-based, though.

busssard

a year ago

Yeah, when I saw that they have 100% coverage with 1M tokens, I thought this must be a placeholder image for when the actual results come in.

Because there is no variation, nothing.

ipsum2

a year ago

The long context model has not been open sourced.

notjulianjaynes

a year ago

Hi, are you able to use Qwen's 128k context length with Ollama? Using AnythingLLM + Ollama and a GGUF version, I kept getting an error message with prompts longer than 32,000 tokens (summarizing long transcripts).

syntaxing

a year ago

The famous Daniel Han (same person that made Unsloth and fixed Gemma/Llama bugs) mentioned something about this on Reddit and offered a fix. https://www.reddit.com/r/LocalLLaMA/comments/1gpw8ls/bug_fix...

zargon

a year ago

After reading a lot of that thread, my understanding is that YaRN scaling is intentionally disabled by default in the GGUFs, because it would degrade outputs for contexts that do fit in 32k. So the only change is enabling YaRN scaling at 4x, which is just a configuration setting. GGUF has these configuration settings embedded in the file format for ease of use, but you should be able to override them without downloading an entire duplicate set of weights (12 to 35 GB!). (It looks like the --override-kv option in llama.cpp can be used for this, but I haven't tried it yet.)
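A rough sketch of what that override might look like, written as a small Python helper that assembles the llama.cpp command line. This is untested and rests on assumptions: the metadata key names (`qwen2.rope.scaling.type`, `.factor`, `.original_context_length`) follow the GGUF naming convention as I understand it and should be checked against the actual file's metadata, `model.gguf` is a placeholder path, and whether llama.cpp honors these overrides for rope scaling is exactly the thing nobody in this thread has confirmed.

```python
import shlex


def yarn_override_args(arch: str = "qwen2", factor: float = 4.0, original_ctx: int = 32768) -> list[str]:
    """Build --override-kv arguments intended to switch on YaRN scaling.

    The key names are assumptions based on GGUF naming conventions;
    verify them against the metadata of the file you actually have.
    """
    overrides = [
        f"{arch}.rope.scaling.type=str:yarn",
        f"{arch}.rope.scaling.factor=float:{factor}",
        f"{arch}.rope.scaling.original_context_length=int:{original_ctx}",
    ]
    args: list[str] = []
    for kv in overrides:
        args += ["--override-kv", kv]
    # Ask for the full extended window: 32768 * 4 = 131072 tokens.
    args += ["-c", str(int(original_ctx * factor))]
    return args


if __name__ == "__main__":
    # "model.gguf" is a hypothetical path, not a real file name.
    cmd = ["llama-cli", "-m", "model.gguf"] + yarn_override_args()
    print(shlex.join(cmd))
```

llama.cpp also has dedicated `--rope-scaling`, `--rope-scale`, and `--yarn-orig-ctx` options, which may be a simpler route than overriding metadata. Either way, the point above still applies: leave scaling off for prompts that fit in 32k.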

syntaxing

a year ago

Oh super interesting, I didn’t know you can override this with a flag on llama.cpp.

notjulianjaynes

a year ago

Yeah, unfortunately that's the exact model I'm using (Q5 version). What I've been doing is first loading the transcript into the vector database, and then giving it a prompt that's like "summarize the transcript below: <full text of transcript>". This works surprisingly well, except for one transcript of a 3-hour meeting that, per an online calculator, was about 38,000 tokens. Cutting the text up into 3 parts and pretending each was a separate meeting* led to a bunch of hallucinations for some reason.

*In theory this shouldn't matter much for my purpose of summarizing city council meetings that follow a predictable format.

lr1970

a year ago

> We have extended the model’s context length from 128k to 1M, which is approximately 1 million English words

Actually, English-language tokenizers map, on average, 3 words to 4 tokens. Hence 1M tokens is about 750K English words, not a million as claimed.
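The arithmetic behind that correction, as a short sketch. The 3-words-per-4-tokens ratio is the rule of thumb quoted above, not a measured property of Qwen's tokenizer; the real ratio varies with the tokenizer and the text.

```python
# Rule of thumb from the comment: roughly 3 English words per 4 tokens.
WORDS_PER_TOKEN = 3 / 4  # assumption: an average, not a tokenizer guarantee


def approx_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many English words fit in a token budget."""
    return round(tokens * words_per_token)


def approx_tokens(words: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many tokens a given number of English words needs."""
    return round(words / words_per_token)


print(approx_words(1_000_000))   # 750000
print(approx_tokens(1_000_000))  # 1333333
```

Read the other way, a full million English words would need about 1.33M tokens under the same ratio.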

swyx

a year ago

good, it's been hours since i saw a "well actually" comment on HN

lostmsu

a year ago

Is this model downloadable?

gkaye

a year ago

They are not clear about this (which is annoying), but it seems it will not be downloadable. No weights have been released so far, and nothing in this post mentions plans to do so going forward.

swazzy

a year ago

Note: unexpected Three-Body Problem spoilers on this page.

zargon

a year ago

Those summaries are pretty lousy and also have hallucinations in them.

johndough

a year ago

I agree. Below are a few errors. I have also asked ChatGPT to check the summaries and it found all the errors (and even made up a few more which weren't actual errors, but just not expressed in perfect clarity.)

Spoilers ahead!

First novel: The Trisolarans did not contact earth first. It was the other way round.

Second novel: Calling the conflict between humans and Trisolarans a "complex strategic game" is a bit of a stretch. Also, the "water drops" do not disrupt ecosystems. I am not sure whether "face-bearers" is an accurate translation. I've only read the English version.

Third novel: Luo Ji does not hold the key to the survival of the Trisolarans and there were no "micro-black holes" racing towards earth. Trisolarans were also not shown colonizing other worlds.

I am also not sure whether Luo Ji faced his "personal struggle and psychological turmoil" in this novel or in an earlier novel. He was certainly sure of his role at the end. Even the Trisolarans judged him at over 92% deterrent rate.

bcoates

a year ago

Yeah describing Luo Ji as having "struggles with the ethical implications of his mission" is the biggest whopper.

He's like God's perfect sociopath. He wobbles between total indifference to his mission and interplanetary murder-suicide, and the only things that seem to really get to him are a stomachache and being ghosted by his wife.

johndough

a year ago

And this example does not even illustrate the long context understanding well, since smaller Qwen2.5 models can already recall parts of the Three Body Problem trilogy without pasting the three books into the context window.

gs17

a year ago

And multiple summaries of each book (in multiple languages) are almost definitely in the training set. I'm more confused how it made such inaccurate, poorly structured summaries given that and the original text.

Although, I just tried with normal Qwen 2.5 72B and Coder 32B and they only did a little better.

agildehaus

a year ago

It seems a very difficult problem to produce a response based only on the text given and not on past training. An LLM that can do that would seem to be considerably more advanced than what we have today.

Though I would say humans would have difficulty too -- say, having read The Three-Body Problem before, then reading a slightly modified version (without being aware of the modifications), and having to recall specific details.

botanical76

a year ago

This problem is poorly defined; what would it mean to produce a response JUST based on the text given? Should it also forgo all logic skills and intuition gained in training because they are not in the text given? Where in the N-dimensional semantic space do we draw a line (or rather, a surface) between general, universal understanding and specific knowledge about the subject at hand?

That said, once you have defined what is required, I believe you will have solved the problem.