Those citations come from it searching the web and summarizing the results, not from its built-in training data. Processes outside of the model's inference are doing that tracking.
If it gave you a model-only response, it could not determine where the information in it came from.
Any LLM output is a combination of its weights (fixed at training time) and its context. Every token is some combination of those two things. The part that comes from the weights is the part that has no technical means of being traced back to its sources.
But even the part that comes from the context is still produced by the weights. As I said, every token is some mathematical combination of the weights and the context.
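To make that concrete, here's a toy sketch. Every name in it is invented (a real model replaces `score` with a giant neural net); the point is only that each token is a function of the weights and the context, and of nothing else:

    import math, random

    # hypothetical stand-ins, not any real library's API
    VOCAB = ["the", "chicken", "crossed", "road", "!"]

    def score(weights, context, token):
        # a real model runs a huge neural net here; this toy just has to be
        # *some* deterministic function of (weights, context, token) and nothing else
        return weights.get(token, 0.0) + 0.1 * context.count(token)

    def next_token(weights, context):
        scores = {t: score(weights, context, t) for t in VOCAB}
        m = max(scores.values())
        exps = {t: math.exp(s - m) for t, s in scores.items()}
        z = sum(exps.values())
        return random.choices(VOCAB, weights=[exps[t] / z for t in VOCAB])[0]

    # the generation loop: the only inputs are the frozen weights and the running context
    def generate(weights, context, n=5):
        for _ in range(n):
            context = context + [next_token(weights, context)]
        return context

    print(generate({"!": 2.0}, ["the", "chicken", "crossed"]))

Notice there's nowhere in that loop where "which training document produced this weight value" could even be stored, let alone reported.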
So it can produce text that does not correctly summarize the content in its context, or incorrectly reproduce the link, or incorrectly map a link to the part of its context that came from that link, or more generally just make shit up.
OK, I'll try to err towards the "5" with this one.
1. We built a machine that takes a bunch of words on a piece of paper, and suggests what words fit next.
2. A lot of people are using it to make stories, where you fill in "User says 'X'", and then the machine adds something like "Bot says 'Y'". You aren't shown the whole thing; a program finds the Y part and sends it to your computer screen (there's a rough sketch of this after the list).
3. Suppose the story ends, unfinished, with "User says 'Why did the chicken cross the road?'". We can use the machine to finish it, and it suggests "Bot says: 'To get to the other side!'"
4. Funny! But when the User character asks where the answer came from, the machine doesn't have a brain to think "Oh, wait, that means ME!". Instead, it keeps extending the story in the same way as before, so you'll see "words that fit" instead of words that are true. The true answer is something unsatisfying, like "it fit the math best".
5. This means there's no difference between "Bot says 'From the April Newsletter of Jokes Monthly'" and "Bot says 'I don't feel like answering.'" Both are made up the same way.
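And here's the rough sketch promised above: what the wrapper around the machine does in steps 2-3. All names are made up; this is not anyone's real code, just the shape of it:

    # toy sketch: the wrapper around the word-suggesting machine
    def chat_turn(transcript, user_message, complete):
        # `complete` is the machine from step 1: give it text, it appends text that "fits"
        transcript += f"User says: '{user_message}'\nBot says: '"
        continuation = complete(transcript)        # the machine just keeps the story going
        reply = continuation.split("'")[0]         # snip out the Bot's line
        return transcript + reply + "'\n", reply   # only `reply` ever reaches your screen

    # a fake "machine" so the sketch runs end to end
    fake_machine = lambda text: "To get to the other side!' User says: '..."
    _, reply = chat_turn("", "Why did the chicken cross the road?", fake_machine)
    print(reply)   # To get to the other side!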
> Google's search result AI summary shows the links for example.
That's not the LLM/mad-libs program answering what data flowed into it during training; that's the LLM generating document text like "Bot runs do_web_search(XYZ) and displays the results." A regular, ordinary program watches for "Bot runs", snips out that text, does a real web search right away, and then substitutes the results back in.
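Roughly like this, with everything invented for illustration (I have no idea what Google's actual internals look like), but the shape is the point:

    import re

    def run_with_tools(prompt, complete, web_search):
        # `complete` is the LLM; `web_search` is a plain, ordinary search function
        text = complete(prompt)
        m = re.search(r"Bot runs do_web_search\((.+?)\)", text)
        if m:
            results = web_search(m.group(1))   # real links come from here, not from the weights
            # splice the results back into the document and let the LLM summarize them
            text = complete(prompt + text + "\nSearch results:\n" + results + "\nBot says: '")
        return text

The links are real because `web_search` is an ordinary program returning real URLs; the model just gets them pasted into its context and summarizes them, which is also where the failure modes above come from.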