Interestingly, the original idea of what we call a "browser" nowadays – the "user agent" – was built on the premise that each user has specific needs and preferences. The user agent was designed to act on their behalf, negotiating data transfers and resolving conflicts between content author and user (content consumer) preferences according to "strengths" and various reconciliation mechanisms.
(The fact that browsers nowadays are usually expected to render everything "pixel-perfect" and identically for everyone with similar devices runs utterly counter to the original intention.)
Yet, given the technical possibilities of the time, the original idea was primarily about design and interactivity. The fact that we now have tools to extend this concept to core language and content processing is… huge.
It seems we're approaching the moment when our individual personal agent, when asked about a new page, will tell us:
Well, there's nothing new of interest for you, frankly:
All information presented there was present on pages visited recently.
-- or --
You've already learned everything mentioned there. (*)
Here's a brief summary: …
(Do you want to dig deeper, see the content verbatim, or anything else?)
That's because its "browsing history" would also contain a notion of what we "know" from chats, or of what we had previously marked as "known".
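The "all information presented there was present on pages visited recently" check can be crudely approximated without any model at all. A minimal Python sketch, assuming pages have already been split into paragraphs; `RecentHistory` and `fingerprint` are hypothetical names, and matching is exact after whitespace and case normalization:

```python
import hashlib
import re

def fingerprint(paragraph: str) -> str:
    # Normalize whitespace and case so trivial reformatting does not count as "new"
    normalized = re.sub(r"\s+", " ", paragraph).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

class RecentHistory:
    """Hypothetical store of paragraph fingerprints from recently visited pages."""

    def __init__(self) -> None:
        self.seen: set[str] = set()

    def visit(self, paragraphs: list[str]) -> None:
        # Remember every non-empty paragraph of a page the user has read
        self.seen.update(fingerprint(p) for p in paragraphs if p.strip())

    def new_paragraphs(self, paragraphs: list[str]) -> list[str]:
        # Paragraphs of a new page that never appeared on a recently visited one
        return [p for p in paragraphs if p.strip() and fingerprint(p) not in self.seen]

history = RecentHistory()
history.visit(["Event X happened today.", "Officials confirmed the details."])
fresh = history.new_paragraphs(["Event  X happened today.", "officials confirmed the details."])
print(fresh or "Well, there's nothing new of interest for you, frankly.")
```

This only catches verbatim repeats. Recognizing the same *information* phrased differently is exactly where a model (or embedding similarity) would have to take over.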
It would have to have a pretty good model of my brain to help me make these decisions. Just as a random example, it would have to understand that an equation is the sort of thing I'm likely to look up even if I understand its meaning, just to double-check and get the particulars right. That's an obvious example; I think there must be others that are less obvious.
Or that I’m looking up a data point that I already actually know, just because I want to provide a citation.
But it could be interesting.
Well, we should first establish some sort of contract for conveying "I feel that I actually understand this particular piece of information, so when I'm confronted with it in the future, you can mark it as such". My lines of thought were more about a tutorial page presenting the same techniques as a course you finished a week earlier, or a news page reporting on an event you read about on a different news site a minute before … stuff like this … so you would potentially save the time spent skimming/reading/understanding, only to realise there was no added value for you at that particular moment. Or, while scrolling through a comment section, hide the parts of comments that repeat the same remark or joke.
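Hiding repeated remarks in a comment section is classic near-duplicate detection. A minimal Python sketch using word shingles and Jaccard similarity (a swapped-in, purely lexical technique; `hide_repeats`, the shingle size `k=3` and the `0.6` threshold are arbitrary, hypothetical choices):

```python
import re

def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    # Word k-grams ("shingles") of the lowercased text, punctuation stripped
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    # Overlap of two shingle sets; two empty sets count as identical
    return len(a & b) / len(a | b) if a or b else 1.0

def hide_repeats(comments: list[str], threshold: float = 0.6) -> list[str]:
    """Keep a comment only if it is not too similar to any comment kept before it."""
    kept: list[str] = []
    kept_shingles: list[set] = []
    for comment in comments:
        s = shingles(comment)
        if all(jaccard(s, prev) < threshold for prev in kept_shingles):
            kept.append(comment)
            kept_shingles.append(s)
    return kept

comments = [
    "This is basically Lynx with an LLM bolted on.",
    "this is basically lynx with an LLM bolted on!",
    "Could it handle POST requests?",
]
print(hide_repeats(comments))
```

Running it per sentence instead of per comment would get closer to hiding only the repeated *parts*. The same joke retold in different words would slip through; that is the case where a model earns its keep.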
Or (and this is absolutely doable without any "AI" at all):
What the bloody hell actually newly appeared on this particular URL since my last visit?
(There is one page nearby that would be quite unusable for me, were it not for a crude userscript that helps with exactly this. But I can imagine that a digest along the lines of "What's new here?" / "Noteworthy responses?" would be way better.)
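The AI-free "what newly appeared on this URL since my last visit?" part boils down to keeping a snapshot per URL and diffing. A minimal Python sketch with `difflib`, assuming the page has already been reduced to text lines; the in-memory `snapshots` dict and the `visit` helper are hypothetical (a real tool would persist snapshots to disk):

```python
import difflib

def new_since_last_visit(old_text: str, new_text: str) -> list[str]:
    """Return lines present in the new snapshot but not in the old one, in page order."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    added: list[str] = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # 'insert' and 'replace' opcodes cover lines that did not exist before
        if tag in ("insert", "replace"):
            added.extend(new_lines[j1:j2])
    return added

# Hypothetical snapshot store keyed by URL
snapshots: dict[str, str] = {}

def visit(url: str, page_text: str) -> list[str]:
    # Diff against the previous snapshot (empty on first visit), then store the new one
    previous = snapshots.get(url, "")
    snapshots[url] = page_text
    return new_since_last_visit(previous, page_text)

print(visit("https://example.com/thread", "First comment\nSecond comment"))
print(visit("https://example.com/thread", "First comment\nA new reply\nSecond comment"))
```

Edited lines show up as "new" and deleted ones are ignored. The "noteworthy responses" digest would be a model pass over just these added lines, which also keeps the prompt small.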
For the "I need to cite this source" case, you would naturally want the "verbatim" view without any amendments anyway. Likewise, before sharing the resource or directing someone to it, looking at its "true form" would still be pretty necessary.
I can definitely see a future in which we each have our own personal memetic firewall, keeping us safe and cozy in our personal little worldview bubbles.
> Well, there's nothing new of interest for you, frankly
For this to work the way a user would want, the model would have to be sentient.
But you could try to get there with current models; it'd just be untrustworthy to the point of being pointless beyond a novelty.
No more "sentient" than existing LLMs already are, even within the limited span of a chat context.
Naturally, »nothing new of interest for you« here is just a proxy for »does not involve any significant concept that you haven't previously expressed knowledge about« (or however you'd put it), which seems pretty doable, provided that the contract for "expressing knowledge about something" has been established beforehand.
Let's say you have really grokked every page you have ever bookmarked (yes, a stretch; no "read it later" here). Then your personal model would be able to (again, figuratively) "make a qualified guess" about your knowledge. Or there could be some kind of tag you add to any browsing history entry, or fragment, indicating "I understand this". Or the agent could be set up to quiz you when you leave a page (that would be brutal). Or … I think you get the gist now.
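The "I understand this" tag on history entries can be modelled as a tiny data structure. A minimal Python sketch, assuming some model has already extracted a set of concept labels per page (that extraction is the hard part and is not shown); `HistoryEntry`, `KnowledgeProfile` and the concept strings are all hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class HistoryEntry:
    url: str
    concepts: set[str]          # assumed to come from some concept-extraction model
    understood: bool = False    # the explicit "I understand this" tag

@dataclass
class KnowledgeProfile:
    entries: list[HistoryEntry] = field(default_factory=list)

    def mark_understood(self, url: str) -> None:
        # The user-facing contract: tag a history entry (bookmarking could set it too)
        for entry in self.entries:
            if entry.url == url:
                entry.understood = True

    def known_concepts(self) -> set[str]:
        # Union of concepts from every entry the user has tagged as understood
        known: set[str] = set()
        for entry in self.entries:
            if entry.understood:
                known |= entry.concepts
        return known

    def novel_concepts(self, page_concepts: set[str]) -> set[str]:
        # An empty result is the proxy for "nothing new of interest for you"
        return page_concepts - self.known_concepts()

profile = KnowledgeProfile([
    HistoryEntry("https://example.com/course", {"gradient descent", "learning rate"}),
    HistoryEntry("https://example.com/news", {"election results"}),
])
profile.mark_understood("https://example.com/course")
print(profile.novel_concepts({"gradient descent", "momentum"}))
```

Visited but untagged entries contribute nothing, which keeps the guess conservative: the agent only claims you know what you explicitly said you know.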
In your cleanup step, after removing obvious junk, I think you should do whatever Firefox's reader mode does to clean up further, and if that fails, bail out to the current output. That should reduce the number of tokens you send to the LLM even more.
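Firefox's reader mode is built on Mozilla's Readability.js, so the suggested pipeline is "strip junk, try a Readability-style extraction, fall back if it fails". A minimal Python sketch of that fallback logic with only the standard library; `reader_extract` is a hypothetical hook (e.g. a wrapper around some Readability port returning plain text), and `JUNK_TAGS` and `min_chars=200` are assumptions:

```python
from html.parser import HTMLParser
from typing import Callable, Optional

JUNK_TAGS = {"script", "style", "nav", "footer", "aside", "noscript"}

class _TextExtractor(HTMLParser):
    """Collects visible text, skipping everything inside JUNK_TAGS."""

    def __init__(self) -> None:
        super().__init__()
        self.depth_in_junk = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in JUNK_TAGS:
            self.depth_in_junk += 1

    def handle_endtag(self, tag):
        if tag in JUNK_TAGS and self.depth_in_junk:
            self.depth_in_junk -= 1

    def handle_data(self, data):
        if not self.depth_in_junk and data.strip():
            self.chunks.append(data.strip())

def strip_obvious_junk(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)

def clean_for_llm(html: str,
                  reader_extract: Optional[Callable[[str], str]] = None,
                  min_chars: int = 200) -> str:
    """Junk-strip first, then try a reader-mode pass; fall back if it fails or comes back too thin."""
    baseline = strip_obvious_junk(html)
    if reader_extract is None:
        return baseline
    try:
        article = reader_extract(html)
    except Exception:
        return baseline
    # A suspiciously short result usually means the heuristic picked the wrong node
    return article if len(article.strip()) >= min_chars else baseline

sample_html = ("<html><head><script>track()</script></head><body><nav>Home | About</nav>"
               "<p>Hello world</p><footer>(c) 2024</footer></body></html>")
print(clean_for_llm(sample_html))
```

The fallback keeps the current behaviour as a floor: a reader-mode failure never makes the output worse than today's junk-stripped text.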
You should also have some way for the LLM to indicate that there is no useful output, perhaps because the page is an SPA. That would force you to execute JavaScript to render that particular page, though.
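One way to give the LLM that escape hatch is a sentinel reply the prompt asks for, plus a cheap heuristic that can skip the LLM call entirely for obvious SPA shells. A minimal Python sketch; the sentinel string, the prompt wording, the `root`/`app` mount-point ids and the 200-character cutoff are all hypothetical choices, and actually running the JavaScript (e.g. in a headless browser) is out of scope here:

```python
import re
from enum import Enum

# Hypothetical sentinel the system prompt asks the model to emit verbatim
NO_CONTENT_SENTINEL = "<<NO_USEFUL_CONTENT>>"

SYSTEM_PROMPT_SNIPPET = (
    "If the page contains no meaningful readable content (for example, it is an "
    f"empty shell that relies on JavaScript to render), reply with exactly {NO_CONTENT_SENTINEL}."
)

class Outcome(Enum):
    RENDERED = "rendered"
    NEEDS_JS = "needs_js"

def looks_like_spa_shell(html: str, visible_text: str) -> bool:
    """Cheap pre-check before spending tokens: little text, script tags, and an empty mount point."""
    scripts = len(re.findall(r"<script\b", html, flags=re.IGNORECASE))
    empty_mount = re.search(r'<div[^>]*id=["\'](root|app)["\'][^>]*>\s*</div>', html, flags=re.IGNORECASE)
    return len(visible_text.strip()) < 200 and scripts > 0 and empty_mount is not None

def classify_llm_reply(reply: str) -> Outcome:
    # Compare after stripping so stray whitespace from the model does not matter
    return Outcome.NEEDS_JS if reply.strip() == NO_CONTENT_SENTINEL else Outcome.RENDERED

spa_shell = '<html><body><div id="root"></div><script src="/bundle.js"></script></body></html>'
print(looks_like_spa_shell(spa_shell, ""))
print(classify_llm_reply("<<NO_USEFUL_CONTENT>>\n"))
```

An exact-match sentinel is easy to detect reliably; asking the model for a free-form "this page is empty" explanation would be harder to parse.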
Would really love to see more functionality built into this. Handling POST requests, enabling scripting, etc. could all be super powerful.
Wonder if you could work on the DOM instead of the HTML...
Almost unrelated, but you could also compare spegel to https://www.brow.sh/