btown
9 hours ago
> Facts are the meaning pulled out of each episode, stored as subject-predicate-object records with a plain summary and timestamps for when the fact was introduced and when it was invalidated (subject=person, predicate=works_at, object=company). Facts form a graph with typed edges between them: X is in tension with Y, A is derived from B, J supersedes K.
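To make the quoted fact model concrete, here is a minimal Python sketch, assuming one reading of the description: each fact is a subject-predicate-object record with a plain-text summary, an introduction timestamp, and an optional invalidation timestamp, and facts are linked by typed edges. All names (`Fact`, `Edge`, `EdgeType`, `FactGraph`, `supersede`, `valid_at`) are hypothetical illustrations, not the product's actual schema or API.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Optional


class EdgeType(Enum):
    # Hypothetical edge types mirroring the examples in the quote.
    IN_TENSION_WITH = "in_tension_with"
    DERIVED_FROM = "derived_from"
    SUPERSEDES = "supersedes"


@dataclass
class Fact:
    id: str
    subject: str
    predicate: str
    object: str
    summary: str
    introduced_at: datetime
    invalidated_at: Optional[datetime] = None  # None means "still valid"

    def is_valid_at(self, t: datetime) -> bool:
        # A fact holds from its introduction up to (not including) its invalidation.
        return self.introduced_at <= t and (
            self.invalidated_at is None or t < self.invalidated_at
        )


@dataclass(frozen=True)
class Edge:
    source: str  # id of the fact the edge starts from
    target: str  # id of the fact the edge points to
    type: EdgeType


@dataclass
class FactGraph:
    facts: dict[str, Fact] = field(default_factory=dict)
    edges: list[Edge] = field(default_factory=list)

    def add(self, fact: Fact) -> None:
        self.facts[fact.id] = fact

    def supersede(self, old_id: str, new: Fact) -> None:
        # Invalidate the old fact when the new one is introduced, keep it
        # for history, and record the edge "new SUPERSEDES old".
        self.facts[old_id].invalidated_at = new.introduced_at
        self.add(new)
        self.edges.append(Edge(new.id, old_id, EdgeType.SUPERSEDES))

    def valid_at(self, t: datetime) -> list[Fact]:
        return [f for f in self.facts.values() if f.is_valid_at(t)]


g = FactGraph()
g.add(Fact("f1", "alice", "works_at", "acme", "Alice works at Acme", datetime(2023, 1, 1)))
g.supersede("f1", Fact("f2", "alice", "works_at", "globex", "Alice moved to Globex", datetime(2024, 6, 1)))
print([f.object for f in g.valid_at(datetime(2023, 6, 1))])  # ['acme']
print([f.object for f in g.valid_at(datetime(2024, 7, 1))])  # ['globex']
```

Keeping invalidated facts instead of deleting them is what lets a query ask "what was true at time t?", which is the point of storing both timestamps.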
I've always thought that knowledge graphs/expert systems, and even the broader concept of entity-attribute-value storage, got an unfairly bad reputation because of the 1970s/1980s "AI Winter."
And I think that perhaps this reputation is why so much of the oxygen in the RAG space has been consumed by the notion that "RAG = retrieval of fragments by vector similarity."
The difference from decades ago, of course, is that LLMs can now both maintain that graph at scale and agentically run successive queries to explore best practices in any situation! And they have become scalable enough that any small business can build and use its own expert system.
I really want to see this approach win, because I think there's such an opportunity to explore even more data structures and approaches from the past and reimagine their impact. If LLMs do indeed approach AGI, it will be in large part due to their ability to use tools (there's some evolutionary irony there, too), and we should be trying every kind of underlying storage for those tools that we can, standing on the shoulders of giants.
(And curious what database you use for the knowledge graph - those are also a place where we stand on the shoulders of giants!)
kanyesrthaker
9 hours ago
Really great perspective. A lot of techniques from the past aren't conceptually wrong; we just have the tools today to make them efficient. The intuition behind them was always reasonable, provided you could amortize the cost of making them work at scale. Appreciate the vote of confidence!
And re: the graph -- Postgres stays king here. There are a lot of fancy database mechanisms for building systems like this, but the convenience of a SQL data structure that can tie the graph into structured metadata is pretty unbeatable. This may evolve with time as well.
btown
9 hours ago
Yay for Postgres! Curious if you find yourself using recursive queries in Postgres to traverse the graph - or is there an LLM in the mix that's looking at the "frontier" of relevant facts and choosing whether to go deeper, and whether an entity has an alias?
(Along those lines, I recall lots of this getting messy in a pre-LLM project the moment someone said "merge these two CRM accounts and their histories, but oh whoops turns out they were different all along, and only some of the updates should have applied" - there's a whole set of interesting challenges around attributing EAV when the very notion of object identity evolves over time. Whether a fact is relevant is really a judgment that can only be made with full context - but we now have tools that eat context for breakfast!)
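For the recursive-query half of the question, here is a minimal sketch of a k-hop neighborhood lookup with a recursive CTE. It assumes a hypothetical two-table schema (`facts` plus a `fact_edges` table of typed edges) and uses Python's built-in `sqlite3` as a self-contained stand-in for Postgres; both support `WITH RECURSIVE`, though on Postgres the placeholder style depends on the driver. The schema, data, and `neighborhood` helper are illustrative assumptions, not the poster's actual setup.

```python
import sqlite3

# In-memory SQLite stands in for Postgres; both support WITH RECURSIVE.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE facts (
    id TEXT PRIMARY KEY,
    subject TEXT, predicate TEXT, object TEXT,
    summary TEXT
);
CREATE TABLE fact_edges (
    source_id TEXT REFERENCES facts(id),
    target_id TEXT REFERENCES facts(id),
    edge_type TEXT  -- e.g. 'derived_from', 'supersedes', 'in_tension_with'
);
""")
conn.executemany("INSERT INTO facts VALUES (?, ?, ?, ?, ?)", [
    ("f1", "alice", "works_at", "acme", "Alice works at Acme"),
    ("f2", "alice", "role", "engineer", "Alice is an engineer at Acme"),
    ("f3", "alice", "role", "manager", "Alice was promoted to manager"),
    ("f4", "alice", "role", "engineer", "A later note still calls Alice an engineer"),
    ("f5", "bob", "works_at", "globex", "Bob works at Globex"),
])
conn.executemany("INSERT INTO fact_edges VALUES (?, ?, ?)", [
    ("f2", "f1", "derived_from"),
    ("f3", "f2", "supersedes"),
    ("f4", "f3", "in_tension_with"),
])

# Walk edges in either direction, up to k hops from the start fact.
# The depth bound guarantees termination even if the graph has cycles.
NEIGHBORHOOD = """
WITH RECURSIVE reach(id, depth) AS (
    SELECT ?, 0
    UNION
    SELECT CASE WHEN e.source_id = r.id THEN e.target_id ELSE e.source_id END,
           r.depth + 1
    FROM reach r
    JOIN fact_edges e ON r.id IN (e.source_id, e.target_id)
    WHERE r.depth < ?
)
SELECT id, MIN(depth) AS depth FROM reach GROUP BY id ORDER BY depth, id;
"""


def neighborhood(start_id: str, k: int) -> list[tuple[str, int]]:
    """Return (fact id, shortest hop distance) for facts within k hops."""
    return conn.execute(NEIGHBORHOOD, (start_id, k)).fetchall()


print(neighborhood("f1", 2))  # [('f1', 0), ('f2', 1), ('f3', 2)]
```

A fixed-depth CTE like this covers the deterministic part of retrieval; deciding which of the returned facts are actually relevant, or whether two entities are aliases, is the judgment call the thread suggests handing to an LLM.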
kanyesrthaker
4 hours ago
Very sharp questions, love 'em. Yes, your intuition is correct. By default we gather information k layers removed from the "frontier", and then run a shallow agentic step that determines whether we need to go further and from which nodes (essentially a graph traversal without a fixed termination condition). Relevance detection is a hard problem; we think we have something good, and we're experimenting and iterating towards something great.
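The two-phase retrieval described above (a fixed k-hop gather, then an agentic step deciding where to keep going) can be sketched roughly as follows. This is an assumption-laden illustration, not the product's implementation: `k_hop`, `retrieve`, and `choose_expansions` are hypothetical names, `choose_expansions` is a plain callback standing in for the LLM judgment, and `max_rounds` is an added safety budget, since the judge rather than the code is what normally ends the traversal.

```python
from collections import deque
from typing import Callable

Graph = dict[str, list[str]]  # adjacency list: node id -> neighbor ids


def k_hop(graph: Graph, seeds: set[str], k: int) -> set[str]:
    """Breadth-first expansion: every node within k hops of any seed."""
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # reached the hop limit along this path
        for nb in graph.get(node, []):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return set(dist)


def retrieve(
    graph: Graph,
    seeds: set[str],
    k: int,
    choose_expansions: Callable[[set[str]], set[str]],
    max_rounds: int = 5,
) -> set[str]:
    """Gather k hops, then let a judge pick nodes to expand from.

    choose_expansions stands in for the LLM step: it sees what has been
    gathered and returns the nodes worth exploring further (empty = stop).
    """
    gathered = k_hop(graph, seeds, k)
    for _ in range(max_rounds):  # safety budget; the judge normally stops first
        frontier = choose_expansions(gathered) & gathered
        if not frontier:
            break  # the judge decided the context is sufficient
        added = k_hop(graph, frontier, k) - gathered
        if not added:
            break  # nothing new is reachable from the chosen nodes
        gathered |= added
    return gathered


chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}


def toy_judge(gathered: set[str]) -> set[str]:
    # Stub for the LLM: keep expanding from the "furthest" node until "d" is in hand.
    return set() if "d" in gathered else {max(gathered)}


print(sorted(retrieve(chain, {"a"}, 1, toy_judge)))  # ['a', 'b', 'c', 'd']
```

Separating the cheap, deterministic k-hop gather from the judgment step keeps the number of LLM calls proportional to how far the traversal actually needs to go.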