johnwatson11218
a month ago
I did something similar: I used pdfplumber to extract text from my PDF book collection and dumped it into PostgreSQL, then chunked the text into 100-character chunks with a 10-character overlap. These chunks were embedded directly into a 384-dimensional space using the Python sentence_transformers library. I then averaged all of a document's chunk vectors and wrote that single vector back to PostgreSQL, and used UMAP + HDBSCAN for dimensionality reduction and clustering. I ended up with a 2D data set that I can plot with Plotly to see my clusters. It is very cool to play with. It takes hours to import 100 PDF files, but I can take one folder that contains a mix of programming titles, self-help, math, science fiction, etc., and after the fully automated analysis you can clearly see the different topic clusters.
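The chunk-and-average step above can be sketched as follows. This is a minimal sketch, not the author's actual code: the real pipeline uses pdfplumber for extraction and sentence_transformers for encoding; here the embedding model is stubbed out with random vectors so the example stands alone.

```python
import numpy as np

def chunk_text(text: str, size: int = 100, overlap: int = 10) -> list[str]:
    """Split text into fixed-size character chunks with a small overlap."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def embed_chunks(chunks: list[str], dim: int = 384) -> np.ndarray:
    """Stand-in for the sentence_transformers encoder.
    A real run would call something like
    SentenceTransformer('all-MiniLM-L6-v2').encode(chunks) instead."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(chunks), dim))

def document_vector(text: str) -> np.ndarray:
    """Average all chunk embeddings into one 384-D vector per document."""
    chunks = chunk_text(text)
    return embed_chunks(chunks).mean(axis=0)

doc_vec = document_vector("some example document text " * 20)
print(doc_vec.shape)  # (384,)
```

From there, the per-document vectors would be stacked into a matrix and fed to `umap.UMAP(n_components=2)` followed by HDBSCAN clustering, then plotted with Plotly colored by cluster label.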
I just spent time getting it all running on Docker Compose and moved my web UI from Express.js to Flask. I want to get the code cleaned up and open source it at some point.
johnwatson11218
22 days ago
I posted my code https://github.com/johnwatson11218/LatentTopicExplorer
johnwatson11218
a month ago
Thanks for the supportive comments. I'm definitely thinking I should release sooner rather than later. I have been using LLMs for specific tasks, and here is a sample stored procedure I had an LLM write for me.
-- Name: refresh_topic_tables(); Type: PROCEDURE; Schema: public; Owner: postgres
CREATE PROCEDURE public.refresh_topic_tables()
LANGUAGE plpgsql
AS $$
BEGIN
-- Drop tables in reverse dependency order
DROP TABLE IF EXISTS topic_top_terms;
DROP TABLE IF EXISTS topic_term_tfidf;
DROP TABLE IF EXISTS term_df;
DROP TABLE IF EXISTS term_tf;
DROP TABLE IF EXISTS topic_terms;
-- Recreate tables in correct dependency order
CREATE TABLE topic_terms AS
SELECT
dt.term_id,
dot.topic_id,
COUNT(DISTINCT dt.document_id) as document_count,
SUM(frequency) as total_frequency
FROM document_terms dt
JOIN document_topics dot ON dt.document_id = dot.document_id
GROUP BY dt.term_id, dot.topic_id;
CREATE TABLE term_tf AS
SELECT
topic_id,
term_id,
SUM(total_frequency) as term_frequency
FROM topic_terms
GROUP BY topic_id, term_id;
CREATE TABLE term_df AS
SELECT
term_id,
COUNT(DISTINCT topic_id) as document_frequency
FROM topic_terms
GROUP BY term_id;
CREATE TABLE topic_term_tfidf AS
SELECT
tt.topic_id,
tt.term_id,
tt.term_frequency as tf,
tdf.document_frequency as df,
-- cast to numeric so integer division does not truncate the IDF ratio
tt.term_frequency * LN( (SELECT COUNT(id)::numeric FROM topics) / GREATEST(tdf.document_frequency, 1)) as tf_idf
FROM term_tf tt
JOIN term_df tdf ON tt.term_id = tdf.term_id;
CREATE TABLE topic_top_terms AS
WITH ranked_terms AS (
SELECT
ttf.topic_id,
t.term_text,
ttf.tf_idf,
ROW_NUMBER() OVER (PARTITION BY ttf.topic_id ORDER BY ttf.tf_idf DESC) as rank
FROM topic_term_tfidf ttf
JOIN terms t ON ttf.term_id = t.id
)
SELECT
topic_id,
term_text,
tf_idf,
rank
FROM ranked_terms
WHERE rank <= 5
ORDER BY topic_id, rank;
RAISE NOTICE 'All topic tables refreshed successfully';
EXCEPTION
WHEN OTHERS THEN
RAISE EXCEPTION 'Error refreshing topic tables: %', SQLERRM;
END;
$$;
ct0
a month ago
This sounds amazing, totally interested in seeing the approach and repo.
hellisad
a month ago
Sounds a lot like BERTopic. Great library to use.
fittingopposite
a month ago
Yes. Please publish. Sounds very interesting.