Show HN: I mapped HN's favorite books with GPT-4o

274 points, posted 4 days ago
by pmaze

59 Comments

peteforde

2 days ago

Really cool to see my favs show up, but I honestly don't understand what we're actually looking at; the groupings seem very opaque beyond very general themes like sci-fi, startups, biographies, math, physics.

In other words, what are the clustering shapes telling us? Can we dig in based on geography, publishing date, key terms or themes?

Either way, I can't keep the site open for more than 30-40 seconds before it crashes. I suspect that's not the goal!

Is Cryptonomicon the best fiction book, or is the data wrong?

pmaze

2 days ago

The crash was indeed not intended - my mistake! Should be fixed now.

You've got the cluster semantics spot on, to be honest. Broad genres are grouped together, with a tendency for sub-genres to be grouped locally within those.

There is no interpretation of the overall shapes or the global structure, those are more a result of a particular UMAP run than inherent in the data.

Would love to provide different views on it and go more in depth next, thanks for the suggestion.

peteforde

a day ago

IMO, evolution over time is a great place to start.

jdthedisciple

2 days ago

> Either way, I can't keep the site open for more than 30-40 seconds before it crashes.

Yup, probably was about to happen to me too, had I not closed it.

CPU fan almost launched off the troposphere about 30 seconds in.

Probably a cluttered bunch of heavily unoptimized ReactJS modules in there (no offense to OP, I know it probably sped up development by 10x at least)

zamber

2 days ago

Nope, hug of death it seems:

Failed to load module script: Expected a JavaScript module script but the server responded with a MIME type of "text/html". Strict MIME type checking is enforced for module scripts per HTML spec.

Ad infinitum for a list of a couple .js files with repeating names.

Guess we'll have to come back in a day or two to experience it in its full glory :).
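For readers unfamiliar with the error quoted above, here is a minimal sketch of the kind of check a browser applies to module scripts: the response's Content-Type has to be a JavaScript MIME type, so an HTML page served in place of a missing .js file is rejected. The function name and the way the header is parsed are illustrative assumptions, and the list reflects the JavaScript MIME types as commonly documented, not anything taken from the site's code.

```python
# JavaScript MIME types as commonly listed in the WHATWG MIME Sniffing standard.
JAVASCRIPT_MIME_ESSENCES = {
    "application/ecmascript", "application/javascript",
    "application/x-ecmascript", "application/x-javascript",
    "text/ecmascript", "text/javascript",
    "text/javascript1.0", "text/javascript1.1", "text/javascript1.2",
    "text/javascript1.3", "text/javascript1.4", "text/javascript1.5",
    "text/jscript", "text/livescript",
    "text/x-ecmascript", "text/x-javascript",
}


def is_javascript_mime(content_type):
    """Return True if a Content-Type header value names a JavaScript MIME type.

    Hypothetical helper: parameters such as charset are dropped before comparing.
    """
    essence = content_type.split(";", 1)[0].strip().lower()
    return essence in JAVASCRIPT_MIME_ESSENCES


# A correctly served module versus an HTML page served in its place.
ok_response = is_javascript_mime("text/javascript; charset=utf-8")
broken_response = is_javascript_mime("text/html")
```

Running a check like this against the headers of each script URL is one way to spot a deployment that serves an HTML page for asset paths.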

pmaze

2 days ago

Hey, thanks for reporting - this is fixed now. I messed up the static build and some browsers freaked out. By law of showing things publicly, I of course only tested in a browser that didn't. Hope you can give it another chance!

refulgentis

2 days ago

There's a regularly recurring confusion about embeddings: the assumption that they're very well behaved in visual dimensions.

IMHO it's a category error that results from tutorials using the king + female = queen example (which, funnily enough, wasn't even true for the original word2vec, if commentary I've read previously here is correct).

Working with them a lot has me picture them more as "a multivariate function that outputs 768 numbers, and was learned by brute force" than "something that sees in 768 dimensions" --- of course, they're both true, but the second interpretation shades more than it illuminates once you're past the very first interrogatory of "so what is this calculating, exactly?"

nostrebored

2 days ago

How behaved they are visually depends on what drives variance and what you’re hoping to see. There are certainly some nice properties in some dimensionality reductions, but if you flatten a space of faces it’s less likely that you’ll get the property of “brown hair” as a query embedded in any visually interesting way than actually putting in a face as a query.

More clearly, symmetric retrieval is easier to visualize in a dimensionality reduced space than asymmetric retrieval.

I suspect that some form of multi vector document embedding would be more understandable in the reduced space than this single vector representation.

dcchambers

4 hours ago

This is great, thanks for sharing! Seems like the perfect fun little project: before LLMs it would have taken some serious data work, but LLMs make it pretty trivial to process and create.

padolsey

2 days ago

Niiice! I really like it. The spatial approach is cool, though labelling/annotations/axes would help.

I share the frustration with getting book covers for my project ablf.io. Amazon used to make this much easier, but they've locked it down recently, so you have to jump through affiliate hoops. I ended up implementing my own thing and storing thousands of images myself on S3. If you have the Goodreads IDs, feel free to use:

    assets.abooklike.foo/covers/{goodreads id}.jpg

N.B. The actual Goodreads website itself makes it hard as well, since they have an additional UUID in their img URIs, so it's not deterministic; that's why I created this.
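A minimal sketch of filling in that URL template from a Goodreads ID. The `https://` scheme, the function name, and the sample ID are assumptions for illustration; only the host and path pattern come from the comment above.

```python
def cover_url(goodreads_id):
    """Build a cover image URL from a Goodreads ID (scheme assumed to be https)."""
    return f"https://assets.abooklike.foo/covers/{goodreads_id}.jpg"


# 12345 is a made-up placeholder ID, not a real book.
example = cover_url(12345)
```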

DantesKite

2 days ago

That’s a great website. I’ve been looking for alternative book recommendation websites for a while and it really has nailed it down.

It even recommended me a somewhat eclectic book I’ve recently been meaning to read.

Is there a reason you limit to only 6 favorite books? Is it due to computational restraints?

renjimen

2 days ago

Nice site! I like that I can filter results by fiction or non-fiction. Interesting to enter my favourite novels and see the non-fiction that's recommended. Some surprisingly good picks!

alabhyajindal

2 days ago

Congrats! The interface is beautiful and fast!

Adding direct links to the comments that mention the books could be a good feature to add. Hacker News Books [1] does this and it's useful to have all the comments for a book in a single page.

1. https://hackernewsbooks.com

r_singh

2 days ago

Really cool! Never thought "I Am That", based on the conversations with Nisargadatta Maharaj, would show up here. That's the beauty of HN. You never know what you're gonna get :)

Tastefully made. I'm gonna go over it in my leisure time.

Regarding your question about a reliable source for book covers: I run this API that could possibly do this if you collect the Amazon ASINs (or URLs) for the books (that can also be done with the search API I host): https://docs.unwrangle.com/amazon-product-data-api/

If it seems useful, you can reach out to me and mention this chat. I'll be happy to offer free credits for your project.

mooreed

2 days ago

Nice project.

I also would love to hear more about the cluster shapes and cardinality of the coordinate system. I consider myself pretty well versed in data analysis, though with less expertise in NLP topics (e.g. t-SNE).

So a quick blurb like: the units on the axes in the graph are “a reduced embedding space” designed to keep structure and to reduce the dimensionality such that the clusters could be plotted on screen…

(I’m not even sure that’s correct, but I would have loved for you to have informed me on the one sentence visualization choice and then point me to t-SNE.)

Overall nice project - and it reminds me of a painful professional analysis lesson I have had to re-learn more than once.

> After working for NN hours on an analysis, and finally breaking through and completing it, overlooking the title and labels is the biggest footgun I have ever dealt with.

sleazebreeze

2 days ago

The aesthetics are nice, but what I really want is a toggleable overlay that shows the rough keyword mapping for all the books. The single book view is fine for understanding a single book, but not useful for trying to process the whole page to find one book I might want to read.

Nice project though, I love it.

dangus

2 days ago

I find the graphical nature of this to be disorganized and distracting. If you didn’t explain to me what the meaning of the map was it would be essentially a meaningless cluster of book covers.

jppope

2 days ago

FYI I have a runaway recursive processing when I load the site... it goes down in ~30 seconds or so.

changexd

2 days ago

Appreciate the work! I didn't find the value of reading until I broke up with my ex, which had me rethinking my whole life and values. Right now I read when commuting, and sometimes I just don't know what to read next. This is a good place to find some good books, thanks!

ilikehurdles

2 days ago

One small issue on mobile safari. when i tap to drag the map around, if i put my finger down on a book cover to start dragging the map, the book description is immediately expanded. put differently, my intention is to drag not open, but both actions take place when I drag.

I really like the project otherwise. We have a book club that’s deciding on what to read next and this could be very helpful.

namanyayg

4 days ago

nice project, pieterma.

i'm curious about the decision to use hellinger distance for the second round of UMAP - was that purely empirical or did you have some intuition about why it'd work well for this specific dataset?

also, out of curiosity, what's the most popular book on the map that doesn't have a clear genre cluster?

pmaze

3 days ago

Thanks!

The cluster memberships that come out of the first round are distributions over the different clusters, e.g. a given book is weighted 0.8 for cluster A and 0.2 for cluster B. The Hellinger distance is well-suited to quantify the difference between two distributions like that. Cosine similarity and Euclidean distance worked as well, but Hellinger gave subjectively nicer results.
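To make the distance concrete, here is a small sketch of the Hellinger distance between two cluster-membership distributions. It uses the standard definition; the function name and the example weights for the second book are assumptions, not values from the project.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions of equal length.

    Returns 0 for identical distributions and 1 for distributions that
    share no cluster at all.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of entries")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


# Hypothetical memberships over clusters A and B.
book_a = [0.8, 0.2]
book_b = [0.2, 0.8]
d_ab = hellinger(book_a, book_b)
```

Because the distance works on square roots of the weights, it stays bounded between 0 and 1 no matter how many clusters there are, which is part of why it suits comparing membership distributions.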

Very interesting question, I'm not sure! While developing, I noticed that the systems thinking books were spread over different genres, which I found quite pleasing. However, I'm not sure if other books were even more diffuse. I'll have to dig back in and find out :)

noitpmeder

2 days ago

This is awesome. Glad to see both A Fire Upon the Deep and Deepness made it on the list!

answerheck

2 days ago

Lovely work, thank you for sharing

Probably a comment on my subconscious desire for familiarity/patterns, but the left side of the map instantly made me think of NW Europe: long skinny Norway dangling between the UK and Denmark (not correctly spaced, but sizes are reasonably correct!). A few other candidates at a stretch - maybe some Baltic states off to the east, for example - but after that it breaks down unfortunately.

Cool project sir

Nathanael_M

2 days ago

I'd like to explore this more, but I'm getting THOUSANDS of errors:

Failed to load module script: Expected a JavaScript module script but the server responded with a MIME type of "text/html". Strict MIME type checking is enforced for module scripts per HTML spec.

This crashes my browser in less than a minute.

pmaze

2 days ago

My apologies for that! First time deploying Svelte Kit to Cloudflare Pages, and I messed up the static build. Should be fixed now, hope you can give it another shot.

lucius_verus

a day ago

Should we be concerned that Mein Kampf shows up in a list of HN "favorite books"?

ijidak

2 days ago

Love this. I like that the clustering allows me to start from a book I've read and liked, and then move on to similar books in the cluster.

For example, I just finished The Phoenix Project.

I'm already seeing some related books I should take a look at.

Very useful!

motohagiography

2 days ago

amazing and hilariously accurate. together they represent a culture and shared ontology. having worked in places with others who have read some of these books, the shorthand is super fast.

23B1

2 days ago

This is cool.

Idea: Amazon has killed 'random' browsing of books. Would love to see this applied to topic area searches etc. so I can have the same serendipity that I used to get in all the bookstores Amazon unalived.

Eduard

2 days ago

For my system: becomes unresponsive within the first second.

* Google Chrome from Flathub. Version 128.0.6613.119 (Official Build) (64-bit)
* Debian 12 bookworm under KDE Wayland

reducesuffering

2 days ago

Any way to determine the quantity of recommendations or degree of positive sentiment? Maybe a larger book cover image?

bestinterest

2 days ago

Nice, super responsive - what's this using underneath? I see canvas + svelte

SoftTalker

2 days ago

Interesting but the visualization is useless. How about a standard tabular format maybe grouped by genre?

allenu

2 days ago

I think it's useful as a tool for browsing and getting a general gist of what people are into and seeing if your favorites are there, too. As for a tool to maximize one's reading list, certainly not as useful, but I appreciate that it didn't make me feel like I had to create action items on things to read.

ok123456

2 days ago

The 'genres' emerge out of the clusters. The fact that you can pick out 'genres' from this plot is an example of semi-supervised learning.

SoftTalker

2 days ago

OK but there's no clue what they are until you start poking around. It's not something that's useful without some substantial investment of time and exploration. If that's the goal, fine but it's not how I would present a "favorite books" list.

ok123456

2 days ago

Making 'genres' and classifying things is imprecise and requires experts in subject matter and library science to get it right. The "genre" labels here emerge out of the data itself.

maCDzP

2 days ago

Did you reduce the dimension before applying HDBSCAN?

pmaze

2 days ago

I did, there was a first round of UMAP to 50 dimensions. Running HDBSCAN on the full embeddings gave bad results, lots of singleton clusters.
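A rough sketch of that two-step idea, assuming the umap-learn and hdbscan Python packages. Only the 50-dimension target comes from the comment above; the function name, `min_cluster_size`, the seed, and every other parameter are illustrative guesses, not the settings actually used for the site.

```python
def reduce_then_cluster(embeddings, n_components=50, min_cluster_size=10, seed=42):
    """Reduce embeddings with UMAP, then cluster the reduced points with HDBSCAN.

    `embeddings` is expected to be a 2-D array-like of shape (n_books, n_dims).
    All defaults except n_components=50 are assumptions for illustration.
    """
    # Imported here so the function can be defined without the packages installed.
    import umap      # provided by the umap-learn package
    import hdbscan   # provided by the hdbscan package

    # Step 1: bring the high-dimensional embeddings down to a medium dimension.
    reducer = umap.UMAP(n_components=n_components, random_state=seed)
    reduced = reducer.fit_transform(embeddings)

    # Step 2: density-based clustering on the reduced points.
    clusterer = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size)
    labels = clusterer.fit_predict(reduced)  # label -1 marks points treated as noise

    return reduced, labels
```

Reducing first tends to help because density estimates become unreliable in very high dimensions, which fits the singleton clusters described above.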

goshx

2 days ago

Very nice. Do you have a text format of this available?

the__alchemist

2 days ago

Well, now I have recommendations; ty! I just bought 9 books from this list, filling in around ones I know I like. This is outstanding.

kthartic

2 days ago

I'm not sure I understand the "map" part of this. What does the geography represent exactly?

vegabook

2 days ago

looks like t-SNE projection

WillAdams

2 days ago

What does each axis represent?

What is the significance of the placement of each cluster?

ok123456

2 days ago

In t-SNE, the distances in the feature vector space are preserved in the projected space. IIRC, these distances serve as boundary conditions to a stochastic diffusion problem. The actual positions and the orientation are allowed to be free variables.

dnlserrano

2 days ago

this is great, and cool looking, thanks!