Retr0id
2 hours ago
> A 100-bit bloom filter holding 100,000 keys is saturated instantly
> This is the kind of bug you only find by building the thing and measuring it.
No? I mean, maybe if you're vibecoding it's the only way, but in the prehistoric days you could reason about what code would do before you ran it.
bawolff
2 hours ago
Mistakes are always easy to recognize in retrospect, so hopefully this comment isn't too unfair, but one thing that caught me about this is that logically it makes no sense. You would never use a bloom filter for just 10 entries. If you have only 10 entries, it is almost certainly faster to skip the bloom filter. So I feel like that is the part that should have instantly stood out.
[Obviously, I've made my own silly mistakes over the years, many much sillier than this. It's just weird to describe this one as only detectable by profiling.]
FarmerPotato
an hour ago
Sure, it logically makes no sense. But while learning a new subject, have you never made a silly mistake like:
bool getSchemaSizes(size_t *expectedBatchSize, size_t *expectedEntriesPerBlock) { ... }
size_t expectedEntriesPerBlock, expectedBatchSize;
getSchemaSizes(&expectedEntriesPerBlock, &expectedBatchSize);  /* arguments swapped */
initBloomFilter(expectedEntriesPerBlock);  /* actually holds the batch size */
bawolff
an hour ago
I said as much in my comment.
tensegrist
an hour ago
i don't know why you're trying to analyze the meaningfulness of sentences that are not the results of a human thought process but are clearly rhetorical flourishes from an llm that "feels" compelled to fill its prose with them
Retr0id
an hour ago
Comments that explicitly call out an article as slop tend to get downvoted (or disagreed with), so it's best to guide the reader towards their own conclusions.
ignoreusernames
an hour ago
Yeah, especially a bloom filter, which has a pretty easy formula for its false positive rate.
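For reference, this is the standard approximation, with m bits, n inserted keys, and k hash functions (the symbols are mine, not the article's), applied to the 100-bit, 100,000-key case quoted at the top of the thread:

```latex
p \approx \left(1 - e^{-kn/m}\right)^{k}
\qquad\text{with } m = 100,\ n = 100{,}000:\quad
\frac{kn}{m} = 1000k,\quad e^{-1000k} \approx 0,\quad p \approx 1
```

So for any k of at least 1, essentially every lookup comes back as "maybe present".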
jasonwatkinspdx
an hour ago
A lot of people know the basic rule of thumb that a byte per element gives you roughly a 2% false positive rate.
But even just thinking about it for half a second from a balls and bins perspective, 100k items into 100 binary bins is obviously gonna saturate.
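The balls-and-bins picture is easy to check in a few lines of C. This is a minimal sketch, not the article's code: `mix64` (a splitmix64-style mixer standing in for a real hash) and `bloom_bits_set` are names made up for illustration, and it spends one byte per bit to stay short.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* splitmix64-style mixer; a stand-in hash chosen for this sketch. */
static uint64_t mix64(uint64_t x) {
    x += 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
}

/* Insert keys 0..num_keys-1 into a filter of num_bits bits, probing
 * num_hashes positions per key, and return how many bits end up set.
 * Returns 0 if num_bits is 0 or larger than the sketch's fixed buffer. */
size_t bloom_bits_set(size_t num_bits, size_t num_keys, unsigned num_hashes) {
    enum { MAX_BITS = 1 << 16 };
    static unsigned char bits[MAX_BITS];   /* one byte per bit, for brevity */
    if (num_bits == 0 || num_bits > MAX_BITS) {
        return 0;
    }
    memset(bits, 0, num_bits);
    for (size_t key = 0; key < num_keys; key++) {
        uint64_t base = mix64((uint64_t)key);
        for (unsigned i = 0; i < num_hashes; i++) {
            /* Derive the i-th probe by re-mixing the key's base hash. */
            bits[mix64(base + i) % num_bits] = 1;
        }
    }
    size_t set = 0;
    for (size_t b = 0; b < num_bits; b++) {
        set += bits[b];
    }
    return set;
}
```

Calling `bloom_bits_set(100, 100000, 1)` should report all 100 bits set, at which point the filter can no longer rule anything out.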
paulb73
an hour ago
Isn't this what unit tests are for?
FarmerPotato
2 hours ago
Do you think the author is somehow capable of writing the entire codebase, but not able to reason about code???
I'm sure you've never made a silly mistake where you passed the wrong integer parameter to a function, stared at your screen, and failed to notice it. Or forgot the order of arguments to calloc().
If you're saying that profiling is for those too lazy to reason about their code, you're distorting the whole lesson: profiling is more powerful than guessing.
Retr0id
an hour ago
I make all sorts of silly mistakes, but I'd rarely say that running the code is the only way to detect issues.
I also don't think the author wrote much of their codebase, or much of their blog post, but that's the brave new world we're living in.
plorkyeran
36 minutes ago
The author didn't write the blog post so my default assumption is they didn't write the code either.
shermantanktop
an hour ago
I was called in to consult on a performance problem on a scaled service. The team was load testing their code and seeing low throughput:
Me: so you have an in-memory cache, right?
Them: yes!
Me: what is the TTL?
Them: Oh, it's not set, oops. Here, let's set it to 1 minute. Hey look, the performance went way up!
Me: okay, great. When you say 1 minute, do you mean 60 seconds?
Them: uh...wait...uh....oh, the unit is seconds. Wait, why is the performance so good with a 1 second TTL?
Me: What's your load test?
Them: We crank 1M TPS fetching the same 30 items over and over.
Me: ....
I totally agree about the power of profiling but profiling without understanding would not have helped this team.
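The arithmetic behind that punchline, as a rough sketch. `min_hit_rate` is a hypothetical helper, and it assumes each key misses at most once per TTL window (that is, concurrent misses at expiry are coalesced), which the story does not confirm.

```c
/* Lower bound on the cache hit rate when a load test keeps requesting the
 * same distinct_keys items at requests_per_sec and entries expire after
 * ttl_sec. Assumes at most one miss per key per TTL window. */
double min_hit_rate(double requests_per_sec, double distinct_keys, double ttl_sec) {
    if (requests_per_sec <= 0.0 || ttl_sec <= 0.0) {
        return 0.0;
    }
    double max_misses_per_sec = distinct_keys / ttl_sec;
    if (max_misses_per_sec >= requests_per_sec) {
        return 0.0;
    }
    return 1.0 - max_misses_per_sec / requests_per_sec;
}
```

Under that assumption, `min_hit_rate(1e6, 30, 1)` is about 0.99997: with only 30 distinct items, even a 1 second TTL serves nearly every request from cache, so the load test says little about the TTL.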
FarmerPotato
an hour ago
So the author is doing a self-learning exercise about profiling pre-production code, and you're disagreeing with them by comparing it to a commercial contract. I'm sure you've never, ever made a dumb mistake while getting paid.
shermantanktop
17 minutes ago
I've even made dumb mistakes while NOT getting paid. But even so, I have no idea what you're talking about.
achierius
2 hours ago
No, that's not the point. This isn't a situation where you need to "guess"; bloom filters should be sized according to their capacity. This is akin to having a fixed 10-arg buffer for your program, getting a crash when someone passes 11, and saying "this is the kind of bug you only find by building the thing and measuring it". Yeah it happens and we all make silly mistakes, but it's just not true that this couldn't have been foreseen.
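To make "sized according to their capacity" concrete, these are the usual textbook sizing formulas for n keys and a target false positive rate p. The 1% target is an assumption for illustration, not a number from the article.

```latex
m = -\frac{n \ln p}{(\ln 2)^{2}}, \qquad k = \frac{m}{n}\,\ln 2
\\[4pt]
n = 100{,}000,\ p = 0.01:\quad
m \approx 9.59\,n \approx 958{,}506\ \text{bits} \approx 117\ \text{KiB},
\qquad k \approx 6.64 \to 7
```

That is about four orders of magnitude more bits than the 100 in the quoted filter.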
FarmerPotato
an hour ago
Cool! My first downvote!