sqlite-utils 4.0rc2, mostly written by Claude Fable (for about $149.25)

44 points, posted 4 hours ago

by ognyankulev

(simonwillison.net)

50 Comments

utopiah

2 minutes ago

Great to see write-ups like this, particularly with the money spent stated.

At least 2 things the random LinkedIn post will ignore, on purpose or not:

- prices today remain low (even though they might feel higher than before); Uber is the business model, no secret there, it's a VC classic

- $150 spent by an expert, a software engineer with significant practical knowledge in AI, is not equivalent to the exact same amount spent by a novice.

Yet now that a number is out, you bet it will be used. Expect alarmist posts tomorrow morning in your feed claiming building software is now as cheap as dinner at a restaurant.

I'm kind of surprised that there is no test case that would have identified the fact that delete_where() leaves the state corrupted. There would be no need to ask Fable if the problem gets identified by the test. And having a test will also catch all future problems that might arise in the same function. So maybe instead of asking Claude what is wrong it would be wiser to invest in test coverage.
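A minimal sketch of what such a regression test could look like, using only the standard library. The comment does not say what "corrupted state" means, so this assumes it is something like a transaction left open after the call. The `delete_where` helper here is a hypothetical stand-in, not the sqlite-utils implementation.

```python
import sqlite3


def delete_where(conn, table, where, params=()):
    """Hypothetical stand-in: delete matching rows and commit before returning."""
    # Table name and WHERE clause are interpolated for brevity; trusted input only.
    with conn:  # commits on success, rolls back on exception
        cursor = conn.execute(f"DELETE FROM [{table}] WHERE {where}", params)
    return cursor.rowcount


def test_delete_where_leaves_clean_state():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dogs (id INTEGER PRIMARY KEY, age INTEGER)")
    conn.executemany("INSERT INTO dogs (age) VALUES (?)", [(1,), (5,), (9,)])
    conn.commit()

    deleted = delete_where(conn, "dogs", "age > ?", (4,))

    assert deleted == 2
    assert conn.execute("SELECT count(*) FROM dogs").fetchone()[0] == 1
    # The state check (assumed failure mode): no transaction left dangling.
    assert not conn.in_transaction
    conn.close()
```

Once a test like this exists, it keeps guarding the function against the same class of problem in later changes.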

dreadnip

2 hours ago

The problem I have with this workflow is that the models are still too eager to please. If I ask it to scan a release and note possible issues, it absolutely will find issues. If I keep running the same prompt, it will keep finding issues. I’ve spammed GitHub PR reviews and it just keeps finding (or inventing?) new issues. There is never a “Nothing found, good to go!”. I have to keep reminding myself that the model will always give me what I ask for, regardless of the reality/truth.

KronisLV

13 minutes ago

> There is never a “Nothing found, good to go!”. I have to keep reminding myself that the model will always give me what I ask for, regardless of the reality/truth.

Tell it something like:

  Before doing any commits or producing a summary for the user, you must run a verification sub-agent.
  Its goal is to adversarially and critically check your supposed findings to look out for false positives and hallucinations.
  Doing so with a separate sub-agent with relatively clean context (but with all the relevant details of the problem space that appear to be facts) should improve our confidence in the findings.

Maybe also something like:

    Try to classify each found issue as either SERIOUS, CRITICAL or NITPICK, discard nitpicks, we only care about impactful issues.

It should somewhat cut down on the useless output.

I've largely found the same in regards to generating code - the initial pass will often have bugs that the model itself can find but only when run as a separate sub-agent without the confidence poisoning in its own previous output.
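The two prompt ideas above (discard nitpicks, then have a separate verifier check what is left) could be wired together roughly like this. It is only a sketch: `Finding`, `triage` and the `verify` callable are hypothetical names, and `verify` stands in for whatever clean-context sub-agent call your tooling provides.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Finding:
    title: str
    severity: str  # "CRITICAL", "SERIOUS" or "NITPICK"


def triage(findings: Iterable[Finding],
           verify: Callable[[Finding], bool]) -> List[Finding]:
    """Drop nitpicks, then keep only findings the verifier confirms."""
    kept = []
    for finding in findings:
        if finding.severity == "NITPICK":
            continue  # we only care about impactful issues
        if verify(finding):  # adversarial second opinion (assumed external call)
            kept.append(finding)
    return kept
```

An empty list coming back from `triage` is then the "Nothing found, good to go!" case.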

JodieBenitez

10 minutes ago

> There is never a “Nothing found, good to go!”

Not entirely true IME. Eventually the bug hunt will end with general design advice that may not be suitable to your use case and that you can skip.

baq

2 hours ago

You didn’t do it enough. They stop finding bugs eventually. Also, different models can find different bugs (though they do find the same ones, too, which is good and expected). For best results you want to run multi-model reviews in loops.

If you had multiple people look at your PRs multiple times on different days, the results would be very similar.

PunchyHamster

an hour ago

I've had it find a bug, I asked it to make a test to trigger the bug, and then it figured out it's not a bug. It will absolutely do wish fulfilment.

left-struck

an hour ago

Yeah, when these models find a bug I like to ask them to write a test that will fail if the bug is real and pass when the bug is solved.

It’s not perfect but usually it works pretty well, and I’ve had the model come back to me with "oh, actually the test passed, the bug doesn’t exist".

As a bonus, you’ve now got a test that can detect that bug if it comes up again.
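A small sketch of the idea, with an invented example: suppose the model claims that a `chunks` helper drops the trailing partial chunk. Both the helper and the claimed bug are hypothetical. The test is written so that it fails if the claim is true.

```python
def chunks(items, size):
    """Split items into lists of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def test_last_partial_chunk_is_kept():
    # Claimed bug (hypothetical): the final, shorter chunk is silently dropped.
    # If the claim were true, this assertion would fail.
    assert chunks([1, 2, 3, 4, 5], 2) == [[1, 2], [3, 4], [5]]
```

Here the test passes, so the reported bug was a false positive, and the test stays in the suite as a guard.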

csomar

an hour ago

It'll find a non-existent bug - fix it - figure out it broke a previously working thing - try to fix again - etc..

The "keep improving the code base" prompt has been tried and it never works. The LLM has no consciousness of where to stop and where to draw the lines of reasonableness.

MallocVoidstar

2 hours ago

No, depending on the complexity of the issue, models can get into loops, where they go "this is definitely an issue and must be fixed", and then the resulting fixed code gets "this is definitely an issue and must be fixed", and then the resulting fixed code has the original 'issue'.

bfjvibybd6cuvu6

an hour ago

That's a different kind of loop.

For normal review loops you can ask the model to return "nothing found" if nothing is found and not to invent things, and it will do a better job of exiting without any findings.

memoriyato3

an hour ago

yeah, happened to me: "A is very wrong, you should do B", and on the next fresh review loop "B is very wrong, you should do A"

typically this means there is some ambiguity in the specification, and the model flips between alternative interpretations

bluenose69

39 minutes ago

I get this sometimes when I ask the agent on GitHub to suggest improvements to my Julia code. It's kind of fun to watch it struggle to please. I'm reminded of the old "Doctor" mode in Emacs.

imhoguy

an hour ago

You need to create a review skill and define there what "issue" or "good" means for you, to limit sensitivity. Otherwise you depend on the model's random threshold, or on none at all, and then you get perfection chasing.

Anyway, it will never match your judgement completely unless you upload your brain dump into the model.

mejutoco

23 minutes ago

You could ask the model to say "nothing found" if the improvement was stylistic, or other constraints.

embedding-shape

2 hours ago

> There is never a “Nothing found, good to go!”.

Like when you do recursive programming, have you tried providing more/better stop conditions? If you literally just say "Continue until there are no more issues" then it'll do just that, but if you scope it better, like "Only mention issues related to X, Y or that leads to Z" and so on, you'll get less noise and more focus on issues that actually matter (to you).

memoriyato3

an hour ago

also helps adding negative conditions like "do not nitpick", or specific bad attractors that you see "do not investigate/report anything related to symlinks, they are not a concern"
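The stop-condition idea in this sub-thread could look something like the loop below. It is a sketch under assumptions: `review` is a hypothetical callable that returns a list of issue descriptions from one review pass, and `in_scope` encodes both the positive scope ("only X, Y") and the negative conditions ("nothing about symlinks").

```python
def review_loop(review, in_scope, max_rounds=5):
    """Repeat review passes until a pass yields nothing new that is in scope."""
    seen = set()
    reported = []
    for _ in range(max_rounds):  # hard cap, like a recursion depth limit
        new = [issue for issue in review()
               if in_scope(issue) and issue not in seen]
        if not new:
            break  # "nothing found" is a valid result
        seen.update(new)
        reported.extend(new)
    return reported
```

The explicit `max_rounds` cap and the "no new in-scope issue" check play the role of the base case in a recursive function.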

onion2k

2 hours ago

> If I keep running the same prompt, it will keep finding issues.

gib444

an hour ago

It's not eagerness to please (that's anthropomorphising), rather it's a desire to bill you more money/use more tokens

(The fixed prices are just temporary discounts)

jph00

41 minutes ago

I'm a big fan of sqlite-utils, but I really don't like how Python (particularly 3.12+) changes how sqlite's transactions work -- the native behavior explained in the sqlite docs is much better IMO. I understand why Python had to change it (to be compatible with other databases) but I don't think it's a good model for sqlite.

Therefore, I created apsw-utils, a port of sqlite-utils to the amazingly-awesome apsw lib -- which is a really idiomatic sqlite lib for python. It's here: https://answerdotai.github.io/apswutils/

I've used it in lots of projects including in significant production stuff, and it's always worked great for me. IMO if you're serious about doing sqlite in python, at some point you'll probably want to check out apsw.
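The comment does not spell out which behavior it means. One likely candidate, shown here as an illustration rather than a statement of the author's point, is that Python's `sqlite3` module by default opens a transaction implicitly before data-modifying statements, whereas SQLite itself runs each statement in autocommit mode unless you issue `BEGIN`. Python 3.12 additionally introduced an `autocommit` connection parameter; the sketch sticks to `isolation_level`, which works on older versions too.

```python
import sqlite3

# Default (legacy) mode: the module issues an implicit BEGIN before
# INSERT/UPDATE/DELETE, so the change stays uncommitted until commit().
legacy = sqlite3.connect(":memory:")
legacy.execute("CREATE TABLE t (x)")
legacy.execute("INSERT INTO t VALUES (1)")
legacy_open = legacy.in_transaction  # True: a transaction is still open
legacy.commit()

# isolation_level=None: no implicit BEGIN, SQLite's own autocommit applies.
native = sqlite3.connect(":memory:", isolation_level=None)
native.execute("CREATE TABLE t (x)")
native.execute("INSERT INTO t VALUES (1)")
native_open = native.in_transaction  # False: the INSERT committed immediately
```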

jmalicki

10 minutes ago

> changes how sqlite's transactions work

What specifically are you referring to? The apswutils website also does not explain.

5701652400

an hour ago

just a note. in most parts of the world 149.25 USD can cover utilities, water, and food for a month for 1 adult person or even a family.

klustregrif

44 minutes ago

Had this been a corporate environment, the net saving from using one person part-time plus an agent, as opposed to one person full-time for as long as it would take to implement this, would be enough to cover utilities, water and food for an entire village.

It’s silly to act like this was an added cost in a vacuum, or that any costs translate directly into charity for arbitrary families. Also, in some places it would even cover rent for half a day.

xyzzy123

an hour ago

In Sydney, Australia, it's < 2 days of median rent.

Muromec

an hour ago

That's my electricity bill for a year, okay

mirekrusin

an hour ago

In others it's pizza night for family or half a bill for sushi dinner, so what?

Tiberium

2 hours ago

The title cost is only if this was raw API usage, but it was included in a subscription, so it's a small subset of the $200 plan:

> I upgraded to the Claude Max $200/month plan (I was previously on $100/month) to increase my Fable allowance for the remaining time until the July 7th Fablepocalypse, when even Claude Max subscribers will have to pay full API cost for the model.

I really wonder if Anthropic will stick with their decision to keep Fable on extra usage credits until they "get more compute", especially in the light of GPT 5.6 very likely coming out next week (it's confirmed to have the exact same pricing as GPT 5.5)

embedding-shape

2 hours ago

> especially in the light of GPT 5.6 very likely coming out next week

Finally have an explanation why GPT 5.5 xhigh felt dumber and dumber these last few weeks, always the same thing when a new model release is about to come out...

32 minutes ago

Yes, and we can see A/B testing on the ChatGPT website all the time.

vasco

2 hours ago

And nothing happened and zero people got in trouble over it.

- Narrator

Muromec

an hour ago

...So far