rictic
4 months ago
Hi HN! Didn't expect this to be on the front page today! I should really release all the optimizations that've been landing lately; the version on GitHub is about twice as fast as what's released on npm.
I wrote it while prototyping streaming rendering of UIs defined by JSON generated by LLMs. With constrained generation you can essentially hand the model a JSON-serializable type, and it will always give you back a value that obeys that type, but the big models are slow enough that incremental rendering makes a big difference in the UX.
I'm pretty proud of the testing that's gone into this project. It's fairly exhaustively tested. If you can find a value that it parses differently than JSON.parse, or a place where it disobeys the 5+1 invariants documented in the README I'd be impressed (and thankful!).
This API, where you get a series of partial values, is designed to be easy to render with any of the `UI = f(state)` libraries like React or Lit, though you may need to short-circuit some memoization or early exiting, since whenever possible jsonriver will mutate existing values rather than creating new ones.
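The consumption pattern might look roughly like this. Note this is a sketch, not jsonriver's verbatim API: `partials()` is a hand-written stand-in for the AsyncIterable of partial values the parser would yield as `{"title": "Hi", "body": "hello"}` arrives chunk by chunk.

```typescript
// Stand-in for a streaming parser's output: the same value, growing over time.
async function* partials(): AsyncIterable<{title?: string; body?: string}> {
  yield {title: "H"};
  yield {title: "Hi"};
  yield {title: "Hi", body: "hello"};
}

// A pure `UI = f(state)` function, re-run on every partial value.
function render(state: {title?: string; body?: string}): string {
  return `<h1>${state.title ?? ""}</h1><p>${state.body ?? ""}</p>`;
}

async function main(): Promise<string> {
  let html = "";
  for await (const state of partials()) {
    html = render(state); // re-render as the value grows
  }
  return html;
}
```

Each iteration sees a strictly more complete value, so the last render is the same as what you'd get from parsing the whole document at once.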
rictic
4 months ago
I've just published v1.0.1. It's about 2x faster, with no other observable changes. The speedup comes mainly from avoiding allocation and string slicing as much as possible, plus an internal refactor that binds the parser and tokenizer more tightly together.
Previously the parser would get an array of tokens each time it pushed data into the tokenizer. This was easy to write, but it meant allocating a token object for every token. Now the tokenizer has a reference to the parser and calls token-specific methods directly on it. Since most of the tokens carry no data, this keeps us from jumping all over the heap so much. If we were parsing a more complicated language this might become a huge pain in the butt, but JSON is simple enough, and the test suite is exhaustive enough, that we can afford a little nightmare spaghetti if it improves speed.
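The shape of that refactor might be sketched like this. Everything here is invented for illustration (toy lexer, made-up method names), vastly simplified from the real internals:

```typescript
// Before: a heap-allocated object per token, e.g.:
type Token = {kind: "comma"} | {kind: "string"; value: string};

// After: the tokenizer holds a reference to the parser and calls
// token-specific methods directly, so data-free tokens allocate nothing.
interface ParserCallbacks {
  onComma(): void;
  onString(value: string): void;
}

class Tokenizer {
  private buf = "";
  constructor(private parser: ParserCallbacks) {}
  // Toy lexer: only commas and bare words, no real JSON handling.
  push(chunk: string): void {
    for (const ch of chunk) {
      if (ch === ",") {
        if (this.buf) { this.parser.onString(this.buf); this.buf = ""; }
        this.parser.onComma();
      } else {
        this.buf += ch;
      }
    }
  }
  end(): void {
    if (this.buf) { this.parser.onString(this.buf); this.buf = ""; }
  }
}

// Usage: the parser sees calls directly, with no intermediate Token[] array.
const seen: string[] = [];
const tokenizer = new Tokenizer({
  onComma: () => { seen.push(","); },
  onString: (v) => { seen.push(v); },
});
tokenizer.push("a,b");
tokenizer.push("c");
tokenizer.end();
// seen is now ["a", ",", "bc"] — tokens can span push() boundaries.
```

The tradeoff is exactly as described: the tokenizer and parser are now coupled, which is fine for a grammar as small as JSON's.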
Inviz
4 months ago
I want to ditch stream-json so hard (it needs polyfills in the browser and is cumbersome to use), but I only need one feature: invoke a callback by path (e.g. for `user.posts`, invoke once per post in the array), and only for complete objects. Is this something that jsonriver can support?
rictic
4 months ago
jsonriver's invariants do give you enough info to notice which values are and aren't complete. They also mean that you can mutate the objects and arrays it returns to drop data that you don't care about.
There might be room for some helper functions in something like a 'jsonriver/helpers.js' module. I'll poke around at it.
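As a rough illustration of what such a helper could look like (this is hypothetical, not part of jsonriver, and it assumes the append-only behavior discussed above: once element `i+1` of an array exists, or the stream ends, element `i` won't change again):

```typescript
type Post = {id?: number; text?: string};
type State = {user?: {posts?: Post[]}};

// Hypothetical watcher: fires the callback once per completed element of
// `user.posts`. Feed it each partial value, then call end() at stream close.
function makeCompletePostsWatcher(onPost: (post: Post) => void) {
  let reported = 0;
  return {
    update(state: State): void {
      const posts = state.user?.posts ?? [];
      // Every element before the last one is assumed complete.
      while (reported < posts.length - 1) onPost(posts[reported++]);
    },
    end(state: State): void {
      const posts = state.user?.posts ?? [];
      while (reported < posts.length) onPost(posts[reported++]);
    },
  };
}

// Usage with a hand-written sequence of partial values:
const done: Post[] = [];
const watcher = makeCompletePostsWatcher((p) => { done.push(p); });
watcher.update({user: {posts: [{id: 1, text: "he"}]}});
watcher.update({user: {posts: [{id: 1, text: "hello"}, {id: 2}]}});
watcher.end({user: {posts: [{id: 1, text: "hello"}, {id: 2, text: "hi"}]}});
// done now holds both complete posts.
```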
Inviz
4 months ago
Please consider it a feature request
rictic
4 months ago
For anyone else following along, see https://github.com/rictic/jsonriver/issues/39
stevage
4 months ago
Suggestion: make it clearer in the readme what happens with malformed input.
I can imagine it being useful to have a mode where you never emit strings until they are final, also. I don't entirely understand why strings are emitted incrementally but numbers aren't.
xp84
4 months ago
Seems useful to me in the context of something like a progressively rendered UI. A large block of text appearing a few characters at a time would be fine, but a number that represents something like a display metric (say, a position or a font size) going from 0 to 0.5 or from 1 to 1000 would result in goofy gyrations on-screen that don't make any sense. Or imagine if it were just fields in the app's data:
Name: John Smith. Birth year: A.D. 1 [Customer is a Senior: 2,024 years old]
Name: John Smith. Birth year: A.D. 19 [Customer is a Senior: 2,006 years old]
Name: John Smith. Birth year: A.D. 199 [Customer is a Senior: 1,826 years old]
Name: John Smith. Birth year: 1997
tags2k
4 months ago
If you're updating the UI every time you receive a single character from this library, you've got bigger problems than font size.
xp84
4 months ago
Isn't that one of the main points of React and its ilk? The state is just a big JSON object, and sometimes you might be fetching a bunch of data that makes up that state, and streaming it in. If latency is high and volume of data is high, seems perfectly reasonable to get the UI rendering as the state comes in instead of waiting for the last byte to do anything.
For instance, imagine you don't fully control the backend to split up a large response into several smaller API calls, but you could render the top part of the UI, which may be the most useful part, from the first couple of keys in the JSON, while a large "transaction history" after that is still downloading.
spankalee
4 months ago
If your UI layer can't efficiently update when you get new characters, you've got bigger problems than JSON parsing.
Seriously, you should be able to update the UI with a new character, and much more, at 60fps easily.
sysguest
4 months ago
hmm this makes sense for LLM usage
(but for other uses - nope)
rictic
4 months ago
Good feedback! Just updated the README with the following:
> The parse function also matches JSON.parse's behavior for invalid input. If the input stream cannot be parsed as the start of a valid JSON document, then parsing halts and an error is thrown. More precisely, the promise returned by the next method on the AsyncIterable rejects with an Error. Likewise if the input stream closes prematurely.
As for why strings are emitted incrementally: I was often dealing with long strings produced slowly by LLMs. JSON-encoded numbers can be big in theory, but there's no practical reason for them to be, since almost everyone decodes them as 64-bit floats.
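The documented failure mode can be sketched like this. The stand-in stream and `consume` function are invented for illustration; with the real library you'd iterate its parse output directly, but the try/catch shape around the loop is the point, and the README says the error behavior matches `JSON.parse`:

```typescript
// Stand-in for an input stream whose content is not valid JSON.
async function* badInput(): AsyncIterable<string> {
  yield '{"ok": tru';
  yield "k}"; // "truk" is not a valid JSON literal
}

// Toy consumer: collect the chunks and validate at the end with JSON.parse,
// whose invalid-input behavior the documented semantics mirror.
async function consume(stream: AsyncIterable<string>): Promise<unknown> {
  let text = "";
  for await (const chunk of stream) text += chunk;
  return JSON.parse(text); // throws, so the returned promise rejects
}

async function main(): Promise<string> {
  try {
    await consume(badInput());
    return "parsed";
  } catch {
    return "rejected"; // invalid input surfaces here, mid-iteration
  }
}
```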