
Show HN: ISON – Data format that uses 30-70% fewer tokens than JSON for LLMs

7 points, posted a month ago

14 Comments

maheshvaikri99

a month ago

Token Efficiency

  | Format       | Tokens | vs JSON  |
  |--------------|--------|----------|
  | ISONGraph    | 639    | -69%     |
  | ISON         | 685    | -66%     |
  | TOON         | 856    | -58%     |
  | JSON Compact | 1,072  | -47%     |
  | JSON         | 2,039  | baseline |

  LLM Accuracy

  | Format       | Correct | Accuracy | Acc/1K Tokens |
  |--------------|---------|----------|---------------|
  | ISONGraph    | 46/50   | 92.0%    | 143.97        |
  | ISON         | 44/50   | 88.0%    | 128.47        |
  | JSON         | 42/50   | 84.0%    | 41.20         |
  | JSON Compact | 41/50   | 82.0%    | 76.49         |
  | TOON         | 40/50   | 80.0%    | 93.46         |

  Key Findings

  1. ISONGraph wins on both efficiency AND accuracy - 92% correct with fewest tokens
  2. ISON/ISONGraph excel at multi-hop queries - LLM can follow relationships easily
  3. Acc/1K metric shows ISONGraph provides 3.5x more value per token than JSON
  4. Graph-specific format helps LLM understand relationships better than flat JSON
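The Acc/1K column is not defined in the post. The numbers are consistent with "accuracy in percent divided by thousands of tokens", so the sketch below assumes that definition and recomputes the column from the token counts and correct-answer counts in the two tables. The function names are illustrative only.

```python
# Token counts and correct answers (out of 50), copied from the tables above.
RESULTS = {
    "ISONGraph": (639, 46),
    "ISON": (685, 44),
    "TOON": (856, 40),
    "JSON Compact": (1072, 41),
    "JSON": (2039, 42),
}
TOTAL_QUESTIONS = 50


def acc_per_1k(tokens: int, correct: int, total: int = TOTAL_QUESTIONS) -> float:
    """Accuracy (percent) per 1,000 tokens. Assumed definition, not confirmed by the post."""
    accuracy_pct = 100.0 * correct / total
    return accuracy_pct / (tokens / 1000.0)


def reduction_vs_baseline(tokens: int, baseline_tokens: int) -> float:
    """Token reduction relative to the baseline, in percent."""
    return 100.0 * (1.0 - tokens / baseline_tokens)


if __name__ == "__main__":
    baseline = RESULTS["JSON"][0]
    for name, (tokens, correct) in RESULTS.items():
        print(
            f"{name:13s} acc/1k={acc_per_1k(tokens, correct):7.2f} "
            f"reduction={reduction_vs_baseline(tokens, baseline):5.1f}%"
        )
```

Under this definition the ratio between ISONGraph and JSON comes out at roughly 3.5, which matches finding 3.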

maheshvaikri99

a month ago

Published the Benchmark with Results

https://ison.dev/benchmark.html

https://github.com/maheshvaikri-code/ison/tree/main/benchmar...

Personally, I'm against anything that goes against the standard LLM data formats of JSON and MD. Any perceived economy is outweighed by confusion when none of these alternative formats exist in the training data in any real sense and every one of them has to be translated (by the LLM) to be used in your code or to apply to your real data.

Any tokens you saved will be lost 3x over in that process, as well as introducing confusing new context information that's unrelated to your app.

maheshvaikri99

a month ago

Fair point, but I'd push back on "none of these alternative formats exist in training data."

ISON isn't inventing new syntax. It's CSV/TSV with a header - which LLMs have seen billions of times. The table format:

  table.users
  id name email
  1 Alice alice@example.com

...is structurally identical to markdown tables and CSVs that dominate training corpora.

On the "3x translation overhead" - ISON isn't meant for LLM-to-code interfaces where you need JSON for an API call. It's for context stuffing: RAG results, memory retrieval, multi-agent state passing.

If I'm injecting 50 user records into context for an LLM to reason over, I never convert back to JSON. The LLM reads ISON directly, reasons over it, and responds.

The benchmark: same data, same prompt, same task. ISON uses fewer tokens and gets equivalent accuracy. Happy to share the test cases if you want to verify.
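For readers who want to try the context-stuffing idea, here is a minimal sketch of turning flat records into a `table.<name>` block like the one in this comment. It is not the official ISON library. `to_ison_table` and `format_value` are hypothetical helpers, and the layout (block name, header row, one space-separated row per record, double quotes around values containing whitespace) is an assumption based on the examples in this thread.

```python
def format_value(value) -> str:
    """Render one cell. Assumption: values with whitespace (or empty) get double quotes."""
    text = str(value)
    if text == "" or any(ch.isspace() for ch in text):
        return '"' + text.replace('"', '\\"') + '"'
    return text


def to_ison_table(name: str, rows: list) -> str:
    """Render a list of flat dicts as a 'table.<name>' block (assumed layout)."""
    if not rows:
        return f"table.{name}"
    columns = list(rows[0].keys())  # column order follows the first record
    lines = [f"table.{name}", " ".join(columns)]
    for row in rows:
        lines.append(" ".join(format_value(row[col]) for col in columns))
    return "\n".join(lines)


if __name__ == "__main__":
    users = [{"id": 1, "name": "Alice", "email": "alice@example.com"}]
    print(to_ison_table("users", users))
```

The sketch only handles flat records. Nested values would need to be flattened or split into separate tables first.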

dtagames

a month ago

That's exactly the problem. Why convert anything, especially if it's as lossy as CSVs are? You lose nesting and the rest of your structure in favor of a single header row. That's not a benefit.

If your real data is in JSON (and in JS/TS apps, it always is at runtime as only JSON objects exist in that language) it makes no sense to ever convert it, period.

Besides, the corporate-report-style CSVs in training data don't have data shapes anything like JSON, or even like most business software. You're crippling an established and useful data carrier in order to save pennies on tokens. Tokens are getting cheaper, so it's the wrong optimization.

maheshvaikri99

a month ago

Fair enough. Let me clarify the use case:

ISON isn't meant to replace JSON in your application. Your JS/TS code still uses JSON objects internally. ISON is specifically for the LLM context window.

The flow: App (JSON) → serialize to ISON → inject into prompt → LLM reasons → response → your app

You're right that nesting is lost. But for LLM reasoning, flat structures often work better. LLMs struggle with deeply nested JSON - they lose track of parent-child relationships 4+ levels deep.

On "tokens are getting cheaper": True for API costs. But context windows are still limited. When you're stuffing RAG results, memory, agent state, and user history into 128K tokens, every byte matters. It's not about saving money - it's about fitting more context.
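A rough way to see the size difference on your own data is to compare encoded lengths. The sketch below uses UTF-8 byte counts as a crude stand-in; bytes are not tokens, and real savings depend on the tokenizer, so treat it as a sanity check rather than a benchmark. The records are made-up sample data and the table layout is assumed from the examples in this thread.

```python
import json

# Hypothetical sample records, for illustration only.
records = [
    {"id": 1, "name": "Alice", "email": "alice@example.com"},
    {"id": 2, "name": "Bob", "email": "bob@example.com"},
    {"id": 3, "name": "Carol", "email": "carol@example.com"},
]

pretty_json = json.dumps(records, indent=2)
compact_json = json.dumps(records, separators=(",", ":"))

# Assumed tabular layout: block name, header row, one row per record.
columns = list(records[0])
table_text = "\n".join(
    ["table.users", " ".join(columns)]
    + [" ".join(str(r[c]) for c in columns) for r in records]
)

sizes = {
    "JSON (indented)": len(pretty_json.encode("utf-8")),
    "JSON (compact)": len(compact_json.encode("utf-8")),
    "table": len(table_text.encode("utf-8")),
}

if __name__ == "__main__":
    for label, size in sizes.items():
        print(f"{label:16s} {size:4d} bytes")
```

To measure actual tokens, swap `len(...encode("utf-8"))` for a call to the tokenizer of the model you are targeting.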

On "wrong optimization": I ran the benchmark. Same data, same task. ISON: 88.3% accuracy. JSON: 84.7%. The LLM actually performed better with the tabular format, not just "equivalent for fewer tokens."

## BENCHMARK STATS:

TOKEN EFFICIENCY:
  ISON: 3,550 tokens
  JSON: 12,668 tokens
  ISON vs JSON: 72.0% reduction

LLM ACCURACY (300 Questions):
  ISON: 265/300 (88.3%)
  JSON: 254/300 (84.7%)

EFFICIENCY (Acc/1K):
  ISON: 24.88
  JSON: 6.68
  ISON is 272.3% MORE EFFICIENT than JSON!

But I hear you - if your data is deeply nested and that nesting carries semantic meaning the LLM needs, JSON might be the right choice. ISON works best for relational/tabular data going into context.

quinncom

a month ago

When including ISON alongside normal text, which language should you use for the code fence info string? Is `ison` a known code type, i.e.:

```ison
object.config
timeout 30
debug true
api_key "sk-xxx-secret"
max_retries 3
```

dmarwicke

a month ago

tried this with msgpack last year. accuracy tanked. models have seen a trillion json examples, like 12 of whatever format you invent

dClauzel

a month ago

Just use CSV at this point :D

maheshvaikri99

a month ago

Ha, fair. CSV gets you 80% there.

The 20% ISON adds:

- Multiple named tables in one doc
- Cross-table references
- No escaping hell (quoted strings handled cleanly)
- Schema validation (ISONantic)

If you're stuffing one flat table into context, CSV works fine. When you have users + orders + products with relationships, ISON saves you from JSON's bracket tax.

throw03172019

a month ago

So CSV with a “typed” header?

maheshvaikri99

a month ago

Essentially yes, but with a few additions CSV lacks:

1. Multiple tables in one document (table.users, table.orders)
2. References between tables (:user:42 links to id 42)
3. Object blocks for config/metadata
4. Streaming format (ISONL) for large datasets

The type annotations are optional - they help LLMs understand the schema without inference.

You could think of it as "CSV that knows about relationships" - which is exactly what multi-agent systems need when passing state around.

throw03172019

a month ago

Got it. Thanks.

Any data on how LLMs like this format? Are they able to make the associations etc?

maheshvaikri99

a month ago

Yes - I ran a 300-question benchmark comparing ISON vs JSON vs JSON-COMPACT etc. on the same tasks.

ISON: 88.3% accuracy
JSON: lower (can share exact numbers if interested)

Tested across Claude, GPT-4, DeepSeek, and Llama 3.

The key finding: LLMs handle tabular formats natively because they've seen billions of markdown tables and CSVs in training. No special prompting needed.

For associations, I tested with multi-table ISON docs like:

  table.users
  id name
  1 Alice
  2 Bob

  table.orders
  id user_id product
  101 :1 Widget
  102 :2 Gadget

Prompt: "What did Alice order?"

All models correctly resolved :1 → Alice → Widget without explicit instructions about the reference syntax.
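The lookup the models are asked to do (:1 → Alice → Widget) can be written out as a small sketch. This is not the ISON reference parser. It assumes each block is a `table.<name>` line, a header line, then one whitespace-separated row per line, with blocks separated by blank lines, and it treats a leading `:` as a reference to an `id` in `table.users`. `parse_tables` and `orders_for` are hypothetical names.

```python
import re

DOC = """\
table.users
id name
1 Alice
2 Bob

table.orders
id user_id product
101 :1 Widget
102 :2 Gadget
"""


def parse_tables(doc: str) -> dict:
    """Parse blank-line-separated 'table.<name>' blocks into lists of row dicts.

    Simplified: cells are split on whitespace, so quoted values containing
    spaces are not handled.
    """
    tables = {}
    for block in re.split(r"\n\s*\n", doc.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        name = lines[0].split(".", 1)[1]  # drop the 'table.' prefix
        columns = lines[1].split()
        tables[name] = [dict(zip(columns, ln.split())) for ln in lines[2:]]
    return tables


def orders_for(tables: dict, user_name: str) -> list:
    """Follow ':<id>' references from orders back to users and return product names."""
    user_ids = {u["id"] for u in tables["users"] if u["name"] == user_name}
    return [
        o["product"]
        for o in tables["orders"]
        if o["user_id"].startswith(":") and o["user_id"][1:] in user_ids
    ]


if __name__ == "__main__":
    print(orders_for(parse_tables(DOC), "Alice"))
```

A typed reference such as `:user:42` would need one more split on `:` to pick the target table; that is left out to keep the sketch short.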

The 30-70% token savings come from removing JSON's structural overhead (braces, quotes, colons, commas) while keeping the same semantic density.

Haven't published formal benchmarks on this yet - that's good feedback. I should.