The missed opportunity of constrained decoding

2 points, posted 11 hours ago
by killcoder

4 Comments

Agent_Builder

11 hours ago

This resonates. What we saw in practice is that most failures don’t come from models being too dumb, but from being given too much freedom.

While using GTWY.ai, the biggest reduction in hallucinations came from constraining what an agent was allowed to do at each step, not from better prompts or verification layers.

Once inputs, tools, and outputs were explicit, the model stopped confidently inventing things. It felt less “creative”, but far more useful.

Fewer degrees of freedom beat smarter models, at least in production.

killcoder

10 hours ago

I don't buy the "any constraints cause lower performance via being out of distribution" idea. Sure, if you ask the model to output 'reasoning' in JSON steps, that is a completely different 'channel' from its trained 'reasoning' output. For real tasks though, I think it's more about picking the _right_ context-free grammar to enforce format correctness. You can enforce an in-distribution format and get the best of both worlds. I don't think the industry should settle so hard on JSON-for-everything.
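To make the grammar point concrete, here is a minimal sketch of grammar-constrained greedy decoding over a format that is not JSON. Everything in it is an assumption for illustration: the four-token vocabulary, the balanced-bracket grammar, and the `toyModel` callback standing in for a real model's logits.

```typescript
// Toy vocabulary; a real tokenizer has tens of thousands of entries.
const VOCAB = ["(", ")", "x", "<eos>"] as const;
type Token = (typeof VOCAB)[number];

// Grammar: S -> "(" S ")" S | "x" S | empty.
// For this grammar the pushdown state is just the current nesting depth.
function allowedTokens(depth: number): Set<Token> {
  const allowed = new Set<Token>(["(", "x"]);
  if (depth > 0) allowed.add(")"); // can only close an open bracket
  else allowed.add("<eos>"); // can only stop when balanced
  return allowed;
}

// Set the logit of every token the grammar rejects to -Infinity.
function maskLogits(logits: number[], allowed: Set<Token>): number[] {
  return logits.map((l, i) => (allowed.has(VOCAB[i]) ? l : -Infinity));
}

function argmax(xs: number[]): number {
  let best = 0;
  for (let i = 1; i < xs.length; i++) if (xs[i] > xs[best]) best = i;
  return best;
}

// Greedy decoding with the mask applied at every step.
// Note: hitting maxSteps can still truncate the output mid-structure.
function decode(model: (prefix: Token[]) => number[], maxSteps = 16): Token[] {
  const out: Token[] = [];
  let depth = 0;
  for (let step = 0; step < maxSteps; step++) {
    const masked = maskLogits(model(out), allowedTokens(depth));
    const tok = VOCAB[argmax(masked)];
    if (tok === "<eos>") break;
    out.push(tok);
    if (tok === "(") depth++;
    if (tok === ")") depth--;
  }
  return out;
}

// Hypothetical model: opens two brackets, then prefers ")" over stopping forever.
const toyModel = (prefix: Token[]): number[] => [prefix.length < 2 ? 3 : 0, 2, 1, 1.5];

console.log(decode(toyModel).join("")); // "(())"
```

Without the mask this toy model would keep emitting ")" until it ran out of steps; with it, the output stays inside the grammar while the model's own preferences still pick among the legal tokens.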

Agent_Builder

10 hours ago

I think we’re mostly aligned. The constraints we’re talking about weren’t about forcing everything into JSON or limiting reasoning bandwidth.

Inside a step, the model still reasons freely in plain language. The constraint is on what authority exists at that step.

The failures we saw came from permissions and assumptions silently carrying over between steps, not from the model “thinking wrong”. Once a step ended, any authority it had ended too.

So it’s less “constrain decoding” and more “constrain capability scope over time”. Free reasoning within a step, hard boundaries between steps.

That separation is what removed a lot of surprising behavior for us.
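A minimal sketch of what "authority ends with the step" could look like in code. This is not GTWY.ai's API; the tool names, `StepScope`, and `runStep` are all hypothetical and only illustrate the idea of a per-step grant that is revoked when the step returns.

```typescript
type Tool = (arg: string) => string;

// Hypothetical tool registry; names are illustrative only.
const TOOLS: Record<string, Tool> = {
  readFile: (path) => `contents of ${path}`,
  sendEmail: (to) => `sent to ${to}`,
};

interface StepScope {
  call(tool: string, arg: string): string;
}

// Run one step with an explicit grant list. The scope handle stops
// working as soon the step body returns or throws.
function runStep<T>(granted: string[], body: (scope: StepScope) => T): T {
  let live = true;
  const scope: StepScope = {
    call(tool, arg) {
      if (!live) throw new Error(`step ended; "${tool}" is revoked`);
      if (!granted.includes(tool)) throw new Error(`"${tool}" not granted in this step`);
      const fn = TOOLS[tool];
      if (!fn) throw new Error(`unknown tool "${tool}"`);
      return fn(arg);
    },
  };
  try {
    return body(scope);
  } finally {
    live = false; // authority ends with the step
  }
}

// Step 1 may read but not send; nothing carries over to later steps.
const summary = runStep(["readFile"], (s) => s.call("readFile", "notes.txt"));
console.log(summary); // "contents of notes.txt"
```

The point of the design is that a later step cannot inherit a grant by accident: it has to be handed its own list.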

killcoder

11 hours ago

I was working on a speculative decoding optimisation and its accompanying blog post. Explaining the more basic concepts filled so much of the post I decided to pull them out, forming this article.

I had a bit too much fun with the tokenisation diagrams / animations. The raw text is provided to an Astro component, which tokenises it and forms the individual DOM elements of the tokens. I find it really hard to read 'tokenised' text, so I figured some consistent colouring would help. The 'Probabilities' component is a trivial grid, but all the other components support 'word wrap'.

I ended up writing a 'responsive design aware graph colouring solver'.

Multiple screen widths ('desktop' and 'mobile') are 'simulated', forming an adjacency graph of tokens that touch. Colours are greedily allocated, then optimised per page over a few hundred iterations, swapping allocations to enforce a minimum hue distance between touching tokens at those common screen sizes. The optimising value function prioritises even distribution of colours, because it looks nicer than maximal hue difference.
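Roughly, the greedy-then-swap idea looks like the sketch below. It is a simplification, not the actual solver: it assumes fixed-width tokens laid out `perRow` per line, and its score is just the minimum hue distance across touching tokens, whereas the real value function favours even colour distribution. All function names are made up for the example.

```typescript
type Edge = [number, number];

// Circular distance between two hues in degrees (0..180).
function hueDistance(a: number, b: number): number {
  const d = Math.abs(a - b) % 360;
  return Math.min(d, 360 - d);
}

// Tokens that touch at one simulated width: right neighbour and the one below.
function edgesForWidth(tokenCount: number, perRow: number): Edge[] {
  const edges: Edge[] = [];
  for (let i = 0; i < tokenCount; i++) {
    if ((i + 1) % perRow !== 0 && i + 1 < tokenCount) edges.push([i, i + 1]);
    if (i + perRow < tokenCount) edges.push([i, i + perRow]);
  }
  return edges;
}

// Greedy pass: give each token the hue farthest from its coloured neighbours.
function greedyColour(n: number, edges: Edge[], palette: number[]): number[] {
  const neighbours: number[][] = Array.from({ length: n }, () => []);
  for (const [a, b] of edges) {
    neighbours[a].push(b);
    neighbours[b].push(a);
  }
  const hue: number[] = new Array<number>(n).fill(-1);
  for (let v = 0; v < n; v++) {
    let best = palette[0];
    let bestScore = -1;
    for (const h of palette) {
      const dists = neighbours[v].filter((u) => hue[u] >= 0).map((u) => hueDistance(h, hue[u]));
      const score = dists.length ? Math.min(...dists) : 180;
      if (score > bestScore) {
        best = h;
        bestScore = score;
      }
    }
    hue[v] = best;
  }
  return hue;
}

function minEdgeDistance(hue: number[], edges: Edge[]): number {
  return Math.min(...edges.map(([a, b]) => hueDistance(hue[a], hue[b])));
}

// Swap pass: try swapping two allocations, keep the swap only if the score improves.
function improveBySwaps(hue: number[], edges: Edge[], iterations = 200): number[] {
  const out = [...hue];
  let score = minEdgeDistance(out, edges);
  for (let it = 0; it < iterations; it++) {
    const a = it % out.length;
    const b = (it * 7 + 3) % out.length; // deterministic stand-in for a random pick
    [out[a], out[b]] = [out[b], out[a]];
    const s = minEdgeDistance(out, edges);
    if (s > score) {
      score = s;
    } else {
      [out[a], out[b]] = [out[b], out[a]]; // undo
    }
  }
  return out;
}

// Union the adjacency from a 'desktop' (4 per row) and a 'mobile' (2 per row) layout.
const edges = [...edgesForWidth(8, 4), ...edgesForWidth(8, 2)];
const hues = improveBySwaps(greedyColour(8, edges, [0, 90, 180, 270]), edges);
```

Taking the union of edges across widths is what makes the result hold at every simulated screen size rather than just one.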

Originally I naively output the palette styles per component, but found the CSS post-processing optimisers didn't handle that as well as I'd have thought. So then I wrote a little 'CSS compiler' that takes the high-level palette and timing concepts of the animations, and optimally merges rule declarations.
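As a rough illustration of what merging rule declarations can mean, the sketch below groups selectors that share an identical declaration set into a single rule. It is a guess at the general shape, not the actual compiler: `Rule` and `mergeRules` are hypothetical, and it assumes no selector declares the same property twice, so that sorting and regrouping cannot change the cascade.

```typescript
interface Rule {
  selector: string;
  declarations: string[]; // e.g. ["color:red", "animation-delay:1s"]
}

// Group selectors by their (order-independent) declaration set and emit
// one rule per group, in order of first appearance.
function mergeRules(rules: Rule[]): string {
  const bySignature = new Map<string, string[]>();
  for (const { selector, declarations } of rules) {
    const key = [...declarations].sort().join(";");
    const selectors = bySignature.get(key) ?? [];
    selectors.push(selector);
    bySignature.set(key, selectors);
  }
  return [...bySignature]
    .map(([decls, selectors]) => `${selectors.join(",")}{${decls}}`)
    .join("\n");
}

console.log(
  mergeRules([
    { selector: ".t1", declarations: ["color:red"] },
    { selector: ".t2", declarations: ["color:blue"] },
    { selector: ".t3", declarations: ["color:red"] },
  ]),
);
// .t1,.t3{color:red}
// .t2{color:blue}
```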

The start of the post really relies on the animation occurring while fully in view, so I set up some IntersectionObservers that do the 'please scroll' text.

I tried my best to have it all work when JS is disabled on the client. I tried to get the 'hovering' to be CSS-only, but found the JS solution much more performant.

The DAG diagrams are formed with the Needleman-Wunsch algorithm from the bioinformatics field. The Astro component accepts several 'examples', then aligns common subsequences, producing the CSS grid and the 'basic SVG' on the server. The responsive nature meant I had to move the final 'arrow' generation to the client.
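For anyone unfamiliar with it, here is a minimal sketch of Needleman-Wunsch global alignment over token sequences. It aligns only two sequences, whereas the component aligns several examples, and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for the example.

```typescript
type Aligned = (string | null)[]; // null marks a gap

function needlemanWunsch(
  a: string[],
  b: string[],
  match = 1,
  mismatch = -1,
  gap = -1,
): [Aligned, Aligned] {
  const n = a.length;
  const m = b.length;
  // score[i][j] = best score aligning a[0..i) with b[0..j)
  const score: number[][] = Array.from({ length: n + 1 }, () => new Array<number>(m + 1).fill(0));
  for (let i = 1; i <= n; i++) score[i][0] = i * gap;
  for (let j = 1; j <= m; j++) score[0][j] = j * gap;
  for (let i = 1; i <= n; i++) {
    for (let j = 1; j <= m; j++) {
      const diag = score[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? match : mismatch);
      score[i][j] = Math.max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap);
    }
  }
  // Trace back from the bottom-right corner, rebuilding one optimal alignment.
  const outA: Aligned = [];
  const outB: Aligned = [];
  let i = n;
  let j = m;
  while (i > 0 || j > 0) {
    const sub = i > 0 && j > 0 ? (a[i - 1] === b[j - 1] ? match : mismatch) : 0;
    if (i > 0 && j > 0 && score[i][j] === score[i - 1][j - 1] + sub) {
      outA.push(a[--i]);
      outB.push(b[--j]);
    } else if (i > 0 && score[i][j] === score[i - 1][j] + gap) {
      outA.push(a[--i]);
      outB.push(null);
    } else {
      outA.push(null);
      outB.push(b[--j]);
    }
  }
  return [outA.reverse(), outB.reverse()];
}

const [rowA, rowB] = needlemanWunsch(
  ["{", "a", ":", "1", "}"],
  ["{", "a", ":", "[", "1", "]", "}"],
);
console.log(rowA); // ["{", "a", ":", null, "1", null, "}"]
console.log(rowB); // ["{", "a", ":", "[", "1", "]", "}"]
```

Each aligned column maps naturally onto one grid column: shared tokens collapse into a single node, and gaps become branches.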

Some browsers seem to throttle the token animations sometimes, but I haven't figured out what causes that. This is my first time leaning hard on CSS variables.