How we improved GPT-4o multi-step function calling success rate by 4x

13 points, posted 9 hours ago
by jimminyx

4 Comments

doctorpangloss

7 hours ago

They have identified a big problem with frontier models’ function calling, which is that it doesn’t really work with more than 3 functions, but:

> instead of allowing the agent complete freedom in choosing from all possible API calls, AGS only presents the contextually relevant options based on where the agent is in its workflow.
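Concretely, I'd guess that means gating the tools array by workflow state before each call, roughly like the sketch below (the state names and tool schemas are made up for illustration, not their actual API):

    # Rough sketch: only expose the tools that are valid in the current
    # workflow state, instead of the full set on every turn.
    from openai import OpenAI

    client = OpenAI()

    def tool(name, description):
        # Minimal function-tool schema with no parameters, for brevity.
        return {
            "type": "function",
            "function": {
                "name": name,
                "description": description,
                "parameters": {"type": "object", "properties": {}},
            },
        }

    # Hypothetical mapping from workflow state to the tools allowed there.
    TOOLS_BY_STATE = {
        "browse": [tool("search_catalog", "Search products"),
                   tool("add_to_cart", "Add an item to the cart")],
        "checkout": [tool("get_cart", "Read the cart"),
                     tool("charge_card", "Charge the saved card")],
        "support": [tool("order_status", "Look up an order"),
                    tool("issue_refund", "Refund an order")],
    }

    def step(state, messages):
        # The model only ever chooses among 2 tools here, not all 6.
        return client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=TOOLS_BY_STATE[state],
        )

The win is that the model only ever picks from a handful of schemas at a time; the cost is that somebody has to define those states and the transitions between them per task.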

Sounds like it will have to be bespoke to each task. Joe Blow enterprise PM farming this out to Jerald Blophus tech agency farming it out to BlowStar Solutions: they’re not going to be able to do this one.

If it doesn’t look like web development, where you have Yavascript and npx create-app something something, you haven’t solved the DX problem either.

It’s hard to find an organic-looking conversation that would exercise each permutation of your tool calls, plus some loops inside it, whether you use Xpander's method or not. If you don’t test, you might as well have an agent that can only call one function. This is one of many reasons that guided OpenAI, I’m sure, to train on just a few functions available to call, and it’s frustrating to read any blog post that doesn’t address “Why don’t the frontier model developers just do this themselves?”

momopoco

7 hours ago

Isn’t this just langgraph?

tlarkworthy

7 hours ago

Isn't it obvious, when you work with a stochastic system, that giving it tons of wrong moves is going to increase the failure rate?

petesergeant

7 hours ago

This doesn't appear to be a "how we improved" article; it looks like a press release for some product.