zellyn
11 days ago
It’s frustratingly difficult to see what these (A2A and MCP) protocols actually look like. All I want is a simple example conversation that includes the actual LLM outputs used to trigger a call and the JSON that goes over the wire… maybe I’ll take some time and make a cheat-sheet.
I have to say, the endorsements at the end somehow made this seem worse…
mlenhard
11 days ago
I was in the same boat in regards to trying to find the actual JSON that was going over the wire. I ended up using Charles to capture all the network requests. I haven't finished the post yet, but if you want to see the actual JSON I have all of the request and responses here https://www.catiemcp.com/blog/mcp-transport-layer/
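For reference, the captured messages are JSON-RPC 2.0 envelopes. This is a rough sketch (not verbatim from the captured traffic; the tool name and arguments are invented):

```python
import json

# Sketch: MCP messages are JSON-RPC 2.0 envelopes. A tools/call request
# and its response look roughly like this; names here are made up.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "list_issues", "arguments": {"repo": "example/repo"}},
}
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "3 open issues"}]},
}
print(json.dumps(request, indent=2))
```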
swyx
11 days ago
itd be nice if you prettified your json in the blogpost
fwiw i thought the message structure was pretty clear on the docs https://modelcontextprotocol.io/docs/concepts/architecture#m...
mlenhard
10 days ago
Yeah, I plan on improving the formatting and adding a few more examples. There were even still some typos in the piece. To be honest, I didn't plan on sharing it yet; I just figured it might be helpful for the OP, so I shared it early.
I also think the docs are pretty good. There's just something about seeing the actual network requests that helps clarify things for me.
nl
10 days ago
Some (many?) people learn better from concrete examples and generalize from them.
TeMPOraL
10 days ago
Not just that, but it's also useful to have examples to validate a) your understanding of the spec, and b) product's actual adherence to the spec.
zellyn
11 days ago
Oh, that's really nice. Did you capture the responses from the LLM? Presumably it has some kind of special syntax in it to initiate a tool call, described in the prompt? Like TOOL_CALL<mcp=github,command=list> or something…
kristopolous
11 days ago
I had never heard of charles ... (https://www.charlesproxy.com/) I basically wrote a simple version of it 20 years ago (https://github.com/kristopolous/proxy) that I use because back then, this didn't exist ... I need to remember to toss my old tools aside
stavros
10 days ago
Well, Charles launched almost 20 years ago, so I'd say there's a good chance that it did exist.
kristopolous
10 days ago
Well hopefully my current thing, a streaming markdown renderer for the terminal (https://github.com/kristopolous/Streamdown) hasn't also been a waste of time
stavros
10 days ago
Why would anything be a waste of time?
kristopolous
10 days ago
I build things I cannot find.
Every project I do is an assertion that I don't believe the thing I make exists.
I have been unable to find a streaming, forward-only markdown renderer for the terminal, nor have I been able to find any suitable library that I could build one with.
So I've taken on the ambitious effort of building my own parser and renderer, and going through all the grueling testing that entails.
mptest
10 days ago
the answers to that question are hugely variable and depend on the objective and on how one defines waste. if one values learning intrinsically, like most of us here probably do, it is pretty hard to come up with a waste of time, even taking the rare break from learning.
But it seems self-evident where constraints like markets or material conditions might demarcate usefulness and waste.
Even the learners who are as happy to hear about linguistics as they are material science I presume do some opportunity cost analysis as they learn. Personally speaking, I rarely, if ever, feel like I'm wasting time per se but I always recognize and am conscious of the other things I could be doing to better maximize alternative objectives. That omnipresent consciousness may just be anxiety though I guess...
nsonha
10 days ago
either that or "waste of time" is a meaningless phrase
mlenhard
11 days ago
Yeah, at its core it's just a proxy, so there are a lot of other tools out there that would do the job. It does have a nice UI and I try to support projects like it when I can.
I'll check out your proxy as well, I enjoy looking at anything built around networking.
Maxious
11 days ago
even the approach that charles takes for intercepting TLS traffic is a bit old school (proxies, fake root certs etc.) - cool kids use eBPF https://mitmproxy.org/posts/local-capture/linux/
stavros
10 days ago
I can see how you don't need a proxy any more, but I don't see how you can bypass TLS without fake root certs, even with eBPF.
Tokumei-no-hito
10 days ago
new to this program as well but looks really nice.
i think it is still a proxy though unless I’m missing something (beyond the name lol).
[here's a section on macos dealing with certs](https://mitmproxy.org/posts/local-capture/macos/)
beaugunderson
10 days ago
here is one example: https://github.com/gojue/ecapture
in short, you can hook calls within SSL libraries (like OpenSSL)
stavros
10 days ago
Sure, but that very much depends on the application, no? What if it has statically linked its SSL lib?
beaugunderson
10 days ago
you wanted to know how people are bypassing the need for a certificate with eBPF, that is how
sunpazed
10 days ago
I had the same frustration and wanted to see "under the hood", so I coded up this little agent tool to play with MCP (sse and stdio), https://github.com/sunpazed/agent-mcp
It really is just JSON-RPC 2.0 under the hood, either piped over stdio or POSTed over HTTP.
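A sketch of what "the same JSON-RPC, two transports" means in practice: over stdio the messages are newline-delimited JSON; over HTTP the same object is simply the POST body. The message below is illustrative, simulated with an in-memory pipe:

```python
import io
import json

# Same JSON-RPC 2.0 envelope either way; only the framing differs.
msg = {"jsonrpc": "2.0", "id": 7, "method": "tools/list", "params": {}}
line = json.dumps(msg) + "\n"    # stdio framing: one message per line

pipe = io.StringIO(line)         # stand-in for the server process's stdout
decoded = json.loads(pipe.readline())
print(decoded["method"])         # tools/list
```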
daxfohl
11 days ago
For MCP I found the tutorials at https://github.com/block/goose made it click for me.
jacobs123
11 days ago
It's shown in the link below. It's kind of crazy that they have this huge corporate announcement with 50 logos for something that under the hood seems sort of arbitrary and very fragile, and is probably very sensitive to things like exact word choice and punctuation. There will be effects like bots that say "please" and "thank you" to each other getting measurably better results.
https://google.github.io/A2A/#/documentation?id=multi-turn-c...
TS_Posts
8 days ago
Hi there (I work on a2a) - can you explain the concern a bit more? We'd be happy to look.
A2A is a conduit for agents to speak in their native modalities. From the receiving agent implementation point of view, there shouldn't be a difference in "speaking" to a user/human-in-the-loop and another agent. I'm not aware of anything in the protocol that is sensitive to the content. A2A has 'Messages' and 'Artifacts' to distinguish between generated content and everything else (context, thoughts, user instructions, etc) and should be robust to formatting challenges (since it relies on the underlying agent).
jacobs123
3 days ago
Some of the research I want to show you is, although technically public, very relevant for malware development, especially in worming payloads that are spread by exposed agents to other exposed agents. It's not secret information but I don't want to make it easy for script kiddies to skip 4+ years of studying engineering and the associated learning about ethics. Can I contact you directly in some way? Thanks.
kc10
8 days ago
Can you please expand on this?
The sensitivity to prompts and response quality are related to an agent's functionality, A2A is only addressing the communication aspects between agents and not the content within.
esafak
11 days ago
zellyn
11 days ago
Oh, that's very nice. Thanks!
wongarsu
10 days ago
You weren't kidding with the endorsements. It's endorsed by KPMG, Accenture and BCG. McKinsey and PwC are not in the partner list but are mentioned as contributors. Honorable mention to SAP as another company whose endorsements are a warning sign
ronameles
11 days ago
https://www.youtube.com/watch?v=5_WE6cZeDG8 - I work at an industrial software company. You can kind of think of us as an API layer to factory data, that is generally a mess. This video shows you what MCP can do for us in terms of connecting factory data to LLMs. Maybe it will help. A2A is new to me, and I need to dig in.
Basically if we expose our API over MCP, agents can "figure it out". But MCP isn't secure enough today, so hoping that gets enhanced.
behnamoh
11 days ago
It seems companies figured out that introducing "protocols" or standards helps their business because if one catches on, it creates a "moat" for them: imagine if A2A became the de facto standard for agent communication. Since Google invented it and has already incorporated it into their business logic, it would suddenly open up the entire LLM landscape to Google services (so LLMs aren't the end goal here). Microsoft et al. would then either have to introduce their own "standard" or adopt Google's.
mindcrime
11 days ago
> or adopt Google's.
Which is an open standard that is Apache licensed[1]. That's no moat for Google. At best it's a drainage ditch.
mycall
11 days ago
It is quite hard to reliably and consistently connect deterministic systems and goals with nondeterministic compute. I don't know if all of this will ever be exactly what we want.
throwaway-blaze
11 days ago
Sort of like asking a non-deterministic human to help make changes to an existing computer system. Extends the problems of human team management to our technology systems.
Xelynega
11 days ago
Not only extends them, but compounds them because you have a non-deterministic human making changes to a non-deterministic computer system which is making changes to an existing computer system.
TeMPOraL
10 days ago
That's basically the problem of employing and managing people.
yurishimo
10 days ago
And look at how much effort our industry goes through as a whole to work around it! Managing people is harder than wrangling machines, even if the upfront cost to "train" and build the machine is multiples higher. Once a deterministic system works, it will keep going until a variable changes. The "problem" with humans is that our variables change like the weather and it takes a lot more effort and resources to keep everyone on track.
"If you just get out of people's way, then they'll do a good job and the right thing!" - yea, perhaps. But how much of "getting out their way" is more a product of providing meaningful ownership and compensation in the workplace? See the paragraph above. Good employees are expensive and as time marches on, their compensation will need to continue to increase at least with inflation, while the machine will likely become cheaper to operate over time as societal advances bring down the cost and complexity of operation.
latentsea
9 days ago
Yup. And this is why I think the "last mile" problem in AI is basically unsolvable.
whalesalad
11 days ago
Agreed. At the end of the day we are talking about RPC. A named method, with known arguments, over the wire. A simple HTTP request comes to mind. But that would just be too easy. Oh wait, that is what all of these are under the hood. We are so cooked.
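That framing can be made concrete with a toy dispatcher (names here are invented, not any real framework's API):

```python
import json

# "RPC" boiled down: a named method plus known arguments, serialized
# over some wire. The payload below stands in for an HTTP request body.
def add(a: int, b: int) -> int:
    return a + b

METHODS = {"add": add}

wire_payload = '{"method": "add", "params": {"a": 1, "b": 2}}'
call = json.loads(wire_payload)
result = METHODS[call["method"]](**call["params"])
print(result)  # 3
```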
    from fastmcp import FastMCP

    mcp = FastMCP("Demo")

    @mcp.tool()
    def add(a: int, b: int) -> int:
        """Add two numbers"""
        return a + b
This is an example of fastmcp. Notice anything? Replace 2-3 lines of code and this is a Flask or FastAPI application. Why are we not just going all-in on REST/HATEOAS for these things? My only hunch is that either 1. the people designing/proselytizing these "cutting edge" solutions are simply ignorant of how systems communicate and all the existing methods that exist, or 2. they know full well that this is just existing concepts with a new shiny name but don't care because they want to ride the hype train and take advantage of it.
pjerem
11 days ago
Ironically, I tried to use the official "github-mcp" and failed to make it work with my company's repos, even with a properly configured token. The thing comes with a full blown server running inside a docker container.
Well, I just told my llm agent to use the `gh` cli instead.
It seems all those new protocols are there to re-invent wheels just to create a new ecosystem of free programs that corporations will be able to use to extract value without writing the safety guards themselves.
config_yml
11 days ago
I feel the same way about OpenAI's new responses API. Under the cover of DX they're marketing a new default, which is: we hold your state and sell it back to you.
whalesalad
11 days ago
OpenAI is tedious to work with. Took me a solid day of fooling around with it before I realized the chat api and the chat completions api are two entirely different apis. Then you have the responses api which is a third thing.
The irony is that gpt4 has no clue which approach is correct. Give it the same prompt three times and you’ll get a solution that uses each of these that has a wildly different footprint, be it via function calls or system prompts, schema or no schema, etc.
lherron
10 days ago
Wait till you deal with google genai lib vs google generativeai lib
peab
11 days ago
Yeah, I haven't seen a reason why we can't just use REST. Like, auth is already figured out. The LLMs already have the knowledge of how to call APIs too!
skeledrew
11 days ago
It's like deciding between Assembly or C for some given project.
nonethewiser
11 days ago
I don't fully understand. The protocol uses HTTP and has a JSON schema. But there are more specifications outside of that. How do you specify those things without a new protocol? Or is the argument that you don't need to specify those things?
Xelynega
11 days ago
REST is a protocol that uses HTTP and a JSON schema.
I fail to see how they're different, they're both "these are the remote procedures you can call on me, and the required parameters, maybe some metadata of the function/parameters".
pests
10 days ago
How are they both describing the remote procedures and parameters though? In order for the LLM to use a tool it needs to know its name and arguments. There has to be some kind of spec, in some form or format, for it to use.
An existing Swagger/OpenAPI spec is not sufficient. You want to limit options and make it easy for an LLM to call your tool to accomplish goals. The complete API surface of your application might not be appropriate. It might be too low level or require too many orchestration steps to do anything useful.
A lot of existing APIs require making additional calls using the results of previous calls. GET /users to get a list of ids. Then repeatedly call GET /users/$id to get the data. In an MCP world you would provide a get-users tool that would do all this behind the scenes and also impose any privacy/security/auth restrictions before handing this over to an LLM.
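That fan-out pattern can be sketched as follows (the "REST calls" are fake in-memory stand-ins; the tool name is hypothetical):

```python
# A hypothetical get-users tool: instead of exposing the raw REST surface
# (GET /users, then GET /users/$id per user), one tool call does the
# fan-out and strips fields the LLM shouldn't see.
FAKE_DB = {
    1: {"id": 1, "name": "ada", "email": "ada@example.com"},
    2: {"id": 2, "name": "lin", "email": "lin@example.com"},
}

def list_user_ids():                  # stand-in for GET /users
    return list(FAKE_DB)

def get_user(user_id):                # stand-in for GET /users/$id
    return FAKE_DB[user_id]

def get_users_tool():
    """One MCP-style tool call: hydrate every user, drop private fields."""
    users = [get_user(uid) for uid in list_user_ids()]
    return [{"id": u["id"], "name": u["name"]} for u in users]  # no emails

print(get_users_tool())
```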
We see similar existing systems like GraphQL which provides a fully hydrated resultset in one call. Tons of APIs like Stripe (IIRC) provide a &hydrate= parameter to specify which relations to include full details for in-line.
I do agree MCP is overhyped and might not be using best principles but I do see why its going off in its own land. It might be better suited over different protocols or transports or encodings or file formats but it seems to at least work so until something better comes along we are probably stuck with it.
TeMPOraL
10 days ago
> I fail to see how they're different, they're both "these are the remote procedures you can call on me, and the required parameters, maybe some metadata of the function/parameters".
For one, REST is not RPC, despite being commonly confused for it and abused as such. The conceptual models are different. It makes more sense for an action-oriented RPC protocol to be defined as such, instead of a proper REST approach (which is going to be way too verbose), or some bastardized "RESTful" protocol that's just weirdly-structured RPC designed so people can say, "look ma', I'm using HTTP verbs, I'm doing REST".
zellyn
11 days ago
Yeah, I got that from reading the Ghidra MCP (very instructive, strong recommend), but I'm curious what the LLM needs to output to call it. I should go read Goose's code or instrument it or something…
daxfohl
11 days ago
Audio and video streams, two way sync and async communication, raw bytes with meaning, etc. And it's not just remote services, it can be for automating stuff local real-time on your machine, your ide or browser, etc. Like the docs say, MCP is to an AI model as USB is to a CPU.
skeledrew
11 days ago
It's just another layer of abstraction so one doesn't need to think about HTTP at all, which would bring in irrelevant baggage.
qwertox
11 days ago
To be fair, HTTP adds a layer of friendliness over TCP (POST/GET, paths, query parameters) and the servers can be so simple that it can hardly be considered irrelevant baggage.
The benefit it brings is that you can add debugging endpoints which you can use directly in a browser, you get networking with hosts and ports instead of local-only exe + stdio.
skeledrew
11 days ago
That's just one part of it. Keep in mind MCP supports 3 transport methods: stdio, SSE (which would be your HTTP) and websockets. Irrelevant baggage would be having to consider the workings of any of those (given a decently implemented client+server library), rather than merely declaring the servers, tools, resources and prompts to be accessed. There's also a debug mode I believe.
Xelynega
11 days ago
This just furthers my theory that people pushing for MCP don't understand how networking and protocols work.
stdio is just a pair of file streams your process reads from and writes to
HTTP is a protocol typically used over TCP
websockets is a protocol initiated via HTTP, which again is typically over TCP
Both HTTP and websockets can be done over stdio instead of TCP.
It sounds like MCP has a lot more "irrelevant baggage" I need to learn/consider.
skeledrew
11 days ago
The entire point can be summed in the first 5/6s of that. You don't need to know any of it, because it's irrelevant (at that abstraction). Just as it's irrelevant to know how registers work, to allocate and free memory, avoid/handle segfaults, etc if using a high level language like Python, vs Assembly or C.
Xelynega
10 days ago
That doesn't sound like the case for MCP though. It sounds like when implementing an MCP server there is a difference between the three transport methods that requires different code on the server.
This is a problem solved by other protocols that are just stacked on top of each other without knowing how each other work.
skeledrew
9 days ago
That depends on the library implementation. A given library can be anywhere on the spectrum from "user knowledge and management of the transport methods required" to "transport method is determined by protocol format or invocation" (eg. "local://..." vs "remote://...").
whalesalad
11 days ago
but at the end of the day MCP is HTTP lol
laichzeit0
11 days ago
Take a look at the samples: https://google.github.io/A2A/#/documentation
zellyn
11 days ago
Oh, that's really nice. I'd also like to see what syntax the LLM uses to _trigger_ these calls, and what prompt is sent to the LLM to tell it how to do that.
I should probably just go read Goose's code…
laichzeit0
10 days ago
The LLM returns a message called ToolMessage which then describes which function to call and the parameters (you register these functions/tools as part of the initialisation step, like when you pass it temperature, or whatever other options your LLM allows you to set). So think of it as: instead of streaming back text, it's streaming back a message to tell you "please call this function with these args", and you can do with that what you want. Ideally you'd call that function and then give the output back to the LLM. Nothing magic really.
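A sketch of that loop with made-up message shapes (not any particular SDK's types):

```python
import json

# Host side of the tool loop: the model emits a tool request instead of
# plain text; the host runs the registered function and hands the result
# back as the next message.
def add(a: int, b: int) -> int:
    return a + b

registry = {"add": add}                      # built at initialisation

# Pretend this structured message came back from the LLM.
tool_message = {"tool": "add", "args": {"a": 2, "b": 2}}

output = registry[tool_message["tool"]](**tool_message["args"])
next_turn = {"role": "tool", "content": json.dumps(output)}
print(next_turn)
```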
medbrane
11 days ago
That's dependent on the particular LLM one uses.
But it can be a json with the tool name and the payload for the tool.
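For illustration, one common shape for such a JSON, loosely modeled on Anthropic-style tool_use content blocks (field names vary by provider; the id and tool name are invented):

```python
import json

# Illustrative only: a model response carrying a tool call as a
# structured block rather than an inline magic token in the text.
model_message = {
    "role": "assistant",
    "content": [
        {
            "type": "tool_use",
            "id": "toolu_123",              # invented id
            "name": "list_repos",           # tool declared by the server
            "input": {"org": "example-org"},
        }
    ],
    "stop_reason": "tool_use",
}
print(json.dumps(model_message, indent=2))
```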
TS_Posts
8 days ago
Hi there! If you load the CLI demo in the github repo (https://github.com/google/A2A/tree/main/samples/python/hosts...) you can see what the A2A servers are returning. Take a look!
ycombinatrix
10 days ago
>the endorsements at the end somehow made this seem worse
holy cow you weren't kidding. legit the last people i would trust with software development.