100ms
a day ago
Including a strong motivating example might have helped sell this; using an example that could trivially be expressed as a GET is extremely distracting.
Even imagining a QUERY with a large JSON filtering structure, or say an image input as request body, it feels extremely odd to include the request body as part of the cache key. It also implies an unbounded and user-controlled cache key, with the only really meaningful general caching strategy being bitwise compare of the request body (or a hash), which in a hostile scenario implies cache busting would be trivial.
This invokes multiple semantic oddities in one go with obvious difficulties for a very niche use case. If I'm writing a service that needs complex filtering or complex input like an image, any form of caching (e.g. individual data columns of a join, or embeddings keyed by perceptual hashes of a decoded image input) is going to be far away from the HTTP layer and certainly unrelated to the exact bit representation of the request on the wire.
Why even bother trying to capture this in a generic way?
I would be far more inclined to try and capture this caching semantic as a new header for POST. Something like "Vary: request-body" or similar. Perfectly backwards compatible and perfectly ignorable for all but the 0.1% of CDN use cases where the behaviour might turn out useful.
Joker_vD
a day ago
> It also implies an unbounded and user-controlled cache key,
The query part of GET's URI is also barely bounded in practice and user-controlled, and is indeed used as part of the cache key (because it's a part of URI), so I am not sure why you raise this objection at all.
giancarlostoro
a day ago
> and user-controlled
I've found some sites that tack on a session ID, and if you try to tamper with the URL in any way, they send you back to "Page 1". It really annoys me, lol. At that point, let me skip to any page with your web UI.
PunchyHamster
a day ago
Well, because it is more code. Current caching software caches by headers + query string. It now needs to be expanded to cache by body too.
It feels very pointless, and there is no drawback to just using POST.
OvervCW
a day ago
There is: your browser or other type of client does not know it can repeat a POST request if it fails, whereas a QUERY request can be freely repeated in case of errors.
spockz
20 hours ago
Not freely. It is idempotent, not safe. So it still can have serious load consequences.
Joker_vD
20 hours ago
> Unlike POST, however, the method is explicitly safe and idempotent, allowing functions like caching and automatic retries to operate.
account42
7 hours ago
Putting something in a spec does not automatically make it true. In the real world if you repeat expensive queries more than an undefined amount you get blocked or at least bot-checked.
afavour
a day ago
Is caching not the primary reason to use this over POST? You should never want to cache POST requests.
drdexebtjl
21 hours ago
No. Being idempotent, it also lets the browser/client/reverse proxy retry it if it fails.
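A minimal sketch of the retry argument, assuming a client that decides whether to retry purely from the method name. The `send` callable, the `send_with_retry` name, and the backoff parameter are hypothetical; the point is only that a POST gets one attempt while a QUERY may be repeated after a transport failure.

```python
import time
from typing import Callable

# Methods that HTTP semantics define as idempotent, plus QUERY as proposed.
IDEMPOTENT_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE", "PUT", "DELETE", "QUERY"}


def send_with_retry(method: str, send: Callable[[], int],
                    max_attempts: int = 3, backoff_seconds: float = 0.0) -> int:
    """Call send() and retry on transport failure only for idempotent methods.

    `send` is a hypothetical callable that performs the request and returns a
    status code, raising ConnectionError when the connection fails.
    """
    # Non-idempotent methods such as POST get exactly one attempt.
    attempts = max(1, max_attempts) if method.upper() in IDEMPOTENT_METHODS else 1
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except ConnectionError:
            if attempt == attempts:
                raise  # out of attempts: surface the failure to the caller
            time.sleep(backoff_seconds * attempt)
    raise RuntimeError("unreachable")
```

Note that this says nothing about server load: a server is still free to rate-limit repeated expensive queries.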
nfw2
19 hours ago
Technically a PUT or a PATCH is also idempotent. The benefit is being both idempotent and safe (and semantically appropriate). POST (generally) communicates that something is changing, whereas a QUERY doesn't.
gamache
18 hours ago
PUT is idempotent, PATCH is not always. The semantics of a PATCH payload are up to the server and standards like JSON Patch (RFC 6902, https://datatracker.ietf.org/doc/html/rfc6902) allow non-idempotent operations like adding an item to a list.
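To illustrate the non-idempotent case, here is a small sketch of a JSON Patch "add" operation whose path ends in `-`, which RFC 6902 defines as "append to the end of the array". The `apply_add` helper is hypothetical and deliberately incomplete (no `~0`/`~1` escape handling, no other operations); it is not a JSON Patch implementation.

```python
import copy


def apply_add(document, path: str, value):
    """Apply one JSON Patch "add" operation and return the patched copy."""
    tokens = path.lstrip("/").split("/")
    result = copy.deepcopy(document)  # leave the caller's document untouched
    target = result
    # Walk down to the parent of the location being added to.
    for token in tokens[:-1]:
        target = target[int(token)] if isinstance(target, list) else target[token]
    last = tokens[-1]
    if isinstance(target, list):
        if last == "-":
            target.append(value)  # "-" means "after the last element"
        else:
            target.insert(int(last), value)
    else:
        target[last] = value
    return result


doc = {"tags": ["a"]}
once = apply_add(doc, "/tags/-", "b")
twice = apply_add(once, "/tags/-", "b")
# Sending the same patch a second time changes the resource again,
# so a blind retry of this PATCH is not safe.
```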
nfw2
11 hours ago
I stand corrected although using patch this way seems goofy to me.
speleding
6 hours ago
I like the proposal, but I agree they could have sold it better.
This is basically a GET request that can have a body. I've found myself in need of that more than once when I did not want huge URLs with encoded data showing up in logs. Using a POST request there is not appropriate because it signals that data could be modified (i.e. it cannot be sent to read-only instances). I guess modifying the spec to allow GET to have a body would pose too many problems.
CodesInChaos
a day ago
The browser can simply store a collision resistant hash (e.g. SHA-256) of the body, if it wants a smaller cache key. I can't really think of any caching related attacks that don't equally apply to a query parameter. Generating a unique 30 character query parameter is just as easy as generating a 30 MB request body, if you want to flood the cache.
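A rough sketch of that idea, assuming the cache treats the body as an opaque blob. The `query_cache_key` function and the choice of which fields go into the key are illustrative assumptions, not something the spec prescribes.

```python
import hashlib


def query_cache_key(url: str, content_type: str, body: bytes) -> str:
    """Derive a fixed-size cache key for a QUERY request from its raw bytes."""
    digest = hashlib.sha256()
    parts = (b"QUERY", url.encode("utf-8"), content_type.encode("utf-8"), body)
    for part in parts:
        # Length-prefix each part so field boundaries cannot be confused,
        # e.g. ("ab", "c") must not hash the same as ("a", "bc").
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    return digest.hexdigest()
```

The key is always 64 hex characters, whether the body is 30 bytes or 30 MB, so the cache index itself does not grow with the size of the request.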
ralferoo
a day ago
Not necessarily that simple, as you'd have to sort all the input parameters to maintain a usable cache key. Not especially difficult, but if the data is large and so re-allocation and sorting is required, then you're starting to open up the attack surface where bugs might have been introduced.
dagss
19 hours ago
Do you have to? Is it common to treat ?a=1&b=2 the same as ?b=2&a=1 in browser/CDNs/etc?
Seems the spec puts this as a MAY. I doubt it will be implemented in generic ways, except perhaps for urlencoded payloads. After all, you cannot normalize in general without knowing the query language. At the backend it does not matter; you may as well cache one level deeper based on the parsed input, irrespective of QUERY or not.
ralferoo
18 hours ago
No, that was my point. In a GET request, a caching proxy cannot assume the URL is URL encoded parameters, because the URL can contain data encoded in any form. So, you could only cache a GET on an exact URL.
But for a QUERY that explicitly marked the data as multipart or url-encoded, then semantically the order of parameters no longer matters.
That said, it's hypothetical because the only thing that uses those at the moment is POST and that explicitly should never be cached.
But there's another reply above to my comment that points out that a caching implementation is free to do what it likes, and if it fails to cache when parameters are in a different order, then it would still be correct, which is a fair point. That comment was https://news.ycombinator.com/item?id=48578024
CodesInChaos
8 hours ago
I think the only sane approach for caching is comparing whether the body is identical, without applying any content-type-specific transformations. It's the safest choice, and the cache not being effective in some edge cases isn't a big deal. The client can always canonicalize the data itself, if it matters for its use case.
Even if the spec says that different representations are equivalent, it's quite common for applications to treat them differently. For example field ordering in json objects is supposed to not matter, but some serializers care if a type discriminator is the first field.
ygouzerh
a day ago
Regarding the body used as a key for the caching: in the RFC, from my understanding, it's indicated that we can use Location as well:
Example:
```
QUERY /search HTTP/1.1
Content-Type: application/json

{ "filters": { "region": "asia", "status": "active" }, "sort": "created_at", "limit": 500 }
```
can be answered with
```
HTTP/1.1 303 See Other
Location: /queries/results/f3a9c1d7
```
And then you can access later `/queries/results/f3a9c1d7` using a pure GET call, and cache this instead
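A minimal server-side sketch of that flow, assuming an in-memory store and a result ID derived from a hash of the body. `RESULTS`, `run_search`, `handle_query`, `handle_get`, and the `Cache-Control` value are all hypothetical names and choices made for illustration.

```python
import hashlib
import json

RESULTS = {}  # hypothetical in-memory store: result ID -> stored result
PREFIX = "/queries/results/"


def run_search(query: dict) -> dict:
    # Placeholder for the real search; it just echoes the query back.
    return {"query": query, "items": []}


def handle_query(body: bytes):
    """Handle QUERY /search: store the result, then redirect to a GET-able URL."""
    result_id = hashlib.sha256(body).hexdigest()
    if result_id not in RESULTS:
        RESULTS[result_id] = run_search(json.loads(body))
    return 303, {"Location": PREFIX + result_id}


def handle_get(path: str):
    """Handle GET on a stored result; this response is what gets cached."""
    result_id = path[len(PREFIX):] if path.startswith(PREFIX) else None
    if result_id not in RESULTS:
        return 404, {}, b""
    payload = json.dumps(RESULTS[result_id]).encode("utf-8")
    return 200, {"Cache-Control": "max-age=60"}, payload
```

With this shape, intermediaries only ever need to cache a plain GET on a short URL; the request body never becomes part of their cache key.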
inigyou
a day ago
Not all usage scenarios are the public internet, and something doesn't have to be useful on the public internet to be standardized.
Realistically, systems for the public internet will use a secure hash as the cache key so it'll always be the same size. The cache key already includes a URL that can be very long, and an arbitrary set of header values.
ralferoo
a day ago
Except that by definition, in a URL the data has no implicit meaning so for a cache hit you need an exact match, including order and case, but for a list of POST parameters, they could legitimately be in any order and so you can't just hash it all as a blob, you need to sort the keys, possibly copy data around (unless using keys plus hash), probably allocating more memory, etc. I'm pretty certain we'll see at least one CVE out of the first few implementations of this!
inigyou
a day ago
POST/QUERY data can be in any format. Who are you to say order doesn't matter? Are you sure you can even parse it? Mine is in DES-encrypted (with key "password") base85 DER, you really gonna implement that in your proxy?
ralferoo
20 hours ago
Maybe my knowledge is out of date in terms of how people generally use POST nowadays, but AFAIK multipart/form-data is still the most common encoding for data and occasionally application/x-www-form-urlencoded.
With both of these, the key/value pairs can be in any order with the same interpretation. That's kind of a moot point for the POST method, because POST responses should never be cached anyway, but for the new QUERY method it'd be reasonable to expect a cache hit whenever the parameters are the same regardless of order.
My point is that for a GET, you can't assume that the order isn't important, because the URL is an opaque string by the time it hits the cache. However, POST (and now QUERY) explicitly says what the coding is, so for instance with application/x-www-form-urlencoded we can be sure that the parameters can be in any order without changing the meaning. You cannot infer that from a URL itself.
As to your point, yes, you can use any other encoding you like. But most systems don't do that; they use multipart/form-data.
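A sketch of what order-insensitive caching could look like for application/x-www-form-urlencoded bodies, under the assumption that only the order of differently named fields is irrelevant. `canonical_form_body` and `body_cache_key` are hypothetical names. The code works on raw bytes and never decodes percent-escapes, so `%41` and `A` still produce different keys; that costs a cache miss at worst, never a wrong hit.

```python
import hashlib


def canonical_form_body(body: bytes) -> bytes:
    """Reorder form fields by key without decoding or rewriting any of them."""
    fields = [field for field in body.split(b"&") if field]
    # list.sort is stable, so repeated keys (a=1&a=2) keep their relative
    # order, which some applications treat as significant.
    fields.sort(key=lambda field: field.split(b"=", 1)[0])
    return b"&".join(fields)


def body_cache_key(content_type: str, body: bytes) -> str:
    """Hash the body, canonicalizing only media types we know how to reorder."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/x-www-form-urlencoded":
        body = canonical_form_body(body)
    # Every other media type is treated as an opaque blob.
    digest = hashlib.sha256()
    digest.update(content_type.strip().encode("utf-8") + b"\n")
    digest.update(body)
    return digest.hexdigest()
```

Even this small amount of content-type-specific logic needs care around repeated keys and escaping, which is the kind of detail the comment above is worried about.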
drdexebtjl
20 hours ago
This RFC does not require caching to be implemented at all, so it wouldn’t be reasonable to expect a cache hit, no. But if your implementation does that, cool :)
ralferoo
19 hours ago
An RFC would never mandate caching. But the table in the article says "cacheable: yes" for GET and QUERY. There is no "my implementation" because this is a proposal that has only just been published and there are currently no implementations. I'm simply saying that it will be harder to get caching correct for QUERY than for GET, and I'm almost certain there will end up being CVEs resulting from its implementation.
drdexebtjl
18 hours ago
It’s been an IETF Internet-Draft for a few years at this point, so there are some implementations already in the wild.
What I mean is that implementations are free to do something as complex as what you suggest, but also something as simple as hashing the body as a blob, and they can even bail on caching completely (for example if the payload is too large).
All of those options would be correct behavior per the RFC.
Of course we may still see CVEs from this, but they will be self-inflicted, not caused by a complex standard.
tanepiper
a day ago
One example - I'm building an MCP server at the moment for a database I'm working on. In ChatGPT I want to do dry-run posts first that roll back before committing - both are POST requests with a property - and it loves to trigger the safety layer in the tools (for various reasons, it's hard to debug exact causes)
But I think this would make it better - QUERY before POST means different request types, not just the same with a safety flag.
friendzis
a day ago
> It also implies an unbounded and user-controlled cache key.
While the concern is valid, caching is entirely optional at query level, therefore it is totally valid to cache only certain "filters".
cryptonym
a day ago
Sure you can provide an image as request body, but you could already do it with b64 query parameter. If you try hard enough, you can poorly use any proposed standard. GET with query parameters already is opaque and makes cache busting trivial.
layer8
a day ago
Query parameters are length-limited, because HTTP URIs are: https://www.rfc-editor.org/info/rfc9110/#section-4.1-5. There is no expectation for arbitrarily long HTTP URLs to be functioning.
cryptonym
a day ago
Your link doesn't say URIs are length-limited
Draiken
a day ago
I'm guessing you never hit this issue then, but it's a real issue. Whether or not it's in the RFC as a hard limit doesn't matter; no HTTP server will allow unlimited-size URIs.
You simply can't base64 large payloads and you're stuck with workarounds.
cryptonym
a day ago
You are guessing wrong. Thanks, I know specific implementations will come with their own limits. This will equally apply to QUERY body size and caching strategy.
Are we seriously OK with linking the RFC as a source while providing a statement that doesn't match it? The RFC does matter.
ralferoo
a day ago
The RFC does say "It is RECOMMENDED that all senders and recipients support, at a minimum, URIs with lengths of 8000 octets in protocol elements."
One can infer from the RFC that you can reasonably expect many implementations to fail beyond 8000 characters, and that there are no guarantees up to that either.
True, the RFC doesn't specify a limit, but it does clearly indicate that it's not unbounded, nor should you expect it to be.
drdexebtjl
20 hours ago
This RFC (10008) does not require caching to be implemented at all, so it would make no sense to make a recommendation here for what is a reasonable limit to expect caching to work.
layer8
21 hours ago
They are in the sense that the recommended supported length is only 8000 bytes. There are no such specified length recommendations for HTTP body size.
cryptonym
10 hours ago
Recommended supported length is at least 8k.
Of course I don't advocate oversize URLs. That's a point of RFC10008.
Let's say we build a service for image transformation or image information extraction. GET isn't practical. QUERY with the image as body could be a valid usage, regardless of caching. It conveys the information that the request is idempotent and can be retried with no impact on data, contrary to POST. If your HTTP client is configured to support this, it can potentially improve reliability.
epolanski
a day ago
> Why even bother trying to capture this in a generic way?
I guess it's about resolving the odd semantics of using POST, which is not idempotent, and thus allowing easier control flow for caches and retries.
Your perspective is 100% correct if you think at the application layer, but with a dedicated method you can get that behaviour out of the box from your HTTP infrastructure (whether it's your hyperscaler's router or your apache/nginx/browser/whatever) and stop implementing the post-as-a-query edge case yourself.
davidkwast
a day ago
I would use a hash of the body content (the query) as a URL parameter
/?hash=123456789
Joker_vD
a day ago
Why? That's pushing more work to do both on yourself and the cache.
WorldMaker
21 hours ago
Actually this is a use case supported by this RFC [1]. If you accept an arbitrary QUERY /search/ and cache it on your side (or in a middlebox somewhere, such as a CDN edge), you can return in your response:
Location: /search/?queryHash=SOMECDNHASH
The browser can then cache that Location and the next time convert that same QUERY /search/ into GET /search/?queryHash=SOMECDNHASH.
Sure, it is more work for your webserver to compute that, and potentially for the browser to cache its knowledge of that QUERY, but it potentially gives you an advantage in keeping things like CDN edge caches generally aware of client/browser caches in a way that can be performance optimized.
wang_li
a day ago
If you control the full stack then the functionality described here can be implemented with POST. The only way this comes into play is if some second party client of your service is trying to impose rules on how your backend works. My answer to that is no. I will be defining the contract by which my services operate.