[whatwg/fetch] Automatic decompression should sanitize `Content-Encoding` and `Content-Length` headers from the response (Issue #1729)

### What is the issue with the Fetch Standard?

The `fetch()` spec allows browsers to perform decompression of HTTP responses in `fetch()` if an appropriate `content-encoding` header is set on the response. In this case, the `Response.prototype.body` stream no longer reflects the raw bytes (modulo protocol framing) received on the wire, but instead a processed version of the bytes after being passed through a decompression routine.

This decompression is meant to be transparent to users: they do not have to explicitly opt in or enable it. Furthermore, they cannot even disable it (ref #1524).

Unfortunately, the decompression is currently not very transparent: given an arbitrary response object, it is ambiguous whether the `Response`'s body has been decompressed or is still compressed.

This causes real-world problems:
- it poses a hazard when implementers add new encodings for automatic decompression: a user who was manually decompressing responses with a previously unsupported content encoding cannot tell whether they still need to decompress once a browser adds native support for that encoding
- proxies cannot tell which headers they need to send downstream (https://github.com/wintercg/fetch/issues/23)
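
To make the first hazard concrete, here is a minimal sketch of consumer code that breaks when a runtime starts decoding a new encoding. The names `readBody` and `decodeZstd` are hypothetical, and `zstd` is only an illustrative encoding; nothing here is an existing platform API beyond `Headers`.

```javascript
// Hypothetical consumer code written before the runtime decoded "zstd" itself.
// `decodeZstd` stands in for a user-supplied decoder; it is not a platform API.
function readBody(headers, bodyBytes, decodeZstd) {
  if (headers.get("content-encoding") === "zstd") {
    // Correct only while the runtime leaves zstd bodies untouched. Once the
    // runtime decodes zstd natively but still reports the header, bodyBytes
    // is already plain and this call decodes it a second time.
    return decodeZstd(bodyBytes);
  }
  return bodyBytes;
}
```

The header is the only signal this code has, and it reads the same whether or not the runtime already decoded the body.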

### Proposal

I propose we strip out `Content-Length` (because it represents the content length prior to decompression), and `Content-Encoding` (because it represents the encoding prior to decompression) from `Response` headers when we perform automatic response body decompression in `fetch()`. I am not suggesting this affects responses created with `new Response()` or responses returned from `fetch()` that do not have automatic response body decompression performed.
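
As a rough illustration of the intended behavior (not spec text), the sanitization could look like the sketch below. The function name `sanitizeDecodedHeaders` and the `wasDecoded` flag are assumptions made for this example; script has no such signal today, which is the ambiguity described above.

```javascript
// Sketch of the proposed header sanitization. `wasDecoded` is a hypothetical
// flag meaning "the runtime automatically decompressed this response body".
function sanitizeDecodedHeaders(headers, wasDecoded) {
  const out = new Headers(headers); // copy, so the input is left untouched
  if (wasDecoded) {
    // Both headers describe the bytes on the wire, not the decoded body.
    out.delete("content-encoding");
    out.delete("content-length");
  }
  return out;
}
```

Responses that were not automatically decompressed keep both headers, matching the scope limits stated in the proposal.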

### Compatibility

I don't think this change will break any existing code. It may skew some folks' monitoring tools. I make this assumption based on the following thoughts:

- The `Content-Length` before decompression is meaningless if you only have the decompressed body. With either gzip or br, you cannot infer the length of the decompressed body from the `Content-Length`.
- The original `Content-Encoding` is not useful in combination with a decompressed body. The only use I can think of is monitoring use cases where you want to determine what percentage of your assets were served with compression (and with which compression).
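
The first point can be checked with a small sketch. It assumes Node.js and its built-in `node:zlib` module, which is not part of `fetch()`; the payload is arbitrary.

```javascript
const zlib = require("node:zlib");

// What the application ends up with after automatic decompression.
const decodedBody = Buffer.from("a".repeat(100000));

// What travelled on the wire; Content-Length would describe this buffer.
const wireBody = zlib.gzipSync(decodedBody);

console.log("Content-Length on the wire:", wireBody.length);
console.log("Decoded body length:", decodedBody.length);
```

The two lengths differ, and the decoded length depends on how compressible the payload is, so the wire length alone does not determine it.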

### Prior art

In the JavaScript space:
- Both Deno and Cloudflare implement this proposed fix to allow for the proxy use case mentioned above

In other programming languages:
- Go's `http` std lib module has auto decompression enabled by default. It strips out `Content-Length` and `Content-Encoding` when it performs decompression. It has a flag on the response to determine if auto-decompression has taken place. See https://pkg.go.dev/net/http#Response.Uncompressed
- Rust's `reqwest` crate supports auto decompression and enables it by default for clients if the `gzip` or `brotli` compile time flags are set. It strips out `Content-Length` and `Content-Encoding` when it performs decompression. It has no flag to check if decompression has been performed or not.  See https://docs.rs/reqwest/latest/reqwest/struct.ClientBuilder.html#method.gzip
- Python's `requests`: does auto decompression by default, and sets `Content-Length` to the post decompression content length. It does not remove the `Content-Encoding` header
- Ruby's `Net::HTTP`: does auto decompression by default, [removing `Content-Encoding`](https://github.com/ruby/net-http/blob/042faf74e77d786ff60dff81555f6ec4f21e77a9/lib/net/http/response.rb#L564), and [rewriting `Content-Length` to the length after decompression](https://github.com/ruby/net-http/blob/042faf74e77d786ff60dff81555f6ec4f21e77a9/lib/net/http/response.rb#L575-L576).

https://github.com/whatwg/fetch/issues/1729

Received on Thursday, 21 December 2023 19:54:31 UTC