[whatwg/fetch] Getting all bytes in a body (#661)

(This spans HTML, Encoding, Streams, and Fetch. Let's tag at least @ricea to help while we're here, and @jakearchibald since he was helpful in https://github.com/whatwg/infra/issues/181.)

I need help figuring out how, at the spec level, to get all bytes in a body. This is part of solving https://github.com/whatwg/html/issues/3316.

Right now specs are using a few patterns:

- https://fetch.spec.whatwg.org/#concept-body-consume-body or https://w3c.github.io/payment-method-manifest/#fetch-pmm 4.5.4: get reader, read all bytes into Uint8Array. Then treat it as a byte sequence.
- https://html.spec.whatwg.org/#fetch-a-classic-script step 8 (which I need to replace): treat "body" as if it was the same as Encoding's "byte stream" concept and just feed it into that.
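
For concreteness, the first pattern looks roughly like this when written as ordinary script code instead of spec prose. This is only a sketch: the name `readAllBytes` is made up for illustration, and it assumes the body is exposed as a `ReadableStream` of `Uint8Array` chunks.

```typescript
// Hypothetical helper; the name is illustrative and not taken from any spec.
// Assumes the stream yields Uint8Array chunks.
async function readAllBytes(stream: ReadableStream<Uint8Array>): Promise<Uint8Array> {
  const reader = stream.getReader(); // acquiring a reader locks the stream
  const chunks: Uint8Array[] = [];
  let total = 0;
  try {
    for (;;) {
      const result = await reader.read(); // rejects if the stream errors
      if (result.done) break;
      chunks.push(result.value);
      total += result.value.byteLength;
    }
  } finally {
    reader.releaseLock();
  }
  // Concatenate the chunks into one Uint8Array, the stand-in for a byte sequence.
  const bytes = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    bytes.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return bytes;
}
```

Note that the result is a promise for a `Uint8Array`, not a byte sequence, which is exactly the gap spec text currently has to hand-wave over.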

In general the problem here is Streams's use of the JS formalism, including promises, which interacts poorly with spec-level code and its different conventions and type system. The confusion between Encoding's streams and Streams's streams is also tricky.

Looking at https://html.spec.whatwg.org/#fetch-a-classic-script my first thought is that it should be written as:

8. Let _body bytes_ be the byte sequence obtained by [reading all the bytes] of the [body] of _response_.
9. Let _source text_ be the result of [decoding] _body bytes_ to Unicode, using _character encoding_ as the fallback encoding.
10. ... muted errors ...
11. Let _script_ be the result of creating a classic script given _body bytes_, _source text_, ... the other stuff.

This doesn't quite work on a few levels:

- "reading all the bytes" would at the very least need to be asynchronous, if it goes through Streams's promise machinery.
- "reading all the bytes" might fail. This is currently unaccounted for, but if we go through Streams's machinery, it's more explicit.
- "reading all the bytes" will need to hand-wave to get from Uint8Array to byte sequence.
- "decoding" doesn't accept byte sequences, only "byte streams".

So here is my proposal, which is essentially trying for a minimal delta from today:

- We define "reading all the bytes", probably in whatwg/fetch. It returns either a byte sequence, or null to signal failure. Either it's asynchronous, or we use "wait" (see https://github.com/whatwg/infra/issues/181) to make it synchronous. It wraps up all the promise/Uint8Array stuff so that spec authors don't have to worry about it, hand-waving as appropriate.
- We define in Encoding that "byte sequences" can be used as "byte streams" implicitly, when appropriate.
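
To make the shape of this proposal concrete, here is a rough sketch of the asynchronous variant. All names (`ByteSequence`, `readAllBytesOrNull`, `ByteStream`) are hypothetical and do not come from Fetch, Streams, or Encoding. It assumes a failed read is reported as null, and it models a "byte stream" as a simple queue whose read returns null at end-of-stream.

```typescript
// Hypothetical names throughout; this is an illustration, not spec text.
type ByteSequence = Uint8Array;

// Resolves with all bytes of the body, or with null if reading fails.
async function readAllBytesOrNull(
  body: ReadableStream<Uint8Array>
): Promise<ByteSequence | null> {
  const chunks: Uint8Array[] = [];
  let total = 0;
  try {
    const reader = body.getReader(); // throws if the stream is already locked
    for (;;) {
      const result = await reader.read();
      if (result.done) break;
      chunks.push(result.value);
      total += result.value.byteLength;
    }
  } catch {
    return null; // every failure collapses to null
  }
  const bytes = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    bytes.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return bytes;
}

// A byte sequence used as a "byte stream": bytes are read one at a time,
// and null stands in for end-of-stream.
class ByteStream {
  private bytes: ByteSequence;
  private position = 0;

  constructor(bytes: ByteSequence) {
    this.bytes = bytes;
  }

  read(): number | null {
    return this.position < this.bytes.length ? this.bytes[this.position++] : null;
  }
}
```

With something like this, a caller only ever sees "byte sequence or null", and can wrap the byte sequence as a byte stream when it needs to decode.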

---

An alternate approach, which you might prefer, is to double down on the spec-level concept of a byte stream. We'd provide some way of translating a ReadableStream, and thus a body, into a spec-level byte stream. It creates a reader, locks the ReadableStream, and from then on, specs only manipulate the byte stream.

The hardest part of this, I think, is putting the asynchronicity of "reading" from the byte stream on solid ground. It seems very hand-wavy right now, both in Encoding and in most of Encoding's clients. We could either:

- say that you can read "synchronously", in that it waits for more data to come in before "read from a byte stream" returns, or
- say that read needs to be asynchronous.
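
As an illustration of the second option, here is roughly what an asynchronous byte stream layered over a ReadableStream could look like. The class name `AsyncByteStream` is hypothetical, null stands in for end-of-stream, and the sketch assumes that reads are never issued concurrently.

```typescript
// Hypothetical sketch; not taken from Encoding or Streams.
// Assumes callers await each read before issuing the next one.
class AsyncByteStream {
  private reader: ReadableStreamDefaultReader<Uint8Array>;
  private chunk: Uint8Array = new Uint8Array(0);
  private offset = 0;
  private ended = false;

  constructor(stream: ReadableStream<Uint8Array>) {
    // Creating the reader locks the ReadableStream; from here on,
    // only this object touches it.
    this.reader = stream.getReader();
  }

  // Resolves with the next byte, or with null at end-of-stream.
  // Rejects if the underlying stream errors.
  async read(): Promise<number | null> {
    // Pull chunks until one has unread bytes (skipping empty chunks) or the stream ends.
    while (!this.ended && this.offset >= this.chunk.length) {
      const result = await this.reader.read();
      if (result.done) {
        this.ended = true;
      } else {
        this.chunk = result.value;
        this.offset = 0;
      }
    }
    if (this.ended) return null;
    return this.chunk[this.offset++];
  }
}
```

The "synchronous" option would have the same shape, except that `read` would block in parallel until a byte is available instead of returning a promise.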

This ties back to https://github.com/whatwg/infra/issues/181 again. The problem with synchronous reads is that you can only do them from in-parallel sections, and I think in most cases "decode" does not run in those sections.

Indeed, the larger issue, where encode/decode are often used on strings/byte sequences instead of on "character streams"/"byte streams", seems pretty prevalent: see e.g. https://html.spec.whatwg.org/#form-submission-algorithm:encode or https://html.spec.whatwg.org/#navigating-across-documents:utf-8-decode. So maybe we need to do something about that anyway.


Received on Friday, 12 January 2018 21:13:24 UTC