Re: [whatwg/encoding] TextDecoderStream: empty Uint8Array should result in an empty string (Issue #283) from Mattias Buelens on 2022-03-02 (public-webapps-github@w3.org from March 2022)

From: Mattias Buelens <notifications@github.com>
Date: Wed, 02 Mar 2022 01:00:26 -0800
To: whatwg/encoding <encoding@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Message-ID: <whatwg/encoding/issues/283/1056616763@github.com>

I think this is intentional. Readable byte streams also don't allow enqueuing empty chunks, and it's likely that we'll want to make `TextDecoderStream.readable` a proper byte stream in the future:
```javascript
const rs = new ReadableStream({
  type: "bytes",
  start(controller) {
    controller.enqueue(new Uint8Array(0)); // throws a TypeError: zero-length chunks are rejected
  }
});
```
I'm a bit surprised by Deno's `LineStream` design. I would expect a transform stream that splits text by line delimiters to accept *strings* as input and produce *strings* as output. Instead, it looks like it uses raw byte chunks as both input and output?

That means that `LineStream` is making an assumption about the text encoding, right? How exactly is that supposed to deal with multi-byte text encodings like `utf-16`? For example:
```javascript
new TextDecoder("utf-16").decode(new Uint8Array([0x41, 0x00, 0x0A, 0x00, 0x42, 0x00]));
// -> "A\nB" (the "utf-16" label decodes as UTF-16LE)
```
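To make the concern concrete, here is a rough sketch of what could go wrong if a splitter cut the raw bytes at every `0x0A` without knowing the encoding. `splitBytesAtLF` is a hypothetical helper written only for this illustration, not Deno's actual implementation:

```javascript
// Hypothetical helper (an assumption for illustration): split a byte chunk
// at every 0x0A byte, as an encoding-unaware line splitter might do.
function splitBytesAtLF(bytes) {
  const parts = [];
  let start = 0;
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] === 0x0a) {
      parts.push(bytes.subarray(start, i));
      start = i + 1; // skip the delimiter byte itself
    }
  }
  parts.push(bytes.subarray(start));
  return parts;
}

// "A\nB" encoded as UTF-16LE: every code unit is two bytes.
const utf16 = new Uint8Array([0x41, 0x00, 0x0a, 0x00, 0x42, 0x00]);
const byteLines = splitBytesAtLF(utf16);
// byteLines[0] is [0x41, 0x00]; byteLines[1] is [0x00, 0x42, 0x00].
// The second part starts with the orphaned high byte of the "\n" code unit,
// so it is misaligned and no longer decodes to "B".
const decodedLines = byteLines.map((part) =>
  new TextDecoder("utf-16").decode(part)
);
```

The first part still decodes to `"A"`, but the second has an odd number of bytes and is shifted by one, which is why splitting before decoding only works for encodings where `0x0A` can never appear inside another character.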
I would expect you to *first* run these chunks through a `TextDecoderStream`, and *then* split by line delimiters:
```javascript
const readable = new ReadableStream({
  start(controller) {
    controller.enqueue(new Uint8Array([0x41, 0x00, 0x0A, 0x00, 0x42, 0x00]));
    controller.close();
  }
});

readable
  .pipeThrough(new TextDecoderStream("utf-16"))
  .pipeThrough(new LineStream());
// -> stream with chunks "A" and "B"
```
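For reference, a string-based line splitter along these lines could look roughly like the sketch below. The names `createLineSplitter` and `createStringLineStream` are made up for this example, and it only handles `"\n"` as a delimiter (not `"\r\n"`), so treat it as an illustration of the shape rather than a proposal:

```javascript
// Hypothetical transformer (names are assumptions): splits incoming *string*
// chunks on "\n" and buffers the trailing partial line between chunks.
function createLineSplitter() {
  let buffered = "";
  return {
    transform(chunk, controller) {
      buffered += chunk;
      const lines = buffered.split("\n");
      // The last element is an incomplete line ("" if the chunk ended in "\n").
      buffered = lines.pop();
      for (const line of lines) {
        controller.enqueue(line);
      }
    },
    flush(controller) {
      // Emit whatever remains after the final line delimiter.
      if (buffered !== "") {
        controller.enqueue(buffered);
      }
    },
  };
}

// Wraps the transformer in a standard TransformStream.
function createStringLineStream() {
  return new TransformStream(createLineSplitter());
}
```

With this, `readable.pipeThrough(new TextDecoderStream("utf-16")).pipeThrough(createStringLineStream())` would only ever see decoded strings, so the splitter does not need to know anything about the byte encoding.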

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/encoding/issues/283#issuecomment-1056616763

Received on Wednesday, 2 March 2022 09:00:39 UTC