- From: Mattias Buelens <notifications@github.com>
- Date: Wed, 02 Mar 2022 01:00:26 -0800
- To: whatwg/encoding <encoding@noreply.github.com>
- Cc: Subscribed <subscribed@noreply.github.com>
- Message-ID: <whatwg/encoding/issues/283/1056616763@github.com>
I think this is intentional. Readable byte streams also don't allow enqueuing empty chunks, and it's likely that we'll want to make `TextDecoderStream.readable` a proper byte stream in the future:
```javascript
const rs = new ReadableStream({
  type: "bytes",
  start(controller) {
    controller.enqueue(new Uint8Array(0)); // throws
  }
});
```
I'm a bit surprised by Deno's `LineStream` design. I would expect a transform stream that splits text by line delimiters to accept *strings* as input and produce *strings* as output. Instead, it looks like it uses raw byte chunks as both input and output?
That means that `LineStream` is making an assumption about the text encoding, right? How exactly is that supposed to deal with multi-byte text encodings like `utf-16`? For example:
```javascript
new TextDecoder("utf-16").decode(new Uint8Array([0x41, 0x00, 0x0A, 0x00, 0x42, 0x00]));
// -> "A\nB"
```
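To make the concern concrete, here is a minimal sketch of what can go wrong when a splitter looks for the `0x0A` byte without knowing the encoding. `splitOnLineFeedByte` is a hypothetical helper written for illustration only; it is an assumption about how a naive byte-level splitter might behave, not Deno's actual `LineStream` code:

```javascript
// Bytes for "A\nB" in UTF-16LE, same as the example above.
const bytes = new Uint8Array([0x41, 0x00, 0x0A, 0x00, 0x42, 0x00]);

// Hypothetical naive splitter: cuts at every 0x0A byte and drops that byte.
function splitOnLineFeedByte(input) {
  const parts = [];
  let start = 0;
  for (let i = 0; i < input.length; i++) {
    if (input[i] === 0x0A) {
      parts.push(input.subarray(start, i));
      start = i + 1;
    }
  }
  parts.push(input.subarray(start));
  return parts;
}

const [first, second] = splitOnLineFeedByte(bytes);
// first  is [0x41, 0x00]: a complete code unit, decodes to "A".
// second is [0x00, 0x42, 0x00]: it starts with the high byte of the "\n"
// code unit, so every following code unit is misaligned.
const decodedFirst = new TextDecoder("utf-16").decode(first);
const decodedSecond = new TextDecoder("utf-16").decode(second);

console.log(JSON.stringify(decodedFirst));  // "A"
console.log(decodedSecond === "B");         // false
```

The second chunk has an odd number of bytes and begins in the middle of a code unit, so it can no longer be decoded back to `"B"`.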
I would expect you to *first* run these chunks through a `TextDecoderStream`, and *then* split by line delimiters:
```javascript
const readable = new ReadableStream({
  start(controller) {
    controller.enqueue(new Uint8Array([0x41, 0x00, 0x0A, 0x00, 0x42, 0x00]));
    controller.close();
  }
});
readable
  .pipeThrough(new TextDecoderStream("utf-16"))
  .pipeThrough(new LineStream());
// -> stream with chunks "A" and "B"
```
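For reference, a string-in, string-out line splitter along these lines could look roughly like the sketch below. The names `LineSplitter` and `createLineSplitterStream` are hypothetical and this is not Deno's implementation. It assumes that only `\n` and `\r\n` count as line delimiters and that a trailing unterminated line is emitted when the stream closes:

```javascript
// Hypothetical transformer: accepts string chunks, emits one string per line.
class LineSplitter {
  constructor() {
    // Text received after the last complete line delimiter.
    this.buffer = "";
  }

  transform(chunk, controller) {
    this.buffer += chunk;
    const lines = this.buffer.split(/\r\n|\n/);
    // The last element is an incomplete line (possibly empty); keep it
    // buffered until the next chunk or until the stream closes.
    this.buffer = lines.pop();
    for (const line of lines) {
      controller.enqueue(line);
    }
  }

  flush(controller) {
    // Emit whatever remains if the input did not end with a delimiter.
    if (this.buffer !== "") {
      controller.enqueue(this.buffer);
    }
  }
}

// Wraps the transformer in a standard TransformStream.
function createLineSplitterStream() {
  return new TransformStream(new LineSplitter());
}
```

Because it only ever sees decoded strings, such a splitter works the same way regardless of which encoding the preceding `TextDecoderStream` was given, for example `readable.pipeThrough(new TextDecoderStream("utf-16")).pipeThrough(createLineSplitterStream())`.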
--
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/encoding/issues/283#issuecomment-1056616763
Received on Wednesday, 2 March 2022 09:00:39 UTC