Re: [whatwg/streams] Is when pull() is called on an underlying source deterministic? (#1155)

I noticed a bug in `ChunkProcessor`. You're passing the *same* strategy (`options`) as `writableStrategy` and `readableStrategy` to the `new TransformStream()` constructor. This effectively *doubles* the queue size, since both the writable end and the readable end will keep requesting chunks until each of them has filled up to its HWM.

Usually, you want to leave the `readableStrategy` at its default (HWM = 0), so I suggest changing the code to pass `options` only as the `writableStrategy`:
```javascript
return new TransformStream({
  // ...
}, options); // options is now only the writableStrategy; the readableStrategy keeps its default HWM of 0
```
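A quick way to see the doubling is to compare the initial `desiredSize` of both ends of the transform. This is a sketch assuming a runtime with the WHATWG Streams globals (e.g. Node 18+ or a modern browser); `initialCapacity` is a hypothetical helper, and `{ highWaterMark: 2 }` stands in for whatever `options` holds in `ChunkProcessor`.

```javascript
// Hypothetical helper: report the initial capacity of each end of a TransformStream.
// Assumes WHATWG Streams globals are available (modern browsers, Node 18+).
function initialCapacity(...strategies) {
  let readable;
  const ts = new TransformStream({
    // start() runs synchronously inside the constructor, before anything is
    // enqueued, so desiredSize here equals the readable end's high water mark.
    start(controller) {
      readable = controller.desiredSize;
    },
  }, ...strategies);
  // A fresh writer's desiredSize likewise equals the writable end's HWM.
  const writable = ts.writable.getWriter().desiredSize;
  return { writable, readable };
}

// Hypothetical stand-in for ChunkProcessor's `options`.
const options = { highWaterMark: 2 };

console.log(initialCapacity(options, options)); // { writable: 2, readable: 2 }
console.log(initialCapacity(options));          // { writable: 2, readable: 0 }
```

Passing the same strategy twice gives the transform room for 2 + 2 chunks across its two queues; passing it once leaves the readable end at HWM = 0, as in the suggested change.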
(The rest of the analysis assumes that this change is in place.)

> That would make sense (chunks shifted downstream without exceeding `highWaterMark`s of individual streams).
> 
> But that's not what I'm seeing. [Instead I see](https://jsfiddle.net/jib1/n6wLv54r/39/) `rs1` pull in 7 chunks off the bat, shifting only one of them downstream. (I've added a 1 second delay in the transforms to help examine order).
> 
> If I change the first transform's highWaterMark from `2` to `10`, `rs1` [pulls in](https://jsfiddle.net/jib1/n6wLv54r/40/) 15 chunks off the bat, almost 200% over its highWaterMark.
> 
> 10 of them seem in limbo, since `rs1`'s `desiredSize` is `0`. But things stay in this state for a whole second. Where are they?

They're in *the writable end* of the first transform stream.

The first processor has HWM = 2, so it wants to pull two chunks from the producer. After it receives those two chunks, the producer will keep pulling until it has filled its own queue up to its own HWM = 5.

The second processor has HWM = 1, so it wants to pull one chunk from the first processor. To do that, the first processor must take a chunk from its writable end's queue and transform it. That's why you see this log at the start:
```
0: Processing A to AA (desiredSize=0)...
```

After one second, the chunk is transformed. The second processor pulls it from the first processor's readable end and puts it into its own writable end's queue. The second processor has now filled up to its HWM, but the first processor has one fewer chunk in its queue, so it pulls a chunk from the producer's queue, and the producer in turn has to pull from its underlying source again:
```
1008: ...done processing A to AA
1008: Pulling H (desiredSize=1, enqueued=7)
```

So after the initial setup, this is the state of the pipe chain:
* producer: D, E, F, G, H (5 chunks)
* first processor writable queue: B, C (2 chunks)
* second processor writable queue: AA (1 chunk)
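To reproduce these numbers outside the fiddle, here is a minimal sketch of the same pipe chain under some assumptions: a runtime with WHATWG Streams globals (e.g. Node 18+), single letters as chunks, a 200 ms transform delay instead of one second, and hypothetical helpers `buildChain` and `sleep`. As in the state listed above, nothing reads from the second processor's readable end. The sketch samples the producer's pull count once while the first transform is still running and once after the chain has settled.

```javascript
// Hypothetical reproduction of the chain: producer (HWM 5) -> processor (HWM 2) -> processor (HWM 1).
// Assumes WHATWG Streams globals are available (e.g. Node 18+).
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

function buildChain(transformDelayMs) {
  const letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
  let pulled = 0;

  // Producer: every pull() enqueues the next letter and counts the pull.
  const producer = new ReadableStream({
    pull(controller) {
      controller.enqueue(letters[pulled++]);
    },
  }, { highWaterMark: 5 });

  // Each processor doubles a chunk ("A" -> "AA") after a delay. Only the
  // writable strategy is passed, so each readable end keeps HWM = 0.
  const makeProcessor = (highWaterMark) => new TransformStream({
    async transform(chunk, controller) {
      await sleep(transformDelayMs);
      controller.enqueue(chunk + chunk);
    },
  }, { highWaterMark });

  // Nothing reads the final readable end, so the chain stops pulling once
  // every queue is at its high water mark.
  producer.pipeThrough(makeProcessor(2)).pipeThrough(makeProcessor(1));
  return () => pulled;
}

async function main() {
  const getPulled = buildChain(200);
  await sleep(50);  // the first transform (A -> AA) is still running
  const beforeTransform = getPulled();
  await sleep(400); // AA has moved into the second processor; the chain has settled
  const afterTransform = getPulled();
  console.log({ beforeTransform, afterTransform });
  return { beforeTransform, afterTransform };
}

main();
```

Per the analysis above, the count should be 7 (5 in the producer plus A and B in the first processor's writable queue) while A is being transformed, and 8 once AA has moved on and the producer pulls H. The 50 ms and 400 ms sampling points are arbitrary choices for this sketch.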

> If I change the first transform's highWaterMark from `2` to `10`, `rs1` [pulls in](https://jsfiddle.net/jib1/n6wLv54r/40/) 15 chunks off the bat, almost 200% over its highWaterMark.

There seems to be some confusion about the meaning of the "high water mark". A stream's HWM tells it how many chunks to keep in its queue *if no one else is reading from it*. If the stream is being piped into another stream, then that other stream will be taking chunks out of the stream's queue and either processing them immediately or putting them in its own queue. This means the original stream's queue shrinks, so it needs to pull extra chunks to fill up to its own HWM again.

So indeed, to fill both queues, you need to produce 15 (= 5 + 10) chunks.
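The 5 + 10 arithmetic can be checked with a minimal sketch, assuming a runtime with WHATWG Streams globals (e.g. Node 18+). `countPulls` and `sleep` are hypothetical helpers, and the consumer's `write()` never resolves: an assumed stand-in for a downstream stream that keeps chunks queued instead of finishing them. Left alone, the producer should pull exactly its HWM of 5; piped into a consumer with HWM 10, it should pull 15.

```javascript
// Hypothetical helpers; assumes WHATWG Streams globals are available (e.g. Node 18+).
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function countPulls(producerHWM, consumerHWM) {
  let pulled = 0;
  // Producer: each pull() enqueues one chunk and counts the pull.
  const producer = new ReadableStream({
    pull(controller) {
      controller.enqueue(pulled++);
    },
  }, { highWaterMark: producerHWM });

  if (consumerHWM !== undefined) {
    const consumer = new WritableStream({
      // Never resolves, so every chunk handed to the consumer stays in its queue.
      write() {
        return new Promise(() => {});
      },
    }, { highWaterMark: consumerHWM });
    producer.pipeTo(consumer);
  }

  await sleep(50); // give the producer (and the pipe, if any) time to settle
  return pulled;
}

async function main() {
  const alone = await countPulls(5);     // nobody reads: fills to its own HWM only
  const piped = await countPulls(5, 10); // consumer queues 10, producer refills to 5
  console.log({ alone, piped });
  return { alone, piped };
}

main();
```

A `WritableStream` is used here instead of a second `TransformStream` so that only two queues are in play, which keeps the count at exactly producer HWM + consumer HWM.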

-- 
View it on GitHub:
https://github.com/whatwg/streams/issues/1155#issuecomment-892173054

Received on Tuesday, 3 August 2021 21:18:37 UTC