Re: [streams] Finalize pull/pullInto behavior (#423) from Domenic Denicola on 2016-01-22 (public-webapps-github@w3.org from January 2016)

From: Domenic Denicola <notifications@github.com>
Date: Thu, 21 Jan 2016 16:50:44 -0800
To: whatwg/streams <streams@noreply.github.com>
Message-ID: <whatwg/streams/issues/423/173764783@github.com>
OK, I think I understand. And apologies for the long response time; it has been a crazy week.

> Suppose that we're in a pullInto invocation. We just enqueue()-ed or respond()-ed the pullInto. If highWaterMark is set to e.g. 1024 byte, desiredSize returns 1024 byte. But now the stream still has one more read(view) pending. The source's policy is that it wants to fill a given ArrayBuffer (provided via pullInto args) if any, but otherwise, wants to fill the queue by enqueue(). Without byobRequest, we need to invoke pullInto again to notify the source that there's one more read(view). The source needs to wait for 1 micro task to see whether such pullInto occurs or not to decide whether it should use enqueue() or not (use respond() in response to pullInto).

I started writing up an argument against this. In particular I don't think the microtask is a big deal. But the moment I wrote some example code, it turns out that .byobRequest is really really nice. But also exposes some potential problems. Let me explain.

Remember the distinction between pull vs. push sources, which in the end comes down to [files](https://blog.domenic.me/reading-from-files/) vs. [sockets](https://blog.domenic.me/reading-from-sockets/).

For files, you want to write your underlying byte source as a slightly modified version of the [spec's existing example](https://streams.spec.whatwg.org/#example-rs-pull):

```js
pullInto(controller) {
  const buffer = controller.byobRequest;
  return fs.read(fd, buffer, 0, buffer.byteLength, position).then(bytesRead => {
    if (bytesRead === 0) {
      return fs.close(fd).then(() => controller.close());
    } else {
      position += bytesRead;
      controller.respond();
    }
  });
}
```

In this example, it doesn't make much difference whether it's an argument to pullInto or whether it's controller.byobRequest. I think it's nicer as an argument, but it doesn't really matter much.

In contrast, the socket example is very different. You want to write your byte source like this ([based off the spec's existing example](https://streams.spec.whatwg.org/#example-rs-push-backpressure), but using the socket API from the blog post):

```js
start(controller) {
  socket.on("readable", () => {
    // At this point the socket has data ready for us, so we need to do
    // one of two things:
    // 1. If there's a pending BYOB request, fill it up.
    // 2. Otherwise, enqueue it in the internal queue, so as to get it out of the
    //    kernel and ensure the kernel buffer doesn't ever overflow.
    let bytesRead;
    if (controller.byobRequest) {
      bytesRead = socket.readInto(controller.byobRequest, 0, controller.byobRequest.byteLength);
      controller.respond();
    } else {
      const buffer = new ArrayBuffer(1024); // TODO pick a better number than 1024
      bytesRead = socket.readInto(buffer, 0, buffer.byteLength);
      controller.enqueue(buffer);
    }
    
    if (controller.desiredSize <= 0) {
      // This implies there are zero BYOB requests *and*
      // the internal queue has filled up to the internal queue HWM.
      socket.readStop();
    }
    
    if (bytesRead === 0) {
      controller.close();
    }
    // maybe -1 is error too
  });
}

pull() {
  // Called if the internal queue is emptied (and there are no BYOB requests?)
  socket.readStart();
}
```

The socket example doesn't actually want to wait for pullInto at all. Does this seem right to you? (I'm really really sorry if we've discussed this before and are going in circles :()

The socket example exposes a few issues.

- First, maybe it should be `c.byobRequest.buffer` + `c.byobRequest.respond()` or something, just for clarity.
- But more importantly, it means that in some cases an underlying byte source might not have pullInto. I guess the fix for this is to add an explicit boolean like `{ start(), pull(), byob: true }`. And either pullInto present implies `byob: true` or you get an error if you forget `byob: true` but also supply pullInto.
- Maybe we just merge pull and pullInto?? When would anyone ever use them both? If we have `c.byobRequest` present, then pull can do all the work pullInto currently does.

I know a lot of your examples were concerned with "what if you can't fill up the entire buffer", or "what if there are multiple queued byob requests". I think in those cases it's actually fine to do another pull call (or wait for another `socket.on("readable"` in `start`). The microtask isn't a big deal. But thinking about the kind of main-stream scenarios here, like I've outlined above, I think byobRequest solves them in an important way.

> Whether or not we use the return value of pull call to suppress pull invocation (if pending promise is returned, suppress pull invocation until the promise gets resolved).

I don't really understand the argument against the current behavior yet. Maybe it will change after we reconsider the above. But: the most important part of the current behavior is the eror propagation. I agree that in all the examples so far you always call `controller.enqueue()` or `controller.close()` at basically the same time the promise fulfills. So if there are examples where using the promise timing to suppress further calls is bad, I'm open to changing it, as long as we keep error propagation.

---
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/streams/issues/423#issuecomment-173764783
Received on Friday, 22 January 2016 00:51:16 UTC