[whatwg/streams] Cloning (not teeing) a readable stream, via controller tricks (#599)

This is a continuation of the thread at https://github.com/yutakahirano/fetch-with-streams/issues/67#issuecomment-253530274, where @youennf proposes a novel set of tricks that would allow true cloning (not teeing) of a readable stream. First I will explain the trick and talk about why it works. Then we can talk about whether we should support this.

---

From the outside, it makes no sense to clone a stream. Reading from a stream is destructive: once you read, the chunk is no longer in the stream. So the best you can do is tee it: create two new streams, read from the original, and then enqueue into the two new ones. There's no way to read from the stream, enqueue in the clone, but still somehow leave the chunk in the stream. @youennf's trick gets around this.
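To make the "tee is something you could write yourself" point concrete, here is a minimal sketch of a hand-written tee using only the public API. The name `manualTee` is made up for illustration, and the sketch assumes a runtime with a global `ReadableStream`. It deliberately ignores backpressure and cancellation, so it is not equivalent to the built-in `tee()`.

```javascript
// Hand-written tee: drains `source` and copies every chunk into two new streams.
// `manualTee` is a hypothetical helper, not part of the Streams API.
function manualTee(source) {
  // Acquiring a reader locks `source`; from here on it is being consumed.
  const reader = source.getReader();

  // start() runs synchronously inside the constructor, so both controller
  // variables are assigned before the pump loop below begins.
  let controller1, controller2;
  const branch1 = new ReadableStream({ start(c) { controller1 = c; } });
  const branch2 = new ReadableStream({ start(c) { controller2 = c; } });

  // Pump loop: read destructively from the original, enqueue into both branches.
  (async () => {
    try {
      for (;;) {
        const { value, done } = await reader.read();
        if (done) break;
        controller1.enqueue(value);
        controller2.enqueue(value);
      }
      controller1.close();
      controller2.close();
    } catch (e) {
      // Propagate failures to both branches.
      controller1.error(e);
      controller2.error(e);
    }
  })();

  return [branch1, branch2];
}
```

Note that the original stream ends up locked and drained; only the two branches remain usable, which is exactly the limitation the trick below tries to avoid.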

The trick depends crucially on the way we have structured streams to be facades around controllers, so that all the interesting behavior, including the data, is stored in the controller. This was originally a design innovation in order to allow both byte and default streams to be served by the same public `ReadableStream` API: all the interesting behavior takes place in either the `ReadableStreamDefaultController` or the `ReadableByteStreamController`.

The innovation is to consider re-targeting a `ReadableStream` at a different controller than the one that was created along with it. This allows the `ReadableStream` to start exposing a different set of chunks than those that are put into it by its creator, since its creator still manipulates the original controller.

Given a stream `toClone`, the steps are:

1. Create a new stream, `teeStream`.
2. Move `toClone`'s controller to `teeStream`. At this point the original controller for `teeStream` has been thrown away and `toClone` has no controller.
3. Let `tee1` and `tee2` be the result of teeing `teeStream`. They each have their own controller.
4. Move the controller of `tee1` to `toClone`. At this point `tee1` has no controller but everyone else does. Throw away `tee1`.
5. Return `tee2`: it is a clone.
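
Since author code cannot move controllers between real streams, the five steps can only be illustrated with a toy model. The sketch below is an assumption-laden simplification: `ToyController`, `ToyStream`, `toyTee`, and `toyClone` are hypothetical names, reads are synchronous, and closing, errors, and backpressure are left out. It only shows how swapping the controller behind a facade changes what each stream exposes.

```javascript
// Toy model only: none of these classes exist in the real Streams API.
class ToyController {
  constructor(pull = null) {
    this.queue = [];
    this.pull = pull; // optional callback used to fetch more chunks on demand
  }
  enqueue(chunk) { this.queue.push(chunk); }
  dequeue() {
    if (this.queue.length === 0 && this.pull) this.pull();
    return this.queue.shift(); // undefined when nothing is available
  }
}

// The stream is just a facade over whichever controller it currently points at.
class ToyStream {
  constructor(controller = new ToyController()) { this.controller = controller; }
  read() { return this.controller.dequeue(); }
}

// Tee: each branch pulls from `source` and copies the chunk into both branches.
function toyTee(source) {
  let c1, c2;
  const pull = () => {
    const chunk = source.read();
    if (chunk !== undefined) { c1.enqueue(chunk); c2.enqueue(chunk); }
  };
  c1 = new ToyController(pull);
  c2 = new ToyController(pull);
  return [new ToyStream(c1), new ToyStream(c2)];
}

function toyClone(toClone) {
  const teeStream = new ToyStream();          // step 1
  teeStream.controller = toClone.controller;  // step 2: move the controller
  toClone.controller = null;
  const [tee1, tee2] = toyTee(teeStream);     // step 3
  toClone.controller = tee1.controller;       // step 4: re-target toClone
  tee1.controller = null;                     // tee1 is thrown away
  return tee2;                                // step 5
}

// The producer keeps using the original controller throughout.
const producerController = new ToyController();
const original = new ToyStream(producerController);
producerController.enqueue("a");
const clone = toyClone(original);
producerController.enqueue("b");

const fromOriginal = [original.read(), original.read()]; // ["a", "b"]
const fromClone = [clone.read(), clone.read()];          // ["a", "b"]
```

In this model the producer never learns that a clone happened: it still enqueues into `producerController`, which now sits behind the hidden `teeStream`.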

At this point, code that uses the original controller for `toClone` will enqueue in the controller for the hidden stream `teeStream`, and thus (via the teeing mechanism) into the controllers for `tee1` and `tee2`. Translating that into streams you can actually read from: using the original controller for `toClone` will enqueue into `toClone` and into `tee2`.

This requires careful tracking of the original controller. For example, the operations in https://fetch.spec.whatwg.org/#readablestream would not be correct, since they take as an argument _stream_ and then use _stream_.[[readableStreamController]]. This is not generally a problem for author code, but it does require careful bookkeeping for specs/UA code.

---

I am torn on whether we should pursue this. On the one hand, it is pretty cool. If you think cloning is a natural thing to do to streams, this accomplishes it neatly.

On the other hand, it is using tricks not accessible to authors, and is hard to explain. Teeing, piping, and such are all explicable as operations you could write yourself. They fit with the destructive model of streams and don't use any magic. You could easily write a version of `tee()` that uses different backpressure semantics or similar. But if someone wanted to create their own version of clone with some customizations, they could not.

What would be helpful to me is figuring out whether developers find cloning a stream to be a natural thing. Are they surprised that it isn't possible? How much are they missing it? How weird do they find `request.clone()`'s behavior, which resets `request.body` because of the tee semantics?

We could also consider exposing this operation only to specs, and not as a public `.clone()` method, just to make `request.clone()` not reset `request.body`. In that line of thinking, we'd say that exposing cloning of requests was a mistake since it doesn't fit with the streams model, so we don't want to perpetuate it further in the streams-using ecosystem, but we also want to make sure `request.clone()` as it exists is maximally reasonable.

I'd love to hear some thoughts.

Reply to this email directly or view it on GitHub:
https://github.com/whatwg/streams/issues/599

Received on Friday, 4 November 2016 21:42:01 UTC