Re: Overlap between StreamReader and FileReader

On Thu, Aug 8, 2013 at 7:40 PM, Austin William Wright <aaa@bzfx.net> wrote:
> I believe the term is "congestion control" such as the TCP congestion
> control algorithm.

As I've heard the term used, "congestion control" is slightly
different from "flow control" or "TCP backpressure", but they are
related concepts, and yes, your point is dead-on, Austin: this is
absolutely 100% essential.  Any Stream API that treats backpressure as
an issue to handle later is not a Stream API, and is clearly not ready
for serious discussion.

On Thu, Aug 8, 2013 at 7:40 PM, Austin William Wright <aaa@bzfx.net> wrote:
> I think there's some confusion as to what the abort() call is going to do
> exactly.

Yeah, I'm rather confused by that as well.  A read(2) operation
typically can't be "canceled" because it's synchronous.


Let's back up just a step here, and talk about the fundamental purpose
of an API like this.  Here's a strawman:


-----
A "Readable Stream" is an abstraction representing an ordered set of
data which may or may be finite, some or all of which may arrive at a
future time, which can be consumed at any arbitrary rate up to the
rate at which data is arriving, without causing excessive memory use.
It provides a mechanism to send the data into a Writable Stream, and
for being alerted to errors in the underlying implementation.

A "Writable Stream" is an abstraction representing a destination where
data is written, where any given write operation may be completely
flushed to the underlying implementation immediately or at some point
in the future.  It provides a mechanism for determining when more data
can be safely written without causing excessive memory usage, and for
being alerted to errors in the underlying implementation.

A "Duplex Stream" is an abstraction that implements both the Readable
Stream and Writable Stream interfaces.  There may or may not be any
specific connection between the two sets of functionality.  (For
example, it may represent a TCP socket file descriptor, or any
arbitrary readable/writable API that one can imagine.)
-----
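
To make the Writable side concrete, here's a toy sketch of that
contract.  Everything in it -- the names, and the pretend sink that
drains one chunk per tick -- is invented for the example; it's not
Node's API or anything from a spec:

function ToyWritable(highWaterMark) {
  this.limit = highWaterMark;  // buffer this many chunks before pushing back
  this.buffer = [];
  this.drainWaiters = [];
}

// Accept a chunk.  Returning false means "my buffer is full; please
// stop writing until whenDrained() resolves."
ToyWritable.prototype.write = function (chunk) {
  this.buffer.push(chunk);
  this._drainOneLater();
  return this.buffer.length < this.limit;
};

// A promise that resolves once it is safe to write again.
ToyWritable.prototype.whenDrained = function () {
  if (this.buffer.length < this.limit) return Promise.resolve();
  var self = this;
  return new Promise(function (resolve) { self.drainWaiters.push(resolve); });
};

// Pretend the underlying sink consumes one buffered chunk per tick.
ToyWritable.prototype._drainOneLater = function () {
  var self = this;
  setTimeout(function () {
    self.buffer.shift();
    if (self.buffer.length < self.limit) {
      self.drainWaiters.splice(0).forEach(function (resolve) { resolve(); });
    }
  }, 0);
};

The only important part is the signalling: the producer can find out
that it has gotten ahead, and can find out when it's safe to continue.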


For any stream implementation, I typically try to ask: How would you
build a non-blocking TCP implementation using this abstraction?  This
might just be my bias coming from Node.js, but I think it's a fair
test of a Stream API that will be used on the web, where TCP is the
standard.  Here are some things that need to work 100%, assuming a
Readable.pipe(Writable) method:

fastReader.pipe(slowWriter)
slowReader.pipe(fastWriter)
socket.pipe(socket) // echo server
socket.pipe(new gzipDeflate()).pipe(socket)
socket.pipe(new gzipInflate()).pipe(socket)
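
Here's the shape of the loop that a pipe() like that has to run for
all five of those to hold.  It's only a sketch: read(), write()
returning false, and whenDrained() are invented names (the same toy
contract as the Writable sketch above), not a real API.

function pipe(readable, writable) {
  return readable.read().then(function (chunk) {
    if (chunk === null) return;              // source is finished
    var keepGoing = writable.write(chunk);   // false means "buffer is full"
    if (keepGoing) return pipe(readable, writable);
    return writable.whenDrained().then(function () {
      return pipe(readable, writable);       // resume only after the drain
    });
  });
}

The crucial property is that fastReader.pipe(slowWriter) never buffers
without bound: the loop stops pulling from the source until the
destination has drained.  The slow-reader case throttles itself for
free, because we only ever pull one chunk at a time.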



Node's streams, as of 0.11.5, are pretty good.  However, they've
"evolved" rather than having been "intelligently designed", so in many
areas, the API surface is not as elegant as it could be.  In
particular, I think that relying on an EventEmitter interface is an
unfortunate choice that should not be repeated in this specification.
The language has new features, and Promises are somewhat
well-understood now (and weren't as much then).  But Node streams have
definitely got a lot of play-testing that we can lean on when
designing something better.
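
For what it's worth, here is one purely illustrative shape a
pull-based, promise-returning API could take instead of EventEmitter.
read() here is an invented method that resolves with the next chunk,
or with null at the end of the stream:

function consume(stream, handleChunk) {
  return stream.read().then(function (chunk) {
    if (chunk === null) return;          // end of stream
    handleChunk(chunk);                  // consumer sets its own pace
    return consume(stream, handleChunk);
  });
}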

Calling read() repeatedly is much less convenient than doing something
like `stream.on('data', doSomething)`.  Additionally, you often want
to "spy" on a Stream, and get access to its data chunks as they come
in, without being the main consumer of the Stream.
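
In today's Node streams, the usual trick for that is to drop a
PassThrough into the pipeline and listen on it.  Roughly (spy() is
just a name made up for this example):

var PassThrough = require('stream').PassThrough;

function spy(onChunk) {
  var tap = new PassThrough();
  tap.on('data', onChunk);   // observe each chunk as it flows past
  return tap;
}

// e.g. socket.pipe(spy(console.log)).pipe(socket)

It works, but it's a workaround; the point is just that "observing" a
stream is a common enough need to keep in mind alongside the plain
"consuming" case.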
