Re: On diversity [was: Chrome Extension of Semantic Web]

On Mon, Oct 14, 2013 at 7:15 AM, Matteo Collina <matteo.collina@gmail.com> wrote:

>
>
>
> 2013/10/14 Austin William Wright <aaa@bzfx.net>
>
>> On Fri, Oct 11, 2013 at 3:06 AM, Matteo Collina <matteo.collina@gmail.com>
>> wrote:
>>
>>> You are definitely wrong. Being able to have two instances of the same
>>> library in the same namespace is the best feature of a package manager.
>>> Anybody who has worked extensively with Java (or Windows 95) knows how
>>> bad the situation is where two libraries require different versions of
>>> the same dependency.
>>> NPM is smart enough to reuse the same version of the library if it is
>>> fine for the depending packages.
>>>
>>
>> npm hasn't solved any problems that Java hasn't; it's just that
>> ECMAScript has duck typing and Java does not.
>>
>
> Let's agree to disagree :)
>

Well, if I'm wrong, I'd prefer to understand why. Mind, I'm not proposing
the right way to do it (that's another discussion), but the way it's done
here is particularly problematic (instanceof, prototype chain). Relying on
duck typing doesn't solve the problem; it just masks it.


>
>
>>  You write about being asynchronous, but I'm not sure what you're
>>>> referring to exactly. There is no need to have an asynchronous API; as
>>>> there is nothing waiting on I/O operations. Likewise, "backpressure" isn't
>>>> a well defined term... I believe "congestion" is what you mean? But that's
>>>> not something your library would be concerned with, only the application
>>>> author would be (let's say they're reading from the filesystem and writing
>>>> to a database, in this case the application author would implement
>>>> congestion control because they're the one doing I/O).
>>>>
>>>
>>> You have not understood much about node.js. The whole asynchronicity,
>>> streams and backpressure are the core of Node.js.
>>> To recap:
>>> 1) if you want to process a file while you read it, without storing it
>>> all in memory, then you HAVE to use streams. This is one of the major
>>> design decisions Ruben was talking about.
>>> 2) backpressure is a concept related to streams and asynchronicity. You
>>> have two streams, A piped into B, so the data is flowing A --> B. If B
>>> cannot process the data at the speed A is generating it, backpressure kicks
>>> in and A slows down. (reference
>>> https://github.com/substack/stream-handbook)
>>>
>>
>> "Backpressure kicking in" is formally called congestion control, and it
>> only applies to I/O. In a library that only does computation, like a Turtle
>> parser, there is no I/O, and nothing that is either "synchronous" or
>> "asynchronous", because there is no idle CPU time spent waiting for an
>> event. One could forcefully introduce I/O to make it asynchronous, e.g. the
>> "process.nextTick" Node.js call, but that's needless.
>>
>
> Let me rephrase this. In node.js it is called back-pressure (
> http://blog.nodejs.org/2012/12/21/streams2/), so let's stick to this
> name. Everybody in the node community calls it that way, even if there
> are better names (congestion control). Moreover, it identifies a set of
> messages exchanged by the stream to handle it. So, it's not a generic term
> in this sense, but a very specific implementation of it.
>

We're not Node.js-specific here, though: the Internet and Web standards
call it congestion, and this Community Group would be better off using that
well-defined, formal term instead of what's found in Substack's
Node.js-specific GitHub handbook. There's no confusion in using this single
name; it's the exact same phenomenon found in TCP, event loops, and
threads, and it has been defined since the earliest RFCs.


>
>> There is no congestion, because no I/O is being written -- the library (by
>> itself) is not waiting for another end to finish receiving emitted events.
>>
>> This is the problem that I'm talking about with the RDF Interfaces TR, it
>> doesn't have any method to make multiple writes for a single document --
>> the `write` and `end` calls that I want are implicit in the current `parse`
>> definition, requiring programmers to buffer the entire document in memory.
>> Separate `write` and `end` calls would be sufficient to handle filesystem
>> events.
>>
>
> It's not just the input, it is also the output. You want to be able to
> emit triples in a way that can be safely forwarded to I/O, without
> overwhelming the receiver: that's the case Ruben and I have been discussing
> for some months :). In practice, you want congestion control/back-pressure
> the moment you want to import a decent number of triples into an async data
> storage.
>

The rule is that whoever writes to I/O handles congestion control. The
library isn't writing to I/O; the application is. So if the recipient of
the parsed data is getting overwhelmed, the application is responsible for
throttling calls to the parsing library (typically by pause()ing the
upstream). To make this a one-liner, you can provide a Node.js-style
Streams API on top that adds pipe() support and passes through pause()
calls.
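
As a minimal sketch of that division of responsibility (Parser here is a
hypothetical parser with the ontriple/write/end shape used in the snippet
further down, and store.insert() is a hypothetical async storage call, so
don't read either as a real API):

  var fs = require('fs');

  var input = fs.createReadStream('data.ttl', { encoding: 'utf8' });
  var parser = new Parser();          // hypothetical parser, as noted above

  parser.ontriple = function (triple) {
    input.pause();                    // throttle upstream while I/O is pending
    store.insert(triple, function (err) {   // hypothetical async storage call
      if (err) throw err;
      input.resume();                 // resume once the receiver has caught up
    });
  };

  input.on('data', function (chunk) { parser.write(chunk); });
  input.on('end', function () { parser.end(); });

Pausing per triple is wasteful, of course -- a real application would batch
-- but the point is that the pause()/resume() decision lives in the code
doing the I/O, not in the parser.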

For <https://github.com/RubenVerborgh/node-n3>, it appears this could be
done without necessarily breaking the API.


>
> You want to do something similar (in your API)
>
> var parser = new Parser();
> parser.ontriple = function(triple) { ..... };
> for(var i=0; i<content.length; i++) parser.write(content[i]);
> parser.end();
>

That's essentially how RDF Interfaces already works for emitting triples.
You can also supply a Graph object for triples to be inserted into
automatically, as well as a "filter" function argument to select which
ones to insert.
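
For reference, a rough sketch of that shape -- parameter order and the
ParserCallback signature here go from my reading of the RDF Interfaces TR
and should be double-checked against the spec; env and turtleParser are
assumed to come from whatever RDFEnvironment implementation you're using:

  var graph = env.createGraph();             // Graph to collect triples into
  var RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type';

  var filter = function (triple) {           // keep only rdf:type triples
    return triple.predicate.nominalValue === RDF_TYPE;
  };

  turtleParser.parse(turtleText, function (g) {
    // runs once parsing completes; if I read the TR right, g is the
    // populated Graph
  }, 'http://example.org/base/', filter, graph);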


>
> So you can implement node.js streams very easily by wrapping the Parser in
> a Transform stream.
> I think Ruben is working on an implementation of that idea.
>

Exposing a Transform stream is exactly the way to go.
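
To make that concrete, a minimal streams2-style wrapper could look roughly
like this (Parser again stands in for a hypothetical parser with the
ontriple/write/end shape from the snippet above):

  var stream = require('stream');
  var util = require('util');

  function TripleStream() {
    stream.Transform.call(this, { objectMode: true }); // text in, triples out
    var self = this;
    this._parser = new Parser();          // hypothetical parser, as above
    this._parser.ontriple = function (triple) {
      self.push(triple);                  // hand each parsed triple downstream
    };
  }
  util.inherits(TripleStream, stream.Transform);

  TripleStream.prototype._transform = function (chunk, encoding, done) {
    this._parser.write(chunk.toString()); // feed raw text into the parser
    done();
  };

  TripleStream.prototype._flush = function (done) {
    this._parser.end();                   // signal end of input
    done();
  };

With that in place, pipe() takes care of the pause()/resume() bookkeeping:
fs.createReadStream('data.ttl').pipe(new TripleStream()).pipe(store), where
store is whatever object-mode Writable the application provides.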

Austin.

Received on Friday, 18 October 2013 20:44:24 UTC