Re: On diversity [was: Chrome Extension of Semantic Web]

2013/10/14 Austin William Wright <aaa@bzfx.net>

> On Fri, Oct 11, 2013 at 3:06 AM, Matteo Collina <matteo.collina@gmail.com> wrote:
>
>> You are definitely wrong. Being able to have two instances of the same
>> library in the same namespace is the best feature of a package manager.
>> Anybody who has worked extensively with Java (or Windows 95) knows how bad
>> the situation is when two libraries require different versions of the same
>> dependency.
>> NPM is smart enough to reuse a single copy of a library if one version
>> satisfies all the depending packages.
>>
>
> npm hasn't solved any problems that Java hasn't; it's just that
> ECMAScript has duck typing and Java does not.
>

Let's agree to disagree :)

>> As for fragmentation, time will help with it: the community is very young
>> and we are still searching for the best way to solve some problems.
>>
>
> My frustration is that these problems have been solved elsewhere.
>

Well, this is a problem of Computer Science as a whole: we do the same thing
over and over again, just slightly better each time.


>>> You write about being asynchronous, but I'm not sure what you're
>>> referring to exactly. There is no need to have an asynchronous API, as
>>> there is nothing waiting on I/O operations. Likewise, "backpressure" isn't
>>> a well-defined term... I believe "congestion" is what you mean? But that's
>>> not something your library would be concerned with, only the application
>>> author would be (let's say they're reading from the filesystem and writing
>>> to a database, in this case the application author would implement
>>> congestion control because they're the one doing I/O).
>>>
>>
>> You have not understood much about node.js. Asynchronicity, streams
>> and backpressure are the very core of Node.js.
>> To recap:
>> 1) if you want to process a file while you read it, without storing it
>> all in memory, then you HAVE to use streams. This is one of the major
>> design decisions Ruben was talking about.
>> 2) backpressure is a concept related to streams and asynchronicity. You
>> have two streams, A piped into B, so the data is flowing A --> B. If B
>> cannot process the data at the speed A is generating it, backpressure kicks
>> in and A slows down. (reference
>> https://github.com/substack/stream-handbook)
>>
>
> "Backpressure kicking in" is formally called congestion control, and it
> only applies to I/O. In a library that only does computation, like a Turtle
> parser, there is no I/O, and nothing that is either "synchronous" or
> "asynchronous", because there is no idle CPU time spent waiting for an
> event. One could forcefully introduce I/O to make it asynchronous, e.g. the
> "process.nextTick" Node.js call, but that's needless.
>

Let me rephrase this. In Node.js it is called back-pressure
(http://blog.nodejs.org/2012/12/21/streams2/), so let's stick to this name.
Everybody in the node community calls it that way, even if there are
better names (congestion control). Moreover, the term identifies a specific
set of messages exchanged by the streams to handle it. So it's not a generic
term in this sense, but a very specific implementation of the idea.
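
To make that concrete, here is a minimal sketch (mine, not from the streams2
post) of the write()/'drain' exchange; the deliberately slow Writable is
hypothetical:

var stream = require('stream');

// A deliberately slow consumer: it "processes" each chunk after 100 ms.
var slow = new stream.Writable({ highWaterMark: 16 });
slow._write = function (chunk, encoding, callback) {
  setTimeout(callback, 100); // calling back asks for the next chunk
};

// The back-pressure messages: write() returns false when the internal
// buffer is full, and 'drain' fires once it has emptied again.
function produce(n) {
  while (n-- > 0) {
    if (!slow.write('some data')) {
      slow.once('drain', function () { produce(n); });
      return; // stop producing until the consumer catches up
    }
  }
  slow.end();
}
produce(1000);

readable.pipe(writable) performs exactly this exchange for you, which is why
a fast producer piped into a slow consumer does not exhaust memory.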

> The ability to handle a _stream of data_ is different from handling I/O
> _asynchronously_. A Turtle parser, as we're talking about here, should
> still process incoming events from a stream. A good library won't care how
> you segment the document; it'll still parse to the same thing:
>
> var parser = new StreamParser();
> parser.write(content);
> parser.end();
>
> var parser = new StreamParser();
> for(var i=0; i<content.length; i++) parser.write(content[i]);
> parser.end();
>
> Once you have a streaming API, then you can implement a Node.js-style
> Streams API on top of it, for where that is easier to use.
>

Definitely.

> There is no congestion, because no I/O is being performed -- the library (by
> itself) is not waiting for another end to finish receiving emitted events.
>
> This is the problem I'm talking about with the RDF Interfaces TR: it
> doesn't have any method to make multiple writes for a single document --
> the `write` and `end` calls that I want are implicit in the current `parse`
> definition, requiring programmers to buffer the entire document in memory.
> Separate `write` and `end` calls would be sufficient to handle filesystem
> events.
>

It's not just about the input, it is also about the output. You want to be
able to emit triples in a way that can be safely forwarded to I/O, without
overwhelming the receiver: that's the case Ruben and I have been discussing
for some months :). In practice, you want congestion control/back-pressure
the moment you want to import a decent number of triples into an async data
store.
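
As a sketch of that import case (the store and its insert(triple, callback)
method are hypothetical), an object-mode Writable lets the store's own
latency throttle the producer:

var stream = require('stream');
var store = require('some-triple-store'); // hypothetical async triple store

// Object-mode Writable: each chunk is one parsed triple.
var importer = new stream.Writable({ objectMode: true, highWaterMark: 64 });
importer._write = function (triple, encoding, callback) {
  // The next triple is requested only once the store has acknowledged this
  // one, so the store's latency propagates upstream as back-pressure.
  store.insert(triple, callback);
};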

On the parsing side, you want something like this in your API:

var parser = new Parser();
parser.ontriple = function(triple) { /* handle one triple */ };
for(var i=0; i<content.length; i++) parser.write(content[i]);
parser.end();

So you can implement node.js streams very easily by wrapping the Parser in
a Transform stream.
I think Ruben is working on an implementation of that idea.
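
A minimal sketch of that wrapping (assuming the hypothetical Parser above and
streams2-style inheritance via util.inherits):

var Transform = require('stream').Transform;
var util = require('util');

// Bytes in, parsed triples out; back-pressure is handled by the stream
// machinery on both sides.
function ParserStream() {
  Transform.call(this, { objectMode: true });
  var self = this;
  this._parser = new Parser(); // the hypothetical Parser from above
  this._parser.ontriple = function (triple) {
    self.push(triple); // hand each triple to the readable side
  };
}
util.inherits(ParserStream, Transform);

ParserStream.prototype._transform = function (chunk, encoding, callback) {
  this._parser.write(chunk.toString()); // feed raw text into the parser
  callback();
};

ParserStream.prototype._flush = function (callback) {
  this._parser.end(); // let the parser emit any pending triples
  callback();
};

// fs.createReadStream('data.ttl').pipe(new ParserStream()).pipe(importer);

Piped like that, the importer's back-pressure propagates through the Transform
up to the file read, which is exactly the behaviour described above.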

Cheers,

Matteo
