Re: On diversity [was: Chrome Extension of Semantic Web]

On Fri, Oct 11, 2013 at 3:06 AM, Matteo Collina <matteo.collina@gmail.com> wrote:

> Hi Austin,
>
> 2013/10/11 Austin William Wright <aaa@bzfx.net>
>
>> I haven't been impressed with the Node.js ecosystem. Not that there's too
>> much choice, but that there are no guarantees of anything.
>>
>
> Even big companies (Microsoft, Oracle, or Google) do not guarantee
> anything. You are the ultimate maintainer of your software. You are
> responsible for keeping it working. This might be a philosophy of life :).
>

Well of course. I'm pointing out that the Node.js ecosystem has much more
fragmentation. (That is a better term.) Packages go out of maintenance
_frequently_; I've had to find replacements for several packages now.


>
>
>> Everyone releases 0.x versions of software with no indication of
>> breakage, or entire packages simply go out of support, or you end up using
>> two different packages that do the exact same thing, or two different
>> versions of the same package, or two different instances of the same
>> version of the same package (because npm insists on checking out multiple
>> instances of the same package). It's a nightmare. (I'd like to move to an
>> RDF-based package manager, generic enough for an entire OS, but that's
>> another conversation to have.)
>>
>
> You are definitely wrong. Being able to have two instances of the same
> library in the same namespace is the best feature of a package manager.
> Anybody who has worked extensively with Java (or Windows 95) knows how bad
> the situation is where two libraries require different versions of the same
> dependency.
> NPM is smart enough to reuse the same version of the library if it is fine
> for the depending packages.
>

npm hasn't solved any problems that Java hasn't; the difference is simply
that ECMAScript has duck typing and Java does not. Most significantly, if
you ever use `instanceof` after npm has checked out multiple instances of a
package, you run head-on into problems that are frequently hard to diagnose,
because the duplicate classes have identical `name`s.
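
To sketch the hazard (the package names here are hypothetical): suppose npm
has installed two copies of the same library, one under each dependent
package.

var TripleA = require('pkg-a/node_modules/rdf-lib').Triple;
var TripleB = require('pkg-b/node_modules/rdf-lib').Triple;

var t = new TripleA('s', 'p', 'o');
// false: same class name, possibly even the same version, but a different
// constructor identity, because each copy was loaded separately.
console.log(t instanceof TripleB);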

> As for fragmentation, time will help with it: the community is very young
> and we are still searching for the best way for solving some problems.
>

My frustration is that these problems have been solved elsewhere.


>
> We should definitely take the time to index functionality of the various
>> libraries that exist, after first indexing the common use-cases that these
>> libraries will be used towards.
>>
>> Perhaps after this, there are two things I would suggest:
>>
>> 1. Publishing specifications, so we can define the functionality we want
>> in terms of what would be best for developing. (Libraries are supposed to
>> benefit the developers using them, not necessarily the developer writing
>> the library.)
>>
>> 2. Changing and specializing the functionality of the libraries
>> accordingly.
>>
>
> I completely agree.
>
> You write about being asynchronous, but I'm not sure what you're referring
>> to exactly. There is no need to have an asynchronous API, as there is
>> nothing waiting on I/O operations. Likewise, "backpressure" isn't a well-
>> defined term... I believe "congestion" is what you mean? But that's not
>> something your library would be concerned with, only the application author
>> would be (let's say they're reading from the filesystem and writing to a
>> database, in this case the application author would implement congestion
>> control because they're the one doing I/O).
>>
>
> You have not understood much about node.js. The whole asynchronicity,
> streams and backpressure are the core of Node.js.
> To recap:
> 1) if you want to process a file while you read it, without storing it all
> in memory, then you HAVE to use streams. This is one of the major design
> decisions Ruben was talking about.
> 2) backpressure is a concept related to streams and asynchronicity. You
> have two streams, A piped into B, so the data is flowing A --> B. If B
> cannot process the data at the speed A is generating it, backpressure kicks
> in and A slows down. (reference
> https://github.com/substack/stream-handbook)
>

"Backpressure kicking in" is formally called congestion control, and it
only applies to I/O. In a library that only does computation, like a Turtle
parser, there is no I/O, and nothing that is either "synchronous" or
"asynchronous", because there is no idle CPU time spent waiting for an
event. One could forcefully introduce I/O to make it asynchronous, e.g. the
"process.nextTick" Node.js call, but that's needless.

The ability to handle a _stream of data_ is different from handling I/O
_asynchronously_. A Turtle parser, as we're talking about here, should
still process incoming events from a stream. A good library won't care how
you segment the document; it will still parse to the same result:

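// Write the whole document in a single chunk: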
var parser = new StreamParser();
parser.write(content);
parser.end();

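// Or write it one character at a time; the parsed result is the same: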
var parser = new StreamParser();
for(var i=0; i<content.length; i++) parser.write(content[i]);
parser.end();

Once you have a streaming API like this, you can implement a Node.js-style
Streams API on top of it for the cases where that is easier to use.
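
A minimal sketch of such a wrapper (ParserStream is a hypothetical name, and
StreamParser is the write/end API assumed above):

var Writable = require('stream').Writable;

function ParserStream(parser) {
  Writable.call(this, { decodeStrings: false });
  this.parser = parser;
  // Flush the parser once the writable side has been ended.
  this.on('finish', function() { parser.end(); });
}
ParserStream.prototype = Object.create(Writable.prototype);
ParserStream.prototype.constructor = ParserStream;

// Each chunk arriving from a pipe is forwarded to the plain write() call.
ParserStream.prototype._write = function(chunk, encoding, callback) {
  this.parser.write(String(chunk));
  callback();
};

// e.g. fs.createReadStream('data.ttl').pipe(new ParserStream(new StreamParser()));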

There is no congestion, because no I/O is being performed: the library (by
itself) is not waiting for the other end to finish receiving the events it
emits.

This is the problem I'm talking about with the RDF Interfaces TR: it
doesn't have any method to make multiple writes for a single document. The
`write` and `end` calls that I want are implicit in the current `parse`
definition, which requires programmers to buffer the entire document in
memory. Separate `write` and `end` calls would be sufficient to handle
filesystem events.
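
To sketch the difference (the `parse` call below is paraphrased, and the
stream wiring assumes the write/end API discussed above):

// Today the whole document has to be in memory before parsing starts:
parser.parse(entireDocument, callback);

// With separate write/end calls, filesystem read events can drive the parser:
var input = require('fs').createReadStream('data.ttl', { encoding: 'utf8' });
input.on('data', function(chunk) { parser.write(chunk); });
input.on('end', function() { parser.end(); });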

Austin Wright.

Received on Monday, 14 October 2013 13:55:22 UTC