- From: Matteo Collina <matteo.collina@gmail.com>
- Date: Mon, 14 Oct 2013 16:15:50 +0200
- To: Austin William Wright <aaa@bzfx.net>
- Cc: "public-rdfjs@w3.org" <public-rdfjs@w3.org>
- Message-ID: <CAANuz57H0b3MTxBEgkoSPatp3YUFHdYUJPdRvhHjjK2Mu96auA@mail.gmail.com>
2013/10/14 Austin William Wright <aaa@bzfx.net>:

> On Fri, Oct 11, 2013 at 3:06 AM, Matteo Collina <matteo.collina@gmail.com> wrote:
>
>> You are definitely wrong. Being able to have two instances of the same
>> library in the same namespace is the best feature of a package manager.
>> Anybody who has worked extensively with Java (or Windows 95) knows how bad
>> the situation is where two libraries require different versions of the same
>> dependency. NPM is smart enough to reuse the same version of the library if
>> it is fine for the depending packages.
>
> npm hasn't solved any problems that Java hasn't, it's just the fact that
> ECMAScript has duck typing and Java does not.

Let's agree to disagree :)

>> As for fragmentation, time will help with it: the community is very young
>> and we are still searching for the best way to solve some problems.
>
> My frustration is that these problems have been solved elsewhere.

Well, this is a problem of Computer Science as a whole. We do the same thing
over and over again, just slightly better each time.

>>> You write about being asynchronous, but I'm not sure what you're
>>> referring to exactly. There is no need to have an asynchronous API, as
>>> there is nothing waiting on I/O operations. Likewise, "backpressure" isn't
>>> a well defined term... I believe "congestion" is what you mean? But that's
>>> not something your library would be concerned with, only the application
>>> author would be (let's say they're reading from the filesystem and writing
>>> to a database; in this case the application author would implement
>>> congestion control because they're the one doing I/O).
>>
>> You have not understood much about node.js. Asynchronicity, streams and
>> backpressure are the core of Node.js. To recap:
>>
>> 1) If you want to process a file while you read it, without storing it all
>> in memory, then you HAVE to use streams. This is one of the major design
>> decisions Ruben was talking about.
>>
>> 2) Backpressure is a concept related to streams and asynchronicity. You
>> have two streams, A piped into B, so the data is flowing A --> B. If B
>> cannot process the data at the speed A is generating it, backpressure kicks
>> in and A slows down. (Reference: https://github.com/substack/stream-handbook)
>
> "Backpressure kicking in" is formally called congestion control, and it
> only applies to I/O. In a library that only does computation, like a Turtle
> parser, there is no I/O, and nothing that is either "synchronous" or
> "asynchronous", because there is no idle CPU time spent waiting for an
> event. One could forcefully introduce I/O to make it asynchronous, e.g. the
> "process.nextTick" Node.js call, but that's needless.

Let me rephrase this. In node.js it is called back-pressure
(http://blog.nodejs.org/2012/12/21/streams2/), so let's stick to this name.
Everybody in the node community calls it that, even if there are better names
(congestion control). Moreover, it identifies a specific set of messages
exchanged by the streams to handle it. So it is not a generic term in this
sense, but a very specific implementation of the idea.
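To make the mechanics concrete, here is a minimal sketch of that A --> B pipe
using the streams2 base classes (Source and SlowSink are names made up just
for this example, and the buffers are kept tiny so the effect is visible):
the sink acknowledges each chunk only after a delay, so pipe() has to pause
the source whenever the sink's buffer fills up.

    var stream = require('stream');
    var util = require('util');

    // A: a Readable that produces chunks as fast as it is asked to.
    function Source() { stream.Readable.call(this, { highWaterMark: 64 }); }
    util.inherits(Source, stream.Readable);
    var count = 0;
    Source.prototype._read = function () {
      this.push(count++ < 50 ? 'some data\n' : null); // null ends the stream
    };

    // B: a slow Writable that acknowledges each chunk only after 100 ms.
    function SlowSink() { stream.Writable.call(this, { highWaterMark: 64 }); }
    util.inherits(SlowSink, stream.Writable);
    SlowSink.prototype._write = function (chunk, encoding, callback) {
      setTimeout(callback, 100); // until callback() fires, B accepts no more data
    };

    // pipe() wires the two together: when B's internal buffer is full,
    // A is paused; when B drains, A resumes. That pause/resume handshake
    // is the back-pressure.
    new Source().pipe(new SlowSink());

Neither side contains any explicit flow-control code; pipe() and the stream
internals exchange those messages for you.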
> The ability to handle a _stream of data_ is different than handling I/O
> _asynchronously_. A Turtle parser, as we're talking about here, should
> still process incoming events from a stream. A good library won't care how
> you segment up the document, it'll still parse to the same thing:
>
> var parser = new StreamParser();
> parser.write(content);
> parser.end();
>
> var parser = new StreamParser();
> for (var i = 0; i < content.length; i++) parser.write(content[i]);
> parser.end();
>
> Once you have a streaming API, then you can implement a Node.js-style
> Streams API on top of it, for where that is easier to use.

Definitely.

> There is no congestion, because no I/O is being written -- the library (by
> itself) is not waiting for another end to finish receiving emitted events.
>
> This is the problem that I'm talking about with the RDF Interfaces TR: it
> doesn't have any method to make multiple writes for a single document --
> the `write` and `end` calls that I want are implicit in the current `parse`
> definition, requiring programmers to buffer the entire document in memory.
> Separate `write` and `end` calls would be sufficient to handle filesystem
> events.

It's not just the input, it is also the output. You want to be able to emit
triples in a way that can be safely forwarded to I/O, without overwhelming the
receiver: that's the case Ruben and I have been discussing for some months :).
In practice, you want congestion control/back-pressure the moment you want to
import a decent number of triples into an async data store.

You want to do something similar (in your API):

var parser = new Parser();
parser.ontriple = function(triple) { /* ... */ };
for (var i = 0; i < content.length; i++) parser.write(content[i]);
parser.end();

So you can implement node.js streams very easily by wrapping the Parser in a
Transform stream, along the lines of the sketch below. I think Ruben is
working on an implementation of that idea.
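Just to sketch the shape I have in mind (this is not Ruben's code; the Parser
here is the hypothetical write/end/ontriple object above, not an existing
module):

    var stream = require('stream');
    var util = require('util');

    // Wrap the hypothetical write()/end()/ontriple parser in a Transform
    // stream. objectMode: true lets the stream carry triple objects rather
    // than byte chunks.
    function TripleStream(parser) {
      stream.Transform.call(this, { objectMode: true });
      var self = this;
      this._parser = parser;
      parser.ontriple = function (triple) {
        self.push(triple); // hand each parsed triple to whoever is downstream
      };
    }
    util.inherits(TripleStream, stream.Transform);

    // Each incoming chunk of text is fed to the parser; triples come out
    // via ontriple/push above.
    TripleStream.prototype._transform = function (chunk, encoding, callback) {
      this._parser.write(chunk.toString());
      callback();
    };

    TripleStream.prototype._flush = function (callback) {
      this._parser.end();
      callback();
    };

    // Usage (data.ttl and the store writable are placeholders):
    //
    //   fs.createReadStream('data.ttl')
    //     .pipe(new TripleStream(new Parser()))
    //     .pipe(someWritableThatInsertsTriplesIntoTheStore);
    //
    // If the store is slow, back-pressure propagates all the way up and the
    // file read slows down instead of buffering everything in memory.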
Cheers,
Matteo

Received on Monday, 14 October 2013 14:16:38 UTC