- From: Henry Story <henry.story@bblfish.net>
- Date: Fri, 11 Oct 2013 08:24:05 +0200
- To: Ruben Verborgh <ruben.verborgh@ugent.be>
- Cc: public-rdfjs@w3.org, Adrian Gschwend <ktk@netlabs.org>
+1. Thanks for this excellent reply. I find the same thing looking at Java parsers. I wrote an asynchronous parser a year ago, only to find it was 5 times slower than the standard parsers. That was too much for me at the time, and was probably due to the very alpha state of the parser library I was using. But I would certainly have been happy to pay something for asynchronicity.

On 10 Oct 2013, at 12:04, Ruben Verborgh <ruben.verborgh@ugent.be> wrote:

> Hi all,
>
>> As mentioned before my goal is to see if we can have some common things
>> in the various JS libraries, especially parsers and serializers. For
>> example it doesn't make much sense if we have 4 different SPARQL parsers
>> for the various stores out there. There is one in rdfstore-js by
>> Antonio, one in the SPIN library by Austin, triplestoreJS might be happy
>> about it too and Matteo could use one for LevelGraph.
>
> TL;DR: Different non-functional requirements lead to different libraries;
> perhaps we should catalog them based on their strengths and weaknesses.
>
> Having many modules that solve the same thing is unfortunately a common phenomenon
> in the current node.js ecosystem (see e.g. [1], and a great blog post I have since forgotten).
> The high number of existing JavaScript programmers when Node.js was launched
> has led to an explosion of npm packages. There's little reuse:
> everybody wanted their own package to solve things in a slightly different way.
> (Heck, I even did the same with my Turtle parser.)
>
> That said, I think it does make sense to have *some* different parsers (perhaps not 4).
> In addition to functional requirements, non-functional requirements influence a library as well.
> For instance, let's talk about Turtle parsers.
> For my own node-n3 library, I had two main goals: asynchronicity and high performance (in that order).
> What this means in practice is that, whenever I had to make a design decision, those two determined my choice.
>
> One could say: yes, but everybody wants the highest-performance library.
> Well, not always. For example, node-n3 is not running at maximum possible speed:
> if I dropped the asynchronous requirement, I could eliminate a lot of callbacks and speed everything up.
> This means that more performance is possible… but only for files up to around 2GB; anything larger would fail.
> However, that's a crucial design decision you can't turn on or off with a flag.
> I wanted to go beyond 2GB, so I chose asynchronicity over performance.
>
> On the other hand, I needed to make compromises to achieve this performance.
> For instance, as runtime classes heavily speed up JavaScript code [2],
> I decided that all triples would have the same rigid structure:
> { subject: <String>, predicate: <String>, object: <String>, context: <String> }
> This means I can't rely on classes or hidden properties to determine whether the object is a literal or a URI,
> so the differentiation needs to happen inside the object string itself.
> Other parsers might output JSON-LD, which is going to be much slower,
> but way more handy to deal with in the application. If that's your goal, then such a parser is better.
> However, that's again a crucial design decision.
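To make that string-based representation concrete, here is a rough sketch of what such a triple and a literal/URI check could look like. It is only an illustration, under the assumption that literals keep their surrounding quotes inside the object string; it is not node-n3's actual API.

  // One triple, every term a plain string with a fixed object shape
  // (hypothetical encoding: literals keep their quotes, URIs do not).
  var triple = {
    subject:   'http://example.org/book/1',
    predicate: 'http://purl.org/dc/terms/title',
    object:    '"Moby Dick"@en',
    context:   'http://example.org/graph'
  };

  // With no term classes, the literal/URI distinction is read off the string itself.
  function isLiteral(term) {
    return term.charAt(0) === '"';
  }

  console.log(isLiteral(triple.object));   // true
  console.log(isLiteral(triple.subject));  // false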
>
> A final example: implementing an RDF parser as a Node stream would have benefits as well.
> It's handy because it lets you avoid backpressure problems, but it comes at a terrible performance cost [3].
>
> So while it is a good idea to look for common ground,
> I suggest starting by making an overview of the different libraries.
> This helps people decide which library they need for a specific purpose.
> For instance:
> - What does the library support (and which specs does it pass)?
> - What are its strengths and weaknesses?
> - What are typical use cases?
> To avoid too much marketing language, I propose standardized tests.
> Performance is fairly easy, but not the only metric, as explained above.
> Circumstances matter: parsing small and large files from disk, memory, or network?
> Parsing straight to a triple-oriented model or directly to JSON-LD?
> Those things have to be decided before we can have objective measurements.
>
> No one size fits all, but we don't need more diversity than we can handle.
> And the diversity we have should be documented.
>
> Best,
>
> Ruben
>
> [1] https://medium.com/on-coding/6b6402216740
> [2] https://developers.google.com/v8/design#prop_access
> [3] https://github.com/RubenVerborgh/node-n3/issues/6#issuecomment-24010652

Social Web Architect
http://bblfish.net/
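As a minimal sketch of the kind of standardized measurement Ruben proposes above, one could simply time each parser over the same input under each circumstance (disk, memory, network). The file name and the parser's streaming API below are hypothetical placeholders, not any existing library's interface.

  var fs = require('fs');
  // Hypothetical Turtle parser exposing a Node transform stream in object mode.
  var turtle = require('some-turtle-parser');

  var count = 0;
  var start = Date.now();

  fs.createReadStream('dataset.ttl')             // large file read from disk
    .pipe(turtle.createParserStream())           // hypothetical streaming API
    .on('data', function (triple) { count++; })  // count parsed triples
    .on('end', function () {
      var seconds = (Date.now() - start) / 1000;
      console.log(count + ' triples in ' + seconds + ' s');
    });

Repeating the same run with the file preloaded in memory or fetched over the network, and with each library under test, would give comparable numbers.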
Received on Friday, 11 October 2013 06:24:38 UTC