On diversity [was: Chrome Extension of Semantic Web]

Hi all,

> As mentioned before my goal is to see if we can have some common things
> in the various JS libraries, especially parsers and serializers. For
> example it doesn't make much sense if we have 4 different SPARQL parsers
> for the various stores out there. There is one in rdfstore-js by
> Antonio, one in the SPIN library by Austin, triplestoreJS might be happy
> about it too and Matteo could use one for LevelGraph.

TL;DR: Different non-functional requirements lead to different libraries;
perhaps we should catalog them based on their strengths and weaknesses.

Many modules that solve the same problem are unfortunately a common phenomenon
in the current Node.js ecosystem (see e.g., [1] and a great blog post which I forgot).
The large number of JavaScript programmers who already existed when Node.js was launched
has led to an explosion of npm packages. There’s little reuse:
everybody wanted their own package to solve things in a slightly different way.
(Heck, I even did the same with my Turtle parser.)

That said, I think it does make sense to have *some* different parsers (though perhaps not 4).
In addition to functional requirements, non-functional requirements influence a library as well.
For instance, let’s talk about Turtle parsers.
For my own node-n3 library, I had two main goals: asynchronicity and high performance (in that order).
In practice, this means that whenever I faced a design decision, those two goals determined my choice.

One could say: yes, but everybody wants the most high-performance library.
Well, not always. For example, node-n3 does not run at the maximum possible speed:
if I dropped the asynchronous requirement, I could eliminate a lot of callbacks and speed everything up.
So more performance is possible… but only for files up to around 2GB; anything larger would fail.
However, that’s a crucial design decision you can’t turn on or off with a flag.
I wanted to go beyond 2GB, so I chose asynchronicity over performance.
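
To make that trade-off concrete, here is a rough sketch of the two styles
(the “parser” is just a naive statement counter and the file names are made up;
only the fs calls are real Node APIs):

    var fs = require('fs');

    // Naive stand-in for a real Turtle parser: counts statement-ending dots.
    function countStatements(text) {
      return (text.match(/\.\s/g) || []).length;
    }

    // 1) Synchronous: simple and fast, but the entire document must fit
    //    in memory, so files of around 2GB and more cannot be loaded at all.
    console.log(countStatements(fs.readFileSync('small.ttl', 'utf8')));

    // 2) Asynchronous: the document arrives chunk by chunk, so its size is
    //    unbounded, but every chunk boundary adds callback overhead.
    var total = 0;
    fs.createReadStream('huge.ttl', { encoding: 'utf8' })
      .on('data', function (chunk) { total += countStatements(chunk); })
      .on('end',  function () { console.log(total); });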

On the other hand, I also had to make compromises to achieve the performance node-n3 does have.
For instance, since JavaScript engines heavily optimize objects with a fixed shape (hidden classes) [2],
I decided that all triples would have the same rigid structure:
{ subject: <String>, predicate: <String>, object: <String>, context: <String> }
This means I can’t rely on classes or hidden properties to determine whether the object is a literal or a URI,
so the differentiation needs to happen inside the object string itself.
Other parsers might output JSON-LD, which is going to be much slower,
but way more handy to deal with in the application. If that’s your goal, then such a parser is better.
However, that’s again a crucial design decision.
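
To illustrate (this is just one possible convention, not necessarily the exact
one node-n3 uses): literals could keep their surrounding quotes in the object
string, so a single character check tells them apart from URIs:

    // All triples share the same four string properties, so the engine can
    // reuse a single hidden class for every triple object [2].
    var triple = {
      subject:   'http://example.org/books/moby-dick',
      predicate: 'http://purl.org/dc/terms/title',
      object:    '"Moby Dick"@en',
      context:   'http://example.org/graphs/catalog'
    };

    // Illustrative convention: literals start with a double quote.
    function isLiteral(entity) {
      return entity.charAt(0) === '"';
    }
    console.log(isLiteral(triple.object));  // true
    console.log(isLiteral(triple.subject)); // false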

A final example: implementing an RDF parser as a Node stream would have benefits as well.
It’s handy because backpressure is handled for you, but it comes at a terrible performance cost [3].
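
For those unfamiliar with Node streams, the sketch below shows what such a
parser interface could look like (the TurtleStream class is hypothetical, not
node-n3’s API): text chunks go in, triple objects come out, and the stream
machinery takes care of buffering, piping, and backpressure.

    var Transform = require('stream').Transform;
    var util = require('util');

    function TurtleStream() {
      Transform.call(this, { objectMode: true });
    }
    util.inherits(TurtleStream, Transform);

    TurtleStream.prototype._transform = function (chunk, encoding, done) {
      // A real implementation would feed the chunk to an incremental parser
      // and push every completed triple; we push one dummy triple per chunk.
      this.push({ subject: 's', predicate: 'p', object: '"o"', context: '' });
      done();
    };

    // Usage: fs.createReadStream('data.ttl').pipe(new TurtleStream())
    //   .on('data', function (triple) { /* handle triple */ });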


So while it is a good idea to look for common ground,
I suggest starting by making an overview of the different libraries.
This helps people decide which library they need for a specific purpose.
For instance:
- What does the library support (and which specs does it pass)?
- What are its strengths and weaknesses?
- What are typical use cases?
To avoid too much marketing language, I propose standardized tests.
Performance is fairly easy to measure, but it’s not the only metric, as explained above.
Circumstances matter: are we parsing small or large files, from disk, memory, or the network?
Parsing straight to a triple-oriented model or directly to JSON-LD?
Those things have to be decided to have objective measurements.
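
As a rough illustration, such a standardized test could be a tiny harness that
times each parser on the same inputs; the parse(path, onTriple, onEnd) interface
below is an assumption, so each library would need a thin adapter around it:

    // Minimal benchmark harness sketch (hypothetical parser interface).
    function benchmark(name, parse, path, done) {
      var count = 0, start = process.hrtime();
      parse(path, function onTriple() { count++; }, function onEnd() {
        var elapsed = process.hrtime(start);
        var seconds = elapsed[0] + elapsed[1] / 1e9;
        console.log(name + ': ' + count + ' triples in ' +
                    seconds.toFixed(2) + 's');
        done();
      });
    }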

No one size fits all, but we don’t need more diversity than we can handle.
And the diversity we have should be documented.

Best,

Ruben

[1] https://medium.com/on-coding/6b6402216740
[2] https://developers.google.com/v8/design#prop_access
[3] https://github.com/RubenVerborgh/node-n3/issues/6#issuecomment-24010652
