- From: Henry Story <henry.story@bblfish.net>
- Date: Fri, 11 Oct 2013 08:24:05 +0200
- To: Ruben Verborgh <ruben.verborgh@ugent.be>
- Cc: public-rdfjs@w3.org, Adrian Gschwend <ktk@netlabs.org>
+1. Thanks for this excellent reply. I find the same thing looking at Java parsers. I wrote an asynchronous parser a year ago, only to find it was 5 times slower than the standard parsers. That was too much for me at the time, and was probably due to the very alpha state of the parser library I was using. But I would certainly have been happy to pay something for asynchronicity.

On 10 Oct 2013, at 12:04, Ruben Verborgh <ruben.verborgh@ugent.be> wrote:

> Hi all,
>
>> As mentioned before my goal is to see if we can have some common things
>> in the various JS libraries, especially parsers and serializers. For
>> example it doesn't make much sense if we have 4 different SPARQL parsers
>> for the various stores out there. There is one in rdfstore-js by
>> Antonio, one in the SPIN library by Austin, triplestoreJS might be happy
>> about it too and Matteo could use one for LevelGraph.
>
> TL;DR: Different non-functional requirements lead to different libraries;
> perhaps we should catalog them based on their strengths and weaknesses.
>
> Having many modules that solve the same thing is unfortunately a common phenomenon
> in the current node.js ecosystem (see e.g. [1], and a great blog post I have since forgotten).
> The high number of existing JavaScript programmers when Node.js was launched
> has led to an explosion of npm packages. There's little reuse:
> everybody wanted their own package to solve things in a slightly different way.
> (Heck, I even did the same with my Turtle parser.)
>
> That said, I think it does make sense to have *some* different parsers (perhaps not 4).
> In addition to functional requirements, non-functional requirements influence a library as well.
> For instance, let's talk about Turtle parsers.
> For my own node-n3 library, I had two main goals: asynchronicity and high performance (in that order).
> What this means in practice is that, whenever I had to make a design decision, those two determined my choice.
>
> One could say: yes, but everybody wants the highest-performance library.
> Well, not always. For example, node-n3 is not running at maximum possible speed:
> if I dropped the asynchronous requirement, I could eliminate a lot of callbacks and speed everything up.
> This means that more performance is possible… but only for files up to around 2GB; anything larger would fail.
> However, that's a crucial design decision you can't turn on or off with a flag.
> I wanted to go beyond 2GB, so I chose asynchronicity over performance.
>
> On the other hand, I needed to make compromises to achieve this performance.
> For instance, as runtime classes heavily speed up JavaScript code [2],
> I decided that all triples would have the same rigid structure:
> { subject: <String>, predicate: <String>, object: <String>, context: <String> }
> This means I can't rely on classes or hidden properties to determine whether the object is a literal or a URI,
> so the differentiation needs to happen inside the object string itself.
> Other parsers might output JSON-LD, which is going to be much slower,
> but way more handy to deal with in the application. If that's your goal, then such a parser is better.
> However, that's again a crucial design decision.
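To make that string-based representation concrete, here is a rough sketch of what such a triple and a literal/URI check could look like. It is only an illustration, under the assumption that literals keep their surrounding quotes inside the object string; it is not node-n3's actual API.

  // One triple, every term a plain string with a fixed object shape
  // (hypothetical encoding: literals keep their quotes, URIs do not).
  var triple = {
    subject:   'http://example.org/book/1',
    predicate: 'http://purl.org/dc/terms/title',
    object:    '"Moby Dick"@en',
    context:   'http://example.org/graph'
  };

  // With no term classes, the literal/URI distinction is read off the string itself.
  function isLiteral(term) {
    return term.charAt(0) === '"';
  }

  console.log(isLiteral(triple.object));   // true
  console.log(isLiteral(triple.subject));  // false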
>
> A final example: implementing an RDF parser as a Node stream would have benefits as well.
> It's handy because it lets you avoid backpressure problems, but it comes at a terrible performance cost [3].
>
> So while it is a good idea to look for common ground,
> I suggest starting by making an overview of the different libraries.
> This helps people decide which library they need for a specific purpose.
> For instance:
> - What does the library support (and which specs does it pass)?
> - What are its strengths and weaknesses?
> - What are typical use cases?
> To avoid too much marketing language, I propose standardized tests.
> Performance is fairly easy, but not the only metric, as explained above.
> Circumstances matter: parsing small and large files from disk, memory, or network?
> Parsing straight to a triple-oriented model or directly to JSON-LD?
> Those things have to be decided before we can have objective measurements.
>
> No one size fits all, but we don't need more diversity than we can handle.
> And the diversity we have should be documented.
>
> Best,
>
> Ruben
>
> [1] https://medium.com/on-coding/6b6402216740
> [2] https://developers.google.com/v8/design#prop_access
> [3] https://github.com/RubenVerborgh/node-n3/issues/6#issuecomment-24010652

Social Web Architect
http://bblfish.net/
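As a minimal sketch of the kind of standardized measurement Ruben proposes above, one could simply time each parser over the same input under each circumstance (disk, memory, network). The file name and the parser's streaming API below are hypothetical placeholders, not any existing library's interface.

  var fs = require('fs');
  // Hypothetical Turtle parser exposing a Node transform stream in object mode.
  var turtle = require('some-turtle-parser');

  var count = 0;
  var start = Date.now();

  fs.createReadStream('dataset.ttl')             // large file read from disk
    .pipe(turtle.createParserStream())           // hypothetical streaming API
    .on('data', function (triple) { count++; })  // count parsed triples
    .on('end', function () {
      var seconds = (Date.now() - start) / 1000;
      console.log(count + ' triples in ' + seconds + ' s');
    });

Repeating the same run with the file preloaded in memory or fetched over the network, and with each library under test, would give comparable numbers.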
Received on Friday, 11 October 2013 06:24:38 UTC