Re: representing URIs and literals from Ruben Verborgh on 2013-11-04 (public-rdfjs@w3.org from November 2013)

From: Ruben Verborgh <ruben.verborgh@ugent.be>
Date: Mon, 4 Nov 2013 07:24:39 +0000
To: Austin William Wright <aaa@bzfx.net>
Cc: "public-rdfjs@w3.org" <public-rdfjs@w3.org>
Message-Id: <C8955821-07DB-4428-BED2-401C6C6DEF45@ugent.be>

Hi Austin,

> That seems to be an argument in favor of using instances, instead of primitive types or plain Object maps which don't get the benefit from this V8 compile-time logic.

There should be no difference between instances and Object maps, right?
Everything that has the same structure will be treated the same.
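
To make the comparison concrete, here is a minimal sketch of the two construction styles in question. The names `Literal`, `fromConstructor` and `fromObjectMap` are made up for illustration and do not come from any library. Engines with hidden classes generally derive them from which properties are added and in what order, so the sketch keeps that order identical. Whether a particular engine really assigns both objects the same hidden class cannot be observed from plain JavaScript, so the code only checks that the visible structure matches.

```javascript
// Hypothetical constructor, for illustration only.
function Literal(value, datatype) {
  this.value = value;
  this.datatype = datatype;
}

var XSD_INTEGER = 'http://www.w3.org/2001/XMLSchema#integer';

// An instance created through the constructor...
var fromConstructor = new Literal('4', XSD_INTEGER);

// ...and a plain Object map with the same properties in the same order.
var fromObjectMap = { value: '4', datatype: XSD_INTEGER };

// Both expose the same own properties, in the same order.
var sameStructure =
  JSON.stringify(Object.keys(fromConstructor)) ===
  JSON.stringify(Object.keys(fromObjectMap));

console.log(sameStructure);
```

The two objects still differ in their prototype chain, which is one reason an engine might not treat them identically.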

> Yet this is what I do with 'rdf'. Try it yourself:
> 
> > require('rdf').setBuiltins();
> > var a = 4;
> > a.datatype
> 'http://www.w3.org/2001/XMLSchema#integer'
> > a.nodeType()
> 'TypedLiteral'
> > (4.5).datatype
> 'http://www.w3.org/2001/XMLSchema#decimal'
> > "_:x".nodeType()
> 'BlankNode'

Ah, you set it on the prototype and not on the instance.
But then how do you deal with:
- "40"^^xsd:float
- "40"^^xsd:decimal
- "40"^^xsd:double
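
The problem with these three can be sketched as follows. This is not how 'rdf' works internally: `datatypeOfNumber` and `makeLiteral` are hypothetical helpers, and the first one only imitates the idea of deriving the datatype from the primitive value alone.

```javascript
var XSD = 'http://www.w3.org/2001/XMLSchema#';

// Hypothetical: derive a datatype from a primitive number alone,
// which is all a prototype-level property has to work with.
function datatypeOfNumber(n) {
  return XSD + (Number.isInteger(n) ? 'integer' : 'decimal');
}

// Hypothetical: a literal that carries its own datatype.
function makeLiteral(lexical, datatype) {
  return { value: lexical, datatype: datatype };
}

// All three lexical forms collapse to the same primitive,
// so the derived datatype is the same for each of them.
var primitive = Number('40');
console.log(datatypeOfNumber(primitive));

// Literals that store the datatype explicitly stay distinguishable.
var literals = ['float', 'decimal', 'double'].map(function (type) {
  return makeLiteral('40', XSD + type);
});
literals.forEach(function (literal) {
  console.log(literal.value, literal.datatype);
});
```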

> The times you list are cumulative

Oops yes… that’s not a good thing. Thanks for fixing!

> Pre-parsing depends on your operation. Your tests are greatly biased towards creating objects rather than searching for them - by over five orders of magnitude!

Oops again, my mistake because of the cumulative times.

> When indexing my graphs, my search performs three orders of magnitude faster than yours.

Mind you, it’s a very naive search indeed, meant to test the performance of specific operations rather than an algorithm.
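
For reference, the difference between a naive search and an indexed one looks roughly like this. The triple layout and the names `findBySubjectNaive` and `buildSubjectIndex` are assumptions for the sketch; they are not code from the benchmark or from 'rdf'.

```javascript
var triples = [
  { subject: 'http://example.org/a', predicate: 'http://example.org/p', object: '1' },
  { subject: 'http://example.org/b', predicate: 'http://example.org/p', object: '2' },
  { subject: 'http://example.org/a', predicate: 'http://example.org/q', object: '3' }
];

// Naive search: scans every triple, so each lookup is O(n).
function findBySubjectNaive(all, subject) {
  return all.filter(function (t) { return t.subject === subject; });
}

// Index: one pass up front, after which a lookup is a single Map access.
function buildSubjectIndex(all) {
  var index = new Map();
  all.forEach(function (t) {
    if (!index.has(t.subject)) index.set(t.subject, []);
    index.get(t.subject).push(t);
  });
  return index;
}

var subjectIndex = buildSubjectIndex(triples);
var naive = findBySubjectNaive(triples, 'http://example.org/a');
var indexed = subjectIndex.get('http://example.org/a') || [];

console.log(naive.length, indexed.length);
```

Both lookups return the same triples; only the cost per lookup differs, which is why the construct being measured matters less once an index is in place.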

> Please note for these tests I made many changes, most significantly I changed to the Node.js-specific `process.hrtime` for benchmarking, the timer reports for each test individually, I replace your 'prototype' examples with my own 'rdf' library (which is better optimized), I load more realistic data (I select among ~100 predicates using a log distribution), and I perform slightly more reads (though inserts still outnumber reads by 10k times, which is completely unrealistic, but good enough to get the idea for our purposes).

Thanks very much for the hard work; I appreciate it and will incorporate the changes.
However, I’ll probably add the ‘rdf’ library as a third option instead of replacing the naive prototype, just so we can compare.
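
As a note on the timing change, reporting each test individually with `process.hrtime` could look something like the sketch below. The helper `timeTest` and the two toy tests are placeholders I made up, not the actual benchmark code. Each test gets its own start time, so the reported numbers are no longer cumulative.

```javascript
// Runs fn once and returns the elapsed time in milliseconds.
function timeTest(fn) {
  var start = process.hrtime();
  fn();
  var diff = process.hrtime(start); // [seconds, nanoseconds] since start
  return diff[0] * 1e3 + diff[1] / 1e6;
}

var items = [];

// Placeholder tests standing in for the real benchmark cases.
var tests = {
  create: function () {
    for (var i = 0; i < 10000; i++) items.push({ value: String(i) });
  },
  search: function () {
    items.filter(function (item) { return item.value === '9999'; });
  }
};

// Time every test on its own instead of accumulating.
var results = {};
Object.keys(tests).forEach(function (name) {
  results[name] = timeTest(tests[name]);
});

console.log(results);
```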

> You appear to be interested in testing just the raw performance of one construct versus another, and as such use straight loops that run O(n). But I don't think it's fair to write a case that's going to be a tiny fraction of runtime once we begin to implement indexes and other computation structures.

Absolutely. The really good benchmarks are applications; this one is very artificial just to test specific things.

> And compounded by the fact that there's no one engine that we're supposed to be using, and even within e.g. V8, it varies from release to release.

Yes, but the hidden class mechanism is used by many engines.

> But I'm slightly confused, your post was about making appropriate design decisions, now we've discussed what's more efficient. Well, what are we looking for?

Design decisions and how they influence RDF libraries in different facets (including but not limited to performance).

> And please pardon me reducing everything down to JavaScript 101, it's not just experts reading this list ;)

That’s great: discussions like this will be on the Web to help others, and the clearer things are, the more people benefit from them.

Cheers,

Ruben

Received on Monday, 4 November 2013 07:25:14 UTC