Re: representing URIs and literals from Austin William Wright on 2013-11-06 (public-rdfjs@w3.org from November 2013)

From: Austin William Wright <aaa@bzfx.net>
Date: Tue, 5 Nov 2013 21:12:27 -0700
To: Ruben Verborgh <ruben.verborgh@ugent.be>
Cc: "public-rdfjs@w3.org" <public-rdfjs@w3.org>
Message-ID: <CANkuk-U3hO2hXV7zYWYMHRLSy1KO3wmHUk8mOFAd9ZfCs+5yJA@mail.gmail.com>
On Mon, Nov 4, 2013 at 12:24 AM, Ruben Verborgh <ruben.verborgh@ugent.be> wrote:

> Hi Austin,
>
> > That seems to be an argument in favor of using instances, instead of
> primitive types or plain Object maps which don't get the benefit from this
> V8 compile-time logic.
>
> There should be no difference between instances and Object maps, right?
> Everything that has the same structure will be treated the same.
>

In my benchmarks across most versions of V8, objects created with `new` are
constructed faster than key/value maps with a fixed set of keys. I don't know
about strings or other primitives, though. Your tests suggest the ordering
strings > instances (with specified keys) > maps (with arbitrary keys).
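For concreteness, here is a minimal sketch of the three representations being compared, applied to a single IRI term. The property names (`interfaceName`, `nominalValue`) loosely follow RDF Interfaces. Everything else is an assumption for illustration and is not taken from either test suite.

```javascript
// Three candidate representations of the IRI term <http://example.org/a>.
// Illustration only; names loosely follow RDF Interfaces.
const iri = 'http://example.org/a';

// 1. Primitive string: the IRI itself stands for the term.
const asString = iri;

// 2. Class instance: every NamedNode is built by the same constructor,
//    which assigns the same properties in the same order, so engines that
//    use hidden classes (V8 among them) can give all instances one shape.
class NamedNode {
  constructor(value) {
    this.interfaceName = 'NamedNode';
    this.nominalValue = value;
  }
  toString() {
    return this.nominalValue;
  }
}
const asInstance = new NamedNode(iri);

// 3. Object maps with arbitrary keys: nothing enforces a common layout,
//    so two "equivalent" terms can end up with different shapes.
const asMapA = { nominalValue: iri };
const asMapB = { interfaceName: 'NamedNode', nominalValue: iri };

console.log(asString, String(asInstance), asMapA.nominalValue, asMapB.nominalValue);
```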


>
> > Yet this is what I do with 'rdf'. Try it yourself:
> >
> > > require('rdf').setBuiltins();
> > > var a = 4;
> > > a.datatype
> > 'http://www.w3.org/2001/XMLSchema#integer'
> > > a.nodeType()
> > 'TypedLiteral'
> > > (4.5).datatype
> > 'http://www.w3.org/2001/XMLSchema#decimal'
> > > "_:x".nodeType()
> > 'BlankNode'
>
> Ah, you set it on the prototype and not on the instance.
> But then how do you deal with:
> - "40"^^xsd:float
> - "40"^^xsd:decimal
> - "40"^^xsd:double
>

Natively? That's a good question. At the very least, you can write:

env.createLiteral(40, null, 'xsd:double');

The "builtins" I describe are really meant for creating triples whose types
just work, not for reading existing triples and preserving their datatypes.
You could add a property on top of Number, String, Date, etc., as I described
earlier.
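As a rough illustration of the prototype approach, a getter on `Number.prototype` can infer a datatype from the value. This is a hypothetical sketch and not the actual code behind `setBuiltins()`. It also shows the limitation raised above. The number 40 keeps no record of whether it was written as xsd:float, xsd:decimal or xsd:double, so any inferred datatype is only a guess. The property name `datatype` mirrors the REPL session; the inference rule is an assumption.

```javascript
const XSD = 'http://www.w3.org/2001/XMLSchema#';

// Hypothetical: a non-enumerable `datatype` getter shared by every number.
Object.defineProperty(Number.prototype, 'datatype', {
  configurable: true, // can be removed again with `delete`
  enumerable: false,
  get: function () {
    'use strict'; // keeps `this` a primitive number rather than a Number object
    return Number.isInteger(this) ? XSD + 'integer' : XSD + 'decimal';
  },
});

console.log((4).datatype);   // http://www.w3.org/2001/XMLSchema#integer
console.log((4.5).datatype); // http://www.w3.org/2001/XMLSchema#decimal

// The limitation: "40"^^xsd:float and "40"^^xsd:double both parse to the
// number 40, so the inferred datatype is the same (and neither is correct).
console.log(parseFloat('40').datatype); // http://www.w3.org/2001/XMLSchema#integer
```

The `'use strict'` directive matters here. Without it, `this` would be a boxed `Number`, and `Number.isInteger` returns `false` for boxed values.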

The best option, though, is to store the value as a Literal; RDF Interfaces
then has a somewhat funny syntax for converting it to a native datatype.
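Here is a minimal sketch of the store-as-Literal approach. It assumes a class that keeps the lexical form and datatype IRI verbatim and converts to a native value only on request through `valueOf()`. The shape is modeled loosely on RDF Interfaces. Here, though, the datatype is a plain IRI string rather than a NamedNode, and the converter table is hypothetical and covers only a few XSD types.

```javascript
const XSD = 'http://www.w3.org/2001/XMLSchema#';

// Hypothetical converters from lexical form to a native JS value.
const converters = new Map([
  [XSD + 'integer', (s) => parseInt(s, 10)],
  [XSD + 'decimal', (s) => parseFloat(s)],
  [XSD + 'float', (s) => parseFloat(s)],
  [XSD + 'double', (s) => parseFloat(s)],
  [XSD + 'boolean', (s) => s === 'true' || s === '1'],
  [XSD + 'string', (s) => s],
]);

// A literal keeps its lexical form and datatype IRI as given,
// so "40"^^xsd:float and "40"^^xsd:double stay distinct.
class Literal {
  constructor(value, language, datatype) {
    this.nominalValue = String(value);
    this.language = language || null;
    this.datatype = datatype || null;
  }

  // Native value on demand; unknown datatypes fall back to the string.
  valueOf() {
    const convert = this.datatype ? converters.get(this.datatype) : undefined;
    return convert ? convert(this.nominalValue) : this.nominalValue;
  }

  // N-Triples-like form; JSON string escaping is used as an approximation.
  toString() {
    const lex = JSON.stringify(this.nominalValue);
    if (this.language) return lex + '@' + this.language;
    if (this.datatype) return lex + '^^<' + this.datatype + '>';
    return lex;
  }
}

const f = new Literal('40', null, XSD + 'float');
const d = new Literal('40', null, XSD + 'double');
console.log(f.valueOf() === d.valueOf()); // true: same native number
console.log(f.datatype === d.datatype);   // false: datatypes preserved
console.log(String(d)); // "40"^^<http://www.w3.org/2001/XMLSchema#double>
```

The design choice is to make the datatype part of the stored term and the native value a derived view. Round-tripping therefore never loses information, at the cost of a conversion each time a native value is needed.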


>
> Thanks very much for the hard work; I appreciate it, and I will
> incorporate the changes.
> However, I’ll probably add the ‘rdf’ library as a third option instead of
> replacing the naive prototype, just so we can compare.
>

That's a good idea. I'd suggest writing a few different classes of tests and
implementing each of them in different ways. Each test should get its own
field, maybe named like '1a', '1b', etc., where the number describes the
example to be written and the letter the implementation.
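One possible layout for such a suite is sketched below under stated assumptions. Each case is keyed by a number for the example and a letter for the implementation, and a small harness records the best time per case using `performance.now()`, which is a global in modern Node and in browsers. The example ("1": build N IRI terms) and its three implementations are hypothetical. Timings depend on the machine and engine version, so no results are claimed.

```javascript
// Hypothetical suite: key = <example number><implementation letter>.
// Example 1: build N terms for distinct IRIs.
//   a = plain strings, b = class instances, c = object literals.
class NamedNode {
  constructor(iri) {
    this.nominalValue = iri;
  }
}

const N = 100000;

const cases = {
  '1a': () => {
    const out = [];
    for (let k = 0; k < N; k++) out.push('http://example.org/' + k);
    return out;
  },
  '1b': () => {
    const out = [];
    for (let k = 0; k < N; k++) out.push(new NamedNode('http://example.org/' + k));
    return out;
  },
  '1c': () => {
    const out = [];
    for (let k = 0; k < N; k++) out.push({ nominalValue: 'http://example.org/' + k });
    return out;
  },
};

// Run each case `repeats` times and keep the best (lowest) time in ms.
// The result's length is recorded so the work is observably used.
function run(suite, repeats = 5) {
  const results = {};
  for (const key of Object.keys(suite).sort()) {
    let best = Infinity;
    let count = 0;
    for (let r = 0; r < repeats; r++) {
      const t0 = performance.now();
      count = suite[key]().length;
      best = Math.min(best, performance.now() - t0);
    }
    results[key] = { example: key.slice(0, -1), impl: key.slice(-1), count, ms: best };
  }
  return results;
}

const results = run(cases);
console.table(results);
```

Keeping the example number and the implementation letter in one key makes it easy to add a case 2 (say, pattern matching) later, with its own a/b/c variants, without touching the harness.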


>
> > You appear to be interested in testing just the raw performance of one
> construct versus another, and as such use straight loops that run O(n). But
> I don't think it's fair to write a case that's going to be a tiny fraction
> of runtime once we begin to implement indexes and other computation
> structures.
>
> Absolutely. The really good benchmarks are applications; this one is very
> artificial just to test specific things.
>
> > And compounded by the fact that there's no one engine that we're
> supposed to be using, and even within e.g. V8, it varies from release to
> release.
>
> Yes, but the hidden class mechanism is used by many engines.
>
> > But I'm slightly confused: your post was about making appropriate design
> decisions, yet now we've discussed what's more efficient. Well, what are we
> looking for?
>
> Design decisions and how they influence RDF libraries in different facets
> (including but not limited to performance).
>

Well, how valuable should performance be? If it were of the utmost priority,
we could just use asm.js.

And even when one of two functionally identical approaches carries a small
performance hit, the technically faster code is not always the one that is
easiest to use while staying fast. Consider streams in C++: they adopted (or
shoehorned in) a bizarre syntax and ended up far slower than C's nicer (IMO)
stdio syntax.

That is, there are performance improvements to be gained just by making it
easy to write performant code. But you can't really measure that, or to the
extent you can, it's easier to measure in dollars than in CPU cycles.

I will probably end up doing something esoteric to enable fast inserts in
my triple index, since ECMAScript doesn't have decent built-in data
structures. But the point is, this won't be exposed to the user: the goal
of a library should be to isolate and expose functionality.
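To illustrate the point about isolation, here is a minimal sketch of a triple index. Its internal layout (nested `Map`s keyed subject → predicate → set of objects) is private, and callers see only `add`, `has`, `size` and `match`. The structure and method names are assumptions for illustration, not the index described above. `Map` and `Set` were standardized in ES2015, after this thread, and private `#` fields are newer still.

```javascript
// Hypothetical single-ordering (SPO) triple index. Callers never see the Maps,
// so the internal layout can change without affecting user code.
class TripleIndex {
  #spo = new Map(); // subject -> Map(predicate -> Set(object))
  #size = 0;

  // Insert a triple; returns false if it was already present.
  add(s, p, o) {
    let preds = this.#spo.get(s);
    if (!preds) { preds = new Map(); this.#spo.set(s, preds); }
    let objs = preds.get(p);
    if (!objs) { objs = new Set(); preds.set(p, objs); }
    if (objs.has(o)) return false;
    objs.add(o);
    this.#size++;
    return true;
  }

  has(s, p, o) {
    const objs = this.#spo.get(s)?.get(p);
    return objs !== undefined && objs.has(o);
  }

  get size() {
    return this.#size;
  }

  // Yield [s, p, o] for every stored triple matching the pattern;
  // null or undefined in any position acts as a wildcard.
  *match(s, p, o) {
    const subjects = s == null ? this.#spo.keys() : [s];
    for (const subj of subjects) {
      const preds = this.#spo.get(subj);
      if (!preds) continue;
      const predicates = p == null ? preds.keys() : [p];
      for (const pred of predicates) {
        const objs = preds.get(pred);
        if (!objs) continue;
        if (o == null) {
          for (const obj of objs) yield [subj, pred, obj];
        } else if (objs.has(o)) {
          yield [subj, pred, o];
        }
      }
    }
  }
}

const idx = new TripleIndex();
idx.add('ex:a', 'ex:knows', 'ex:b');
idx.add('ex:a', 'ex:knows', 'ex:c');
idx.add('ex:b', 'ex:knows', 'ex:c');
idx.add('ex:a', 'ex:knows', 'ex:b'); // duplicate, ignored
console.log(idx.size); // 3
console.log([...idx.match('ex:a', null, null)].length); // 2
```

Adding a second ordering (for example object-first, to speed up `match(null, null, o)`) would change only the class internals; the public methods stay the same.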

Austin.
Received on Wednesday, 6 November 2013 04:12:54 UTC