Re: representing URIs and literals from Austin William Wright on 2013-11-03 (public-rdfjs@w3.org from November 2013)

From: Austin William Wright <aaa@bzfx.net>
Date: Sun, 3 Nov 2013 00:28:25 -0700
To: Ruben Verborgh <ruben.verborgh@ugent.be>
Cc: "public-rdfjs@w3.org" <public-rdfjs@w3.org>
Message-ID: <CANkuk-UPuTRS5TOY6XiC3K3r01cE0bQU+vPCP6t7dgNmWTdo6w@mail.gmail.com>
The fact that JavaScript gives us more possibilities is what led Nathan
(webr3) to take on the idea of using JavaScript/ECMAScript for manipulating
RDF, which I believe eventually led to RDF Interfaces
<http://www.w3.org/TR/rdf-interfaces/>. I followed up on this concept for
Node.js with the library I'm maintaining, the 'rdf' package at
<https://github.com/Acubed/node-rdf>, largely forked from his code.

In my package and in RDF Interfaces, we use the term 'node', as in some
fundamental unit of data, and part of an RDF Statement. RDF Interfaces uses
the term 'Triple'; I implement this vocabulary, but prefer the term
Statement, since RDF uses the term Statement.

What you're describing, and what node-n3 appears to do, Java could do as
well: Just look at the string and switch behavior depending on the detected
input. This is quintessential object-oriented programming: it's
polymorphism. I'm not sure how this is faster. It should be slower, because
you have to actually perform string operations on the string to determine
its type. This is far slower than builtin type/class polymorphism!
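To make the comparison concrete, here is a minimal sketch of the two
dispatch styles. The string convention is the one described in the quoted
message (literals are strings wrapped in double quotes, everything else is
a URI). The names `kindOfString`, `IRINode` and `LiteralNode` are
hypothetical and are not the API of node-n3 or of the 'rdf' package.

```javascript
// Style 1: every term is a string; inspect its characters to learn the kind.
function kindOfString(term) {
  return term.charAt(0) === '"' ? 'literal' : 'uri';
}

// Style 2: the kind lives on the prototype; no string inspection is needed.
function IRINode(value) { this.value = value; }
IRINode.prototype.kind = 'uri';

function LiteralNode(value) { this.value = value; }
LiteralNode.prototype.kind = 'literal';

const asStrings = ['http://example.org/a', '"c"'];
const asObjects = [new IRINode('http://example.org/a'), new LiteralNode('c')];

console.log(asStrings.map(kindOfString));   // [ 'uri', 'literal' ]
console.log(asObjects.map((n) => n.kind));  // [ 'uri', 'literal' ]
```

The sketch only shows where the work happens in each style; it makes no
measurement of which one is faster on a given engine.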

ECMAScript is distinctly *not* object-oriented, if the definition of OO
includes polymorphism (most references I see define it this way, and I'll
adopt this usage here since it helps us distinguish things better). What
ECMAScript has instead is prototypes, including for built-in primitives!
What we tend to call 'classes' in ECMAScript aren't really classes, that's
just for lack of a better term: A 'class' is an object (anything that's not
a primitive type, including Functions) intended to be a prototype of
another object, called the instance. The 'classes' we have are prototypal
and, for this reason, distinctly more powerful than 'classes' in Java. RDF
Interfaces, the 'rdf' package and module, and webr3's work take this route.

What this means is instead of encoding literals as an ECMAScript value such
as any of [ 'uri' , '"string"' , '"string"^^<datatype>' ], we can encode
some literals using native datatypes:

(50).type === 'http://www.w3.org/2001/XMLSchema#integer'
(50).toString() === '"50"^^<http://www.w3.org/2001/XMLSchema#integer>'

This is far more powerful than merely encoding all literals as a string. We
can perform operations directly on RDF nodes, and preserve their RDF
semantics! You can't do this with strings. I'm unaware of any problems that
using objects has that strings don't. Remember, in ECMAScript, there's no
polymorphism, just objects with a prototype chain, so there shouldn't be
any difference between passing a string and passing an object per se,
except that a string removes your ability to use a prototype chain. And
when including other effects (not per se), strings appear to fare worse: As
I pointed out, you have to parse them, but objects are pre-parsed in memory.
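One way the two expressions above could be made to work is by extending
the prototype of the built-in Number type. The following is only a sketch
under assumptions, not the code of the 'rdf' package: the method name
`toNT` is used here instead of overriding `toString`, and the rule that
non-integers map to xsd:double is a choice made for this example.

```javascript
const XSD = 'http://www.w3.org/2001/XMLSchema#';

// Hypothetical accessor: the datatype IRI of a native number.
Object.defineProperty(Number.prototype, 'type', {
  configurable: true,
  get() {
    // Assumption: integers are xsd:integer, everything else is xsd:double.
    return XSD + (Number.isInteger(this.valueOf()) ? 'integer' : 'double');
  },
});

// Hypothetical method: serialize the number as a typed literal.
Object.defineProperty(Number.prototype, 'toNT', {
  configurable: true,
  writable: true,
  value() {
    return '"' + this.valueOf() + '"^^<' + this.type + '>';
  },
});

const fifty = 50;
console.log(fifty.type);         // http://www.w3.org/2001/XMLSchema#integer
console.log(fifty.toNT());       // "50"^^<http://www.w3.org/2001/XMLSchema#integer>
console.log((fifty / 4).toNT()); // "12.5"^^<http://www.w3.org/2001/XMLSchema#double>
console.log(fifty + 1);          // 51, still an ordinary number
```

Because the properties are defined with `Object.defineProperty`, they are
not enumerable, and the value remains a plain number for arithmetic.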

This brings up the curious task of handling URIs and bnodes.

We could define URIs/IRIs as a class. This is certainly a good idea if we
want to operate on the IRI, like extract the path component. But this is
only relevant if you're a server that minted the URI, or a User Agent that
needs to dereference a URL (a network-addressable URI). Otherwise, URIs are
opaque: they carry no meaning, and only differentiate resources from one
another with a single, universal name.

Additionally, there is no ambiguity when we use a string to represent a
node that's a URI. Further, RDF technically uses IRIs, which are Unicode
rather than 7-bit character strings, and ECMAScript Strings are UTF-16
strings. (There's a largely isomorphic mapping between URIs and IRIs, so
the distinction isn't typically meaningful.) For these reasons, I use
ECMAScript Strings instead of an object.

There's the other concern of bnodes. Bnodes are not URIs/IRIs, they are
anonymous identifiers used for subgraph matching. And bnodes are not
permitted in the predicate, only in the subject or object (due to their
subgraph matching nature). Bnodes often take the form of "_:token", as if
there were a "_" prefix in Turtle, but this is arbitrary: how they're
displayed is entirely up to the mechanism serializing the graph, so long as
it preserves the notion of which bnodes are the same as each other. Bnodes
with the syntax of "_:token" will never be confused with IRIs
("_" isn't in the `scheme` production for URIs or IRIs, and the next
character is ":", and I only accept URIs, not URI References).
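The non-overlap argument can be checked mechanically. The sketch below
encodes the `scheme` production from RFC 3986 as a regular expression; the
function names `hasScheme` and `isBlankNodeLabel` are hypothetical, and
`hasScheme` only tests for a scheme prefix rather than validating a whole
IRI.

```javascript
// scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )   (RFC 3986, 3.1)
const SCHEME_PREFIX = /^[A-Za-z][A-Za-z0-9+.-]*:/;

// True when the string starts with a scheme followed by ":".
function hasScheme(term) {
  return SCHEME_PREFIX.test(term);
}

// True when the string uses the conventional "_:token" bnode syntax.
function isBlankNodeLabel(term) {
  return term.startsWith('_:');
}

console.log(hasScheme('http://example.org/a')); // true
console.log(hasScheme('_:b0'));                 // false, "_" cannot start a scheme
console.log(isBlankNodeLabel('_:b0'));          // true
console.log(hasScheme('/relative/path'));       // false, a URI Reference only
```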

Though intuitively tempting, it doesn't make sense to encode literals as
just strings. Most literals will have types, and in RDF 1.1, all literals
have types: untyped strings become the same as `xsd:string`. So no matter
what, we'll typically have to encode literals as a (unicode, uri) tuple.
This encourages the production of a generic `Literal` node in the form of
this tuple.
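A minimal sketch of such a generic node follows. The constructor and its
property names (`value`, `datatype`) are hypothetical, and language tags
are left out to keep the tuple to two members.

```javascript
const XSD_STRING = 'http://www.w3.org/2001/XMLSchema#string';

// Hypothetical generic literal: a (lexical value, datatype IRI) pair.
function Literal(value, datatype) {
  this.value = String(value);
  // RDF 1.1: a literal given without a datatype is an xsd:string.
  this.datatype = datatype || XSD_STRING;
}

// Two literals are the same node when both members of the tuple match.
Literal.prototype.equals = function (other) {
  return other instanceof Literal &&
    other.value === this.value &&
    other.datatype === this.datatype;
};

const plain = new Literal('c');
const typed = new Literal('c', XSD_STRING);
console.log(plain.datatype);      // http://www.w3.org/2001/XMLSchema#string
console.log(plain.equals(typed)); // true
```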

For representing nodes in JSON, plain strings can be used for URIs, and a
{value:"data", type:"uri", lang:"str"} structure for literals ("type" and
"lang" being optional and mutually exclusive).

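As an illustration of the JSON shape just described, the sketch below
builds a literal object and rejects the case where both "type" and "lang"
are given. The helper name `literalToJSON` and the subject/predicate/object
keys of the enclosing triple are assumptions made for this example.

```javascript
// Hypothetical helper: build the {value, type, lang} shape, enforcing that
// "type" and "lang" are never both present.
function literalToJSON(value, options = {}) {
  const { type, lang } = options;
  if (type !== undefined && lang !== undefined) {
    throw new TypeError('"type" and "lang" are mutually exclusive');
  }
  const out = { value: String(value) };
  if (type !== undefined) out.type = type;
  if (lang !== undefined) out.lang = lang;
  return out;
}

// URIs stay plain strings; only the literal is an object.
const triple = {
  subject: 'http://example.org/a',
  predicate: 'http://example.org/b',
  object: literalToJSON('chat', { lang: 'fr' }),
};

console.log(JSON.stringify(triple.object)); // {"value":"chat","lang":"fr"}
```
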
There are more options, for instance converting Arrays to a graph of a
linked list.
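A sketch of the Array case, using the standard rdf:first, rdf:rest and
rdf:nil terms. The function name `arrayToList`, the triple shape (a
three-element array) and the "_:listN" bnode labels are assumptions for
this example; a real implementation would need labels that are unique
across calls.

```javascript
const RDF = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';

// Turn an array of nodes into the triples of an RDF linked list.
function arrayToList(items) {
  const triples = [];
  if (items.length === 0) return { head: RDF + 'nil', triples };
  items.forEach((item, i) => {
    const cell = '_:list' + i;
    // The last cell points at rdf:nil, every other cell at the next one.
    const rest = i + 1 < items.length ? '_:list' + (i + 1) : RDF + 'nil';
    triples.push([cell, RDF + 'first', item]);
    triples.push([cell, RDF + 'rest', rest]);
  });
  return { head: '_:list0', triples };
}

const list = arrayToList(['http://example.org/x', 'http://example.org/y']);
console.log(list.head);           // _:list0
console.log(list.triples.length); // 4
```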

Sometimes you do need to convert a node to a string, for instance, for use
as a key. You could either use the Node#toString method and convert to
Turtle, or utilize a simple format of "uri value", where `uri` is the
datatype of the literal or the value of the URI, and `value` is the value
of the literal, or the label of the bnode (kept for serialization purposes;
for bnodes `uri` is blank, so the first character is a space). This latter
form is extremely fast to serialize and fast to render, or convert into
Turtle.
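A sketch of that key format follows. The node shapes ({ iri }, { bnode },
{ value, datatype }) and the function names are hypothetical. It also
assumes that a URI node is written as the bare URI with nothing after it,
which the description above leaves open, and it does not cover
language-tagged literals.

```javascript
// Hypothetical node shapes: { iri }, { bnode }, or { value, datatype }.
function nodeKey(node) {
  if (node.iri !== undefined) return node.iri;           // "uri"
  if (node.bnode !== undefined) return ' ' + node.bnode; // " label"
  return node.datatype + ' ' + node.value;               // "uri value"
}

// IRIs cannot contain a space, so the first space always ends the uri part.
function parseKey(key) {
  const i = key.indexOf(' ');
  if (i === -1) return { iri: key };
  if (i === 0) return { bnode: key.slice(1) };
  return { datatype: key.slice(0, i), value: key.slice(i + 1) };
}

const XSD_INTEGER = 'http://www.w3.org/2001/XMLSchema#integer';
console.log(nodeKey({ value: '50', datatype: XSD_INTEGER }));
// http://www.w3.org/2001/XMLSchema#integer 50
console.log(JSON.stringify(nodeKey({ bnode: 'b0' }))); // " b0"
```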

These are all features of the "rdf" package
<https://github.com/Acubed/node-rdf>.

Austin Wright.


On Sat, Nov 2, 2013 at 8:55 AM, Ruben Verborgh <ruben.verborgh@ugent.be> wrote:

> Hi all,
>
> A major design decision for an RDF library is how to represent URIs and
> literals.
>
> For typed languages such as Java, the choice is pretty obvious:
> a URI class and a Literal class, which both inherit from a common parent
> class.
> A triple then has a constructor like Triple(URI subject, URI predicate,
> Entity object).
> Unfortunately, this can lead to quite cumbersome code. Creating a triple
> is as awful as:
>     new Triple(new URI("http://example.org/a"), new URI("http://example.org/b"), new Literal("c"))
> The fact that only objects can be literals could help to obtain a more
> compact overloaded constructor:
>     new Triple("http://example.org/a", "http://example.org/b", new Literal("c"))
> However, the verbosity for the literal still remains, and accessing
> properties always involves indirection:
>     String value = ((Literal)triple.getObject()).getValue();
> Languages such as C# can do some automated type conversion, but this does
> not always help.
>
> Being a dynamic language, JavaScript gives us more possibilities.
> We could follow the Java road and implement it with classes, but then we
> gain little.
> This code is the slowest to write and execute (because of different
> runtime classes).
>
> Alternatively, JSON-LD uses annotations to indicate what is a URI and
> what is a literal [1].
> This code is fast to write and execute.
> The major difference is that JSON-LD does not represent RDF on the triple
> level, but rather as a specific JSON tree.
>
> A third option is what I have chosen in node-n3: URIs are regular strings;
> literals are double-quoted strings [2].
> This code is fast to write and execute (all runtime triple classes are the
> same).
> URI comparisons and literal comparisons are transparent; an extra step is
> required to get the literal value though.
>
> There are possibly more options, and it could be interesting to see which
> library has chosen what and why.
>
> Best,
>
> Ruben
>
> [1] http://json-ld.org/spec/latest/json-ld/#h3_the-context
> [2] https://github.com/RubenVerborgh/node-n3#representing-uris-and-literals
>
Received on Sunday, 3 November 2013 07:28:53 UTC