- From: Austin William Wright <aaa@bzfx.net>
- Date: Sun, 3 Nov 2013 00:28:25 -0700
- To: Ruben Verborgh <ruben.verborgh@ugent.be>
- Cc: "public-rdfjs@w3.org" <public-rdfjs@w3.org>
- Message-ID: <CANkuk-UPuTRS5TOY6XiC3K3r01cE0bQU+vPCP6t7dgNmWTdo6w@mail.gmail.com>
The fact that JavaScript gives us more possibilities is what led Nathan (webr3) to take on the idea of using JavaScript/ECMAScript for manipulating RDF, which I believe eventually led to RDF Interfaces <http://www.w3.org/TR/rdf-interfaces/>. I followed up on this concept for Node.js with the library I'm maintaining, the 'rdf' package at <https://github.com/Acubed/node-rdf>, largely forked from his code.

In my package and in RDF Interfaces, we use the term 'node' for a fundamental unit of data that forms part of an RDF Statement. RDF Interfaces uses the term 'Triple'; I implement this vocabulary but prefer the term 'Statement', since that is the term RDF itself uses.

What you're describing, and what node-n3 appears to do, Java could do as well: just look at the string and switch behavior depending on the detected input. This is quintessential object-oriented programming; it's polymorphism. I'm not sure how this is faster. It should be slower, since you have to actually perform string operations on the string to determine its type. This is far slower than built-in type/class polymorphism!

ECMAScript is distinctly *not* object-oriented, if the definition of OO includes polymorphism (most references I see define it this way; I'll adopt this usage here since it helps us distinguish things better). What ECMAScript has instead is prototypes, including for built-in primitives! What we tend to call 'classes' in ECMAScript aren't really classes; that's just for lack of a better term. A 'class' is an object (anything that's not a primitive type, including Functions) intended to be the prototype of another object, called the instance. The 'classes' we have are prototypal and, for this reason, distinctly more powerful than 'classes' in Java. RDF Interfaces, the 'rdf' package and module, and webr3's work take this route.
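To make the point about prototypes on built-in primitives concrete, here is a minimal sketch. It is illustrative only and is not the actual code of the 'rdf' package or RDF Interfaces; the names `rdfDatatype` and `toRdfString` are hypothetical, chosen so that nothing built in gets overridden, and edge cases such as NaN, Infinity, and very large numbers are ignored.

```javascript
'use strict';

// Hypothetical names for illustration only; not the 'rdf' package API.
const XSD = 'http://www.w3.org/2001/XMLSchema#';

// Attach an RDF datatype to the prototype shared by every Number primitive.
Object.defineProperty(Number.prototype, 'rdfDatatype', {
  configurable: true,
  get() {
    // Assumption: integers map to xsd:integer, everything else to xsd:double.
    return Number.isInteger(this.valueOf()) ? XSD + 'integer' : XSD + 'double';
  },
});

// Serialize the number as a typed literal in Turtle-like syntax.
Object.defineProperty(Number.prototype, 'toRdfString', {
  configurable: true,
  writable: true,
  value() {
    return '"' + this.valueOf() + '"^^<' + this.rdfDatatype + '>';
  },
});

// The value is still an ordinary number, so arithmetic keeps working
// and the result carries the same RDF semantics.
const n = 50;
console.log(n.rdfDatatype);         // http://www.w3.org/2001/XMLSchema#integer
console.log((n + 1).toRdfString()); // "51"^^<http://www.w3.org/2001/XMLSchema#integer>
```

No wrapper class is involved here: the number stays a native number, and the RDF behavior comes entirely from the prototype chain.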
What this means is that instead of encoding literals as an ECMAScript string such as any of [ 'uri' , '"string"' , '"string"^^<datatype>' ], we can encode some literals using native datatypes:

    (50).type === 'http://www.w3.org/2001/XMLSchema#integer'
    (50).toString() === '"50"^^<http://www.w3.org/2001/XMLSchema#integer>'

This is far more powerful than merely encoding all literals as strings. We can perform operations directly on RDF nodes and preserve their RDF semantics! You can't do this with strings.

I'm unaware of any problems that using objects has that strings don't. Remember, in ECMAScript there's no polymorphism, just objects with a prototype chain, so there shouldn't be any difference between passing a string and passing an object per se, except that a string removes your ability to use a prototype chain. And once you account for other effects, strings fare worse: as I pointed out, you have to parse them, whereas objects are already parsed in memory.

This brings up the curious task of handling URIs and bnodes. We could define URIs/IRIs as a class. This is certainly a good idea if we want to operate on the IRI, for example to extract the path component. But this is only relevant if you're a server that minted the URI, or a User Agent that needs to dereference a URL (a network-addressable URI). Otherwise, URIs are opaque: they carry no meaning, they only differentiate resources from one another with a single, universal name. Additionally, there is no ambiguity when we use a string to represent a node that's a URI. Further, RDF technically uses IRIs, which are Unicode instead of 7-bit character strings, and ECMAScript Strings are UTF-16 strings. (There's a largely isomorphic mapping between URIs and IRIs, so the distinction isn't typically meaningful.) So for this reason, I use ECMAScript Strings instead of an object.

There's the other concern of bnodes. Bnodes are not URIs/IRIs; they are anonymous identifiers used for subgraph matching.
And bnodes are not permitted in the predicate, only in the subject or object (due to their subgraph matching nature). Bnodes often take the form of "_:token", as if there were a "_" prefix in Turtle, but this is completely arbitrary: how they're displayed is up to the mechanism serializing the graph, so long as it preserves the notion of which bnodes are the same as each other. Bnodes with the syntax "_:token" will never be confused with IRIs ("_" isn't in the `scheme` production for URIs or IRIs, the next character is ":", and I only accept URIs, not URI References).

Though intuitively tempting, it doesn't make sense to encode literals as just strings. Most literals will have types, and in RDF 1.1 all literals have types: untyped strings become the same as `xsd:string`. So no matter what, we'll typically have to encode literals as a (unicode, uri) tuple. This encourages the production of a generic `Literal` node in the form of this tuple.

For representing nodes in JSON, that means plain strings for URIs and a {value:"data", type:"uri", lang:"str"} structure for literals ("type" and "lang" being optional and mutually exclusive). There are more options, for instance converting Arrays to a graph of a linked list.

Sometimes you do need to convert a node to a string, for instance to use it as a key. You could either use the Node#toString method and convert to Turtle, or utilize a simple format of "uri value", where `uri` is the datatype of the literal or the value of the URI, and `value` is the value of the literal or the label of the bnode (the label exists only for serialization purposes; for bnodes `uri` is blank, so the first character is a space). This latter form is extremely fast to serialize and fast to render or convert into Turtle.

These are all features of the "rdf" package <https://github.com/Acubed/node-rdf>.

Austin Wright.

On Sat, Nov 2, 2013 at 8:55 AM, Ruben Verborgh <ruben.verborgh@ugent.be> wrote:

> Hi all,
>
> A major design decision for an RDF library is how to represent URIs and
> literals.
>
> For typed languages such as Java, the choice is pretty obvious:
> a URI class and a Literal class, which both inherit from a common parent
> class.
> A triple then has a constructor like Triple(URI subject, URI predicate,
> Entity object).
> Unfortunately, this can lead to quite cumbersome code. Creating a triple
> is as awful as:
>     new Triple(new URI("http://example.org/a"), new URI("http://example.org/b"), new Literal("c"))
> The fact that only objects can be literals, could help to obtain a more
> compact overloaded constructor:
>     new Triple("http://example.org/a", "http://example.org/b", new Literal("c"))
> However, the verbosity for the literal still remains, and accessing
> properties always involves indirection:
>     String value = ((Literal)triple.getObject()).getValue();
> Languages such as C# can do some automated type conversion, but this does
> not always help.
>
> Being a dynamic language, JavaScript gives us more possibilities.
> We could follow the Java road and implement it with classes, but then we
> gain little.
> This code is the slowest to write and execute (because of different
> runtime classes).
>
> Alternatively, the JSON-LD uses annotations to indicate what is a URI and
> what is a literal [1].
> This code fast to write and execute.
> The major difference is that JSON-LD does not represent RDF on the triple
> level, but rather as a specific JSON tree.
>
> A third option is what I have chosen in node-n3: URIs are regular strings;
> literals are double-quoted strings [2].
> This code is fast to write and execute (all runtime triple classes are the
> same).
> URI comparisons and literal comparisons are transparent; an extra step is
> required to get the literal value though.
>
> There are possibly more options, and it could be interesting to see which
> library has chosen what and why.
>
> Best,
>
> Ruben
>
> [1] http://json-ld.org/spec/latest/json-ld/#h3_the-context
> [2] https://github.com/RubenVerborgh/node-n3#representing-uris-and-literals

Received on Sunday, 3 November 2013 07:28:53 UTC