Semantics of Qurtle (N3 vs TriG), Graph Literals again.

(Aside: let's keep using the name Qurtle, on a *temporary* basis, to
refer to our deliverable of a Turtle-like language with "support for
multiple graphs and graph stores".  I don't like the name long-term,
but it's fine for now.  This post is orthogonal to whether Qurtle is
minimal functionality n-quads or maximal functionality
Superturtle/TriG++, so I want a neutral name.)

There have been several posts about how it's not clear what the
fourth element means.  I want to point out that N3 has an interesting
take on the problem; rather than decide and declare a priori the
relation between the triples and the extra URI, it lets the author
decide and tell the reader via an RDF predicate (examples below).

So, here's a TriG document D:

    @base <> .

    <u1> = { <a> <b> <c> . }
    <u2> = { <a> <b> <c>.  <b> <b> <c>. }

I think there are two main schools of thought about what this means,
corresponding to whether we think u1 and u2 identify g-snaps or
g-boxes.

Option 1 - We might take u1 and u2 as identifying g-snaps.  In this
case, D is telling us that the URI "" is an
identifier for a particular g-snap (abstract/mathematical set of one
triple), which we can write down using this turtle g-text, "@base
<> .  <a> <b> <c> ."  Similarly, it tells us
"" identifies a g-snap of two triples.

In N3 (as I understand it; I don't think this part is formally
specified), we could write this meaning like this:

    @base <> .
    @prefix owl: 

    <u1> owl:sameAs { <a> <b> <c> . } .
    <u2> owl:sameAs { <a> <b> <c>.  <b> <b> <c>. } .

Option 2 - We might take u1 and u2 as identifying g-boxes.  In this
case, D is telling us that "" identifies a
container of triples which currently contains one triple, as shown.
We could reasonably expect that, barring things changing, we could do
a GET on "" and get back the Turtle content,
"@base <> .  <a> <b> <c> ."  If we got D from a
trusted source, and for one reason or another we're not worried about
things changing, we could skip doing that GET, because we know the
result already.

In N3 (again, as I understand it), we could write this meaning as:

    @base <> .
    @prefix log: <>.

    <u1> log:semantics { <a> <b> <c> . } .
    <u2> log:semantics { <a> <b> <c>.  <b> <b> <c>. } .

("The log:semantics of a document is the formula which one gets by
parsing a [it]." [1] For "formula" read "graph", for our purposes.)
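
To make the difference between the two options concrete, here is a
rough Python sketch.  It is only an illustration under my own
assumptions: the names GSnap, GBox, option1 and option2 are made up
here, and triples are just tuples of strings rather than anything a
real RDF library would give you.

```python
from typing import FrozenSet, Tuple

Triple = Tuple[str, str, str]
GSnap = FrozenSet[Triple]  # an immutable, abstract set of triples


class GBox:
    """Hypothetical mutable container of triples (a g-box)."""

    def __init__(self, triples=()):
        self._triples = set(triples)

    def add(self, triple: Triple) -> None:
        self._triples.add(triple)

    def snapshot(self) -> GSnap:
        # The g-snap this box holds right now.
        return frozenset(self._triples)


abc = ("<a>", "<b>", "<c>")
bbc = ("<b>", "<b>", "<c>")

# Option 1: each URI denotes a g-snap, fixed once and for all.
option1 = {"<u1>": frozenset({abc}), "<u2>": frozenset({abc, bbc})}

# Option 2: each URI denotes a g-box; D only reports its current contents.
option2 = {"<u1>": GBox({abc}), "<u2>": GBox({abc, bbc})}

# At the moment D was written, both readings give the same triples.
agree_now = all(option2[u].snapshot() == option1[u] for u in option1)

# A g-box may change afterwards; a g-snap cannot.
option2["<u1>"].add(bbc)
agree_later = option2["<u1>"].snapshot() == option1["<u1>"]
```

The point is just that under Option 1 the name is tied to the set of
triples itself, while under Option 2 it is tied to a container whose
contents D describes at one moment.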

Are there other common meanings?  There are other relationships that
resources can have with triples, of course:

  - a person can assert/claim some g-snap
  - a person can be the author/creator of some g-snap
  - n-ary: a person can assert some g-snap over some time range
  - ... etc

but all of these can be done using the Option-1 (g-snap) or Option-2
(g-box) interpretations, like this:

    my:Sandro eg:claims <u1> .

That would be defined to mean either that I claim the g-snap u1 or
that I claim whatever is in the g-box u1, depending on which solution
we are using.
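
As a rough sketch of how that one triple reads under each
interpretation (Python, with made-up names; a plain mutable set stands
in for a g-box and a frozenset for a g-snap):

```python
abc = ("<a>", "<b>", "<c>")
bbc = ("<b>", "<b>", "<c>")

snap_reading = {"<u1>": frozenset({abc})}  # Option 1: u1 is a fixed set of triples
box_reading = {"<u1>": {abc}}              # Option 2: u1 is a mutable container

claim = ("my:Sandro", "eg:claims", "<u1>")


def claimed_triples(claim, reading):
    """Return the triples the claim covers at the time of the call."""
    _subject, _predicate, graph_name = claim
    return frozenset(reading[graph_name])


before_snap = claimed_triples(claim, snap_reading)
before_box = claimed_triples(claim, box_reading)

# Under Option 2 the claim follows the box if its contents change.
box_reading["<u1>"].add(bbc)
after_box = claimed_triples(claim, box_reading)
```

Here before_snap and before_box are the same set, while after_box has
grown to two triples; whether eg:claims ought to track such changes is
exactly what the definition of the predicate would have to say.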

So, I don't know that it matters very much which way we go.  In my own
coding, in part because I'm usually using a mutable quad store, I
think of it as Option-2 (g-boxes), BUT I only use my own URI space (so
it never changes without me knowing about it), and there's usually a
set of URIs which I treat as immutable and think of as effectively
being g-snap identifiers.  When I fetch stuff off the web, I store
that explicitly, keeping each version as long as necessary, with its
own URI.
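
A minimal sketch of that habit, in Python.  The class name
VersionedStore, the example.org/example.com URIs and the "v1", "v2"
naming scheme are all hypothetical, chosen only for illustration; no
actual fetching is done, the caller passes in the triples it got.

```python
class VersionedStore:
    """Keeps every fetched version of a graph under its own URI."""

    def __init__(self, base):
        self.base = base      # URI space under my own control (assumed)
        self.versions = {}    # version URI -> (source URI, frozenset of triples)
        self.latest = {}      # source URI -> most recent version URI

    def record(self, source_uri, triples):
        # Mint a fresh URI for this version; it is never reused, so it
        # can be treated as a g-snap identifier.
        version_uri = f"{self.base}v{len(self.versions) + 1}"
        self.versions[version_uri] = (source_uri, frozenset(triples))
        self.latest[source_uri] = version_uri
        return version_uri


abc = ("<a>", "<b>", "<c>")
bbc = ("<b>", "<b>", "<c>")

store = VersionedStore("http://example.org/snap/")
v1 = store.record("http://example.com/data", {abc})
v2 = store.record("http://example.com/data", {abc, bbc})
```

The source URI behaves like a g-box (its contents changed between the
two fetches), while v1 and v2 each stay bound to one set of triples.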

I will note -- returning to a topic of some earlier emails -- that some
of the use cases for Qurtle can be addressed by just defining datatypes
for the RDF syntaxes.  For example, we can write D in ordinary Turtle,
with Option-1 semantics, like this:

    @base <> .
    @prefix owl: <>.
    @prefix rdfsyn: <http:://> .

    <u1> owl:sameAs "@base <> . { <a> <b> <c> . }"^^rdfsyn:turtle .
    <u2> owl:sameAs "@base <> . { <a> <b> <c>.  <b> <b> <c>. }"^^rdfsyn:turtle .

The quoting gets a little hairy to do by hand, both in Turtle and
RDF/XML, but it's pretty easy for machines.  No special parser is
needed, and systems which don't know this datatype will, I think,
effectively ignore the triples, as they probably should.  If we want
option-2 semantics, I think we'd need to make up a new predicate, like
rdf:content or something.
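
To show what I mean by the quoting being easy for machines, here is a
small Python sketch.  The function names are made up, rdfsyn:turtle is
the same hypothetical datatype as above, and the escaping covers only
backslash, double quote, newline, carriage return and tab, which is
what a single-line double-quoted Turtle string needs.

```python
def turtle_string_literal(text: str) -> str:
    """Escape text for use as a double-quoted Turtle string."""
    escaped = (
        text.replace("\\", "\\\\")   # backslashes first, so later escapes survive
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
            .replace("\t", "\\t")
    )
    return '"' + escaped + '"'


def graph_literal_triple(subject, predicate, g_text, datatype="rdfsyn:turtle"):
    """Build one Turtle statement whose object is a typed graph literal."""
    return f"{subject} {predicate} {turtle_string_literal(g_text)}^^{datatype} ."


plain = graph_literal_triple("<u1>", "owl:sameAs", "<a> <b> <c> .")
quoted = turtle_string_literal('<a> <b> "hello" .')
```

The inner g-text is written once as ordinary Turtle and the escaping is
applied mechanically, so nobody has to count backslashes by hand.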

Where this falls short, I think, is in ease-of-hand-authoring and in
not allowing bnodes to be shared between the graphs.  But a lot of
people don't want that anyway and may be happy to discourage it like
this.  Also, it's not as easy to process as n-quads, especially
for massive dumps, and some mechanism would need to be introduced for
signaling the default graph.  (Something like "<> eg:defaultGraph

(Note re [2], Ivan, these are literals just like xs:integer, and don't
open up any new issues.  There's no more need for them to be subjects
than for integers to be subjects.  The value space is g-snaps, the
lexical space for the turtle one is the set of turtle g-texts, etc.)

     -- Sandro


Received on Friday, 4 March 2011 02:41:47 UTC