Re: Semantics of Qurtle (N3 vs TriG), Graph Literals again.

On Fri, 2011-03-04 at 09:55 +0100, Ivan Herman wrote:
> On Mar 4, 2011, at 03:41 , Sandro Hawke wrote:
> 
> > (Aside: let's keep using the name Qurtle, on a *temporary* basis, to
> > refer to our deliverable of a Turtle-like language with "support for
> > multiple graphs and graph stores".  
> 
> I am not sure how one pronounces "Qurtle" :-)

I think Lee might have most truly and completely addressed this
question, if Peter's answer wasn't enough.  :-)

> > I don't like the name long-term,
> > but it's fine for now.  This post is orthogonal to whether Qurtle is
> > minimal-functionality n-quads or maximal-functionality
> > Superturtle/TriG++, so I want a neutral name.)
> > 
> > There have been several posts about how it's not clear what the
> > fourth element means.  I want to point out that N3 has an interesting
> > take on the problem; rather than decide and declare a priori the
> > relation between the triples and the extra URI, it lets the author
> > decide and tell the reader via an RDF predicate (examples below).
> > 
> > So, here's a TriG document D:
> > 
> >    @base <http://example.com/> .
> > 
> >    <u1> = { <a> <b> <c> . }
> >    <u2> = { <a> <b> <c>.  <b> <b> <c>. }
> 
> I think the TriG syntax does not use the '=' sign. 

Honestly, I was trying to gloss over this. I see different versions of
TriG doing different things, some in the name of N3 compatibility.   In
some cases [1], TriG uses "=", which suggests it means owl:sameAs, but
we've heard that meaning disclaimed.   In other cases [2], I see TriG
use ":-", which in N3 means the URI label actually gets applied to the
literal.  You can't put URI labels on literals in RDF, so I think it's
best to skip that.

[1] http://www.w3.org/2010/01/Turtle/Trig#sec-grammar-grammar
[2] http://www4.wiwiss.fu-berlin.de/bizer/TriG/Spec/TriG-20070730/#example
    (text before example 2)

> > 
> > I think there are two main schools of thought about what this means,
> > corresponding to whether we think u1 and u2 identify g-snaps or
> > g-boxes.
> > 
> > Option 1 - We might take u1 and u2 as identifying g-snaps.  In this
> > case, D is telling us that the URI "http://example.com/u1" is an
> > identifier for a particular g-snap (an abstract/mathematical set of one
> > triple), which we can write down using this Turtle g-text: "@base
> > <http://example.com/> .  <a> <b> <c> ."  Similarly, it tells us
> > "http://example.com/u2" identifies a g-snap of two triples.
> > 
> > In N3 (as I understand it; I don't think this part is formally
> > specified), we could write this meaning like this:
> > 
> >    @base <http://example.com/> .
> >    @prefix owl: <http://www.w3.org/2002/07/owl#> .
> > 
> >    <u1> owl:sameAs { <a> <b> <c> . }
> >    <u2> owl:sameAs { <a> <b> <c>.  <b> <b> <c>. }
> 
> To be honest, I am not sure I understand this. Using sameAs would mean that the '{...}' syntax is an RDF concept/resource that has a valid place in a triple. I.e., it is either syntactic sugar for a literal (ehem, opening up the graph literal issue...) or a resource with some sort of a URI... Or we have to have a new RDF concept for a g-snap that can be used as a legitimate part of an RDF triple.

In N3 it's the last option: g-snaps (aka "formulae") are legitimate
elements of triples.   I wasn't proposing we think about it like that, I
was just showing how N3 uses an RDF property to explicitly relate the
named node with the g-snap. 

> > 
> > Option 2 - We might take u1 and u2 as identifying g-boxes.  In this
> > case, D is telling us that "http://example.com/u1" identifies a
> > container of triples which currently contains one triple, as shown.
> 
> Does it say that <u1> contains _exactly_ that triple or that it does contain that triple but may contain more?

What I meant by "Option 2" was "exactly".   

There might be an Option 3, where, as you suggest, we read D to be
asserting that u1 contains at least that one triple and u2 contains at
least those two triples.     That Option 3 would be written in N3 using
both log:semantics and log:includes:

    @base <http://example.com/> .
    @prefix log: <http://www.w3.org/2000/10/swap/log#> .
 
    <u1> log:semantics _:u1_snap .
    _:u1_snap log:includes { <a> <b> <c> . } .

    <u2> log:semantics _:u2_snap .
    _:u2_snap log:includes { <a> <b> <c>.  <b> <b> <c>. } .
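
A minimal Python sketch of the difference between the "exactly" reading
(Option 2) and the "at least" reading (Option 3), assuming we can see a
g-box's current contents and the triples written in D, both modelled as
sets of (subject, predicate, object) tuples.  The function names are
invented for illustration:

```python
from typing import FrozenSet, Tuple

# A triple is modelled as a (subject, predicate, object) tuple of strings.
Triple = Tuple[str, str, str]


def option2_holds(box_contents: FrozenSet[Triple], written: FrozenSet[Triple]) -> bool:
    """Option 2: D says the g-box currently contains exactly the written triples."""
    return box_contents == written


def option3_holds(box_contents: FrozenSet[Triple], written: FrozenSet[Triple]) -> bool:
    """Option 3 (log:semantics + log:includes): it contains at least those triples."""
    return written <= box_contents


# Hypothetical current contents of <u1>: the triple written in D plus one more.
u1_now = frozenset({("a", "b", "c"), ("x", "y", "z")})
written_u1 = frozenset({("a", "b", "c")})

print(option2_holds(u1_now, written_u1))  # False
print(option3_holds(u1_now, written_u1))  # True
```

Under Option 3 the extra triple in <u1> is harmless; under Option 2 it
makes D false.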

We might define a property chain of log:semantics and log:includes,
called log:semanticsIncludes, and then we could write:


    <u1> log:semanticsIncludes { <a> <b> <c> . } .
    <u2> log:semanticsIncludes { <a> <b> <c>.  <b> <b> <c>. } .


That's a purely declarative statement.   It has the advantage that you
can load D, D2, D3, ... and the results all merge.  This allows you to
store some useful information without needing the full graphs.   It has
the disadvantage that you don't know what's missing from those
g-boxes, and you sometimes need that.  So you'd still have to
dereference u1 to give complete answers to queries.
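
To illustrate the merge property, here is a rough Python sketch that
treats each loaded document as a list of (g-box URI, triples) pairs,
i.e. the content of its log:semanticsIncludes statements.  The documents
D2 and D3 and the function names are hypothetical.  Merging is just a
per-g-box union, and a question about a triple nobody mentioned can only
come back as "unknown":

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]
# One log:semanticsIncludes statement: (g-box URI, triples it is said to include).
Assertion = Tuple[str, Set[Triple]]


def merge(documents: Iterable[Iterable[Assertion]]) -> Dict[str, Set[Triple]]:
    """Union every lower bound per g-box; the result doesn't depend on load order."""
    known: Dict[str, Set[Triple]] = defaultdict(set)
    for doc in documents:
        for box, triples in doc:
            known[box] |= triples
    return dict(known)


def contains(known: Dict[str, Set[Triple]], box: str, triple: Triple) -> Optional[bool]:
    """True if some loaded document says so, otherwise None (unknown):
    the g-box may hold triples that no loaded document mentions."""
    if triple in known.get(box, set()):
        return True
    return None  # answering False would require dereferencing `box`


D = [("u1", {("a", "b", "c")})]
D2 = [("u2", {("a", "b", "c"), ("b", "b", "c")})]
D3 = [("u1", {("d", "e", "f")})]

known = merge([D, D2, D3])
print(sorted(known["u1"]))                     # [('a', 'b', 'c'), ('d', 'e', 'f')]
print(contains(known, "u1", ("x", "y", "z")))  # None
```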

I can also imagine that some people use Qurtle as a kind of shorthand
for some SPARQL:

   BASE <http://example.com/>
   INSERT DATA { GRAPH <u1> { <a> <b> <c> . }
                 GRAPH <u2> { <a> <b> <c>.  <b> <b> <c>. } }

I suggest we discourage that reading and keep Qurtle declarative.
 
> > We could reasonably expect that, barring things changing, we could do
> > a GET on "http://example.com/u1" and get back the Turtle content,
> > "@base <http://example.com/> .  <a> <b> <c> ."  If we got D from a
> > trusted source, and for one reason or another we're not worried about
> > things changing, we could skip doing that GET, because we know the
> > result already.
> > 
> > In N3 (again, as I understand it), we could write this meaning as:
> > 
> >    @base <http://example.com/> .
> >    @prefix log: <http://www.w3.org/2000/10/swap/log#> .
> > 
> >    <u1> log:semantics { <a> <b> <c> . } .
> >    <u2> log:semantics { <a> <b> <c>.  <b> <b> <c>. } .
> > 
> > ("The log:semantics of a document is the formula which one gets by
> > parsing [it]." [1] For "formula" read "graph", for our purposes.)
> 
> And I think my comment above applies again
> 
> > 
> > Are there other common meanings?  There are other relationships that
> > resources can have with triples, of course:
> > 
> >  - a person can assert/claim some g-snap
> >  - a person can be the author/creator of some g-snap
> >  - n-ary: a person can assert some g-snap over some time range
> >  - ... etc
> > 
> > but all of these can be done using the Option-1 (g-snap) or Option-2
> > (g-box) interpretations, like this:
> > 
> >    my:Sandro eg:claims <u1> .
> > 
> > That would be defined to mean either that I claim the g-snap u1 or
> > that I claim whatever is in the g-box u1, depending on which solution
> > we are using.
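
A small Python sketch of how the eg:claims example plays out under the
two readings, assuming a g-snap is modelled as an immutable frozenset and
a g-box as a mutable set (all names here are illustrative).  The two
readings only come apart once someone changes the g-box:

```python
from typing import Dict, FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]

# Option 1: <u1> names a g-snap, an immutable set of triples.
snaps: Dict[str, FrozenSet[Triple]] = {"u1": frozenset({("a", "b", "c")})}

# Option 2: <u1> names a g-box, a container whose contents can change.
boxes: Dict[str, Set[Triple]] = {"u1": {("a", "b", "c")}}


def claimed_under_option1(graph_uri: str) -> FrozenSet[Triple]:
    """What 'my:Sandro eg:claims <u1>' commits to if u1 is a g-snap."""
    return snaps[graph_uri]


def claimed_under_option2(graph_uri: str) -> FrozenSet[Triple]:
    """What the same claim commits to if u1 is a g-box: its contents right now."""
    return frozenset(boxes[graph_uri])


before = (claimed_under_option1("u1"), claimed_under_option2("u1"))
boxes["u1"].add(("x", "y", "z"))  # someone updates the g-box
after = (claimed_under_option1("u1"), claimed_under_option2("u1"))

print(before[0] == after[0])  # True: the g-snap cannot change
print(before[1] == after[1])  # False: the g-box's contents did
```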
> > 
> > So, I don't know that it matters very much which way we go.  In my own
> > coding, in part because I'm usually using a mutable quad store, I
> > think of it as Option-2 (g-boxes), BUT I only use my own URI space (so
> > it never changes without me knowing about it), and there's usually a
> > set of URIs which I treat as immutable and think of as effectively
> > being g-snap identifiers.  When I fetch stuff off the web, I store
> > that explicitly, keeping each version as long as necessary, with its
> > own URI.
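
The versioning practice described above could look roughly like this
Python sketch.  The base URI http://example.org/mine/ and the
counter-based naming scheme are assumptions for illustration; the point
is only that every fetched version gets its own URI in my own space and
is never modified afterwards:

```python
from typing import Dict, FrozenSet, List, Tuple

Triple = Tuple[str, str, str]


class SnapshotStore:
    """Hypothetical store: each fetched version becomes an immutable g-snap
    under a fresh URI in our own URI space."""

    def __init__(self, base: str) -> None:
        self.base = base
        self.snaps: Dict[str, FrozenSet[Triple]] = {}  # snapshot URI -> g-snap
        self.versions: Dict[str, List[str]] = {}       # fetched URL -> snapshot URIs, oldest first

    def record_fetch(self, url: str, triples: FrozenSet[Triple]) -> str:
        # Nothing is ever deleted here, so a counter-based URI is never reused.
        uri = f"{self.base}snap/{len(self.snaps) + 1}"
        self.snaps[uri] = frozenset(triples)
        self.versions.setdefault(url, []).append(uri)
        return uri

    def latest(self, url: str) -> FrozenSet[Triple]:
        """The most recently fetched version of a web URL."""
        return self.snaps[self.versions[url][-1]]


store = SnapshotStore("http://example.org/mine/")
v1 = store.record_fetch("http://example.com/u1", frozenset({("a", "b", "c")}))
v2 = store.record_fetch("http://example.com/u1",
                        frozenset({("a", "b", "c"), ("b", "b", "c")}))
print(v1)                    # http://example.org/mine/snap/1
print(v2)                    # http://example.org/mine/snap/2
print(len(store.snaps[v1]))  # 1: the older version is still there
```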
> > 
> > I will note -- returning to a topic of some earlier emails -- that some
> > of the use cases for Qurtle can be addressed just by defining datatypes
> > for the RDF syntaxes.  For example, we can write D in ordinary Turtle,
> > with Option-1 semantics, like this:
> > 
> >    @base <http://example.com/> .
> >    @prefix owl: <http://www.w3.org/2002/07/owl#>.
> >    @prefix rdfsyn: <http://example.org/rdf-syntaxes/> .
> > 
> >    <u1> owl:sameAs "@base <http://example.com/> . <a> <b> <c> ."^^rdfsyn:turtle .
> >    <u2> owl:sameAs "@base <http://example.com/> . <a> <b> <c>.  <b> <b> <c>."^^rdfsyn:turtle .
> > 
> 
> So these are graph literals after all. And one can define
> 
> { <a> <b> <c> . } 
> 
> as being a syntactic shorthand for the 
> 
> "@base <http://example.com/> . <a> <b> <c> ."^^rdfsyn:turtle
> 
> literal, just as 
> 
> 123.45
> 
> is a shorthand for
> 
> "123.45"^^xsd:decimal

Yes.  That might be a very nice approach to take in the spec, keeping
things really simple and separate.     

It would make Qurtle just be Turtle with extra syntactic sugar.  It
would not need to say anything about semantics of quads or whatever --
that would be left to the definitions of those datatypes.

The only shortcoming I see to this approach is that it does not allow
sharing of bnodes.  I'm not yet sure how much I care about that.
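
A sketch of why bnodes can't be shared, assuming each graph literal is
parsed on its own, just as separate Turtle documents are (blank node
labels are scoped to the document).  The renaming function is
hypothetical, but it shows the same label in two literals ending up as
two different nodes:

```python
import itertools
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]

_fresh: Iterator[int] = itertools.count(1)


def scope_bnodes(triples: List[Triple]) -> List[Triple]:
    """Give the blank-node labels of one graph literal fresh, literal-local
    identities, the way a parser treats labels from separate documents."""
    mapping: Dict[str, str] = {}

    def rename(term: str) -> str:
        if term.startswith("_:"):
            if term not in mapping:
                mapping[term] = f"_:b{next(_fresh)}"
            return mapping[term]
        return term

    return [(rename(s), rename(p), rename(o)) for s, p, o in triples]


g1 = scope_bnodes([("_:x", "eg:name", '"Alice"')])
g2 = scope_bnodes([("_:x", "eg:age", '"42"')])
print(g1[0][0], g2[0][0])  # _:b1 _:b2: same label, two different nodes
```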

It's slightly less obvious, but you could use this syntactic-sugar
reading on n-quads as well, converting n-quads to n-triples by gathering
up the triples with the same 4th element and putting them in a graph
literal.   As with TriG, someone would need to specify which
relationship you meant in the n-quads, owl:sameAs,
log:semantics/rdf:content, semanticsIncludes, or whatever.

> > The quoting gets a little hairy to do by hand, both in Turtle and
> > RDF/XML, but it's pretty easy for machines.  No special parser is
> > needed, and systems which don't know this datatype will, I think,
> > effectively ignore the triples, as they probably should.  If we want
> > option-2 semantics, I think we'd need to make up a new predicate, like
> > rdf:content or something.
> 
> you mean having both, right? 
> 
> <G> { .... }
> 
> would mean option 1 and an extra syntax would give option 2

I don't yet have the data to know whether we want option 1, option 2,
both, or even option 3.

My strawman right now is that, for now, you have to use an explicit
predicate: owl:sameAs (maybe written "=", for TriG compatibility and to
assuage people scared of OWL), rdf:content, etc.  Later, if it becomes
clear which of those predicates everyone wants, then we can add the
syntactic sugar of allowing you to leave out that predicate.

To summarize, at this point I'm suggesting that a SPARQL dataset would
be dumped as Qurtle like:

    @prefix ...
    @base ...
    <slot1> rdf:content { triples in slot1 }.
    <slot2> rdf:content { triples in slot2 }.
    ...
    <def> rdf:content { triples in default graph }.
    <> sparql:defaultGraph <def> .

or the equivalent Turtle:


    @prefix ...
    @base ...
    <slot1> rdf:content " triples in slot1 "^^rdfsyn:turtle .
    <slot2> rdf:content " triples in slot2 "^^rdfsyn:turtle .
    ...
    <def> rdf:content " triples in default graph "^^rdfsyn:turtle .
    <> sparql:defaultGraph <def> .

and I'll leave the RDFa and RDF/XML versions as exercises for the
reader.  :-)
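
For the Turtle form, a dump could be produced by something like the
Python sketch below.  rdf:content and sparql:defaultGraph are the
proposed, not-yet-defined terms from the strawman, <def> is assumed not
to clash with a real slot name, and the escaping covers only what a
Turtle "..." string needs (backslash, quote, newline, CR):

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def turtle_string(text: str) -> str:
    """Escape text for a Turtle "..." string (backslash, quote, newline, CR)."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def dump_dataset(named: Dict[str, List[Triple]], default: List[Triple]) -> str:
    """Dump a dataset in the strawman form: one rdf:content triple per graph,
    the default graph under the slot <def>, and a sparql:defaultGraph link.
    Prefix declarations are omitted, as in the message above."""
    def body(triples: List[Triple]) -> str:
        return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)

    if "def" in named:
        raise ValueError("slot name 'def' is reserved for the default graph here")
    lines = [f'<{slot}> rdf:content "{turtle_string(body(ts))}"^^rdfsyn:turtle .'
             for slot, ts in named.items()]
    lines.append(f'<def> rdf:content "{turtle_string(body(default))}"^^rdfsyn:turtle .')
    lines.append("<> sparql:defaultGraph <def> .")
    return "\n".join(lines)


print(dump_dataset({"slot1": [("<a>", "<b>", "<c>"), ("<b>", "<b>", "<c>")]},
                   [("<x>", "<y>", "<z>")]))
```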

For Steve, and others who are concerned about their systems getting
quads when they only wanted triples... I think there's a technical
solution here, that maybe this design makes clear.  The concern, I
think, is about getting metadata or system data when you wanted only
application data.  My sense is that it can be addressed like any other kind
of bad RDF data, by ignoring or rejecting it.  Am I missing something
there?
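
As a sketch of the "ignore or reject" approach, assuming a triples-only
consumer that recognizes graph literals by their datatype IRI (the
rdfsyn IRI below is the hypothetical one from this thread), a Python
filter might look like:

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]

# Datatype IRIs this consumer treats as graph literals (hypothetical, following
# the rdfsyn:turtle example in this thread).
GRAPH_LITERAL_TYPES = {"http://example.org/rdf-syntaxes/turtle"}


def is_graph_literal(term: str) -> bool:
    """True for a typed literal written "..."^^<IRI> whose datatype is a graph type."""
    if not term.startswith('"') or '"^^<' not in term:
        return False
    datatype = term.rsplit("^^<", 1)[1].rstrip(">")
    return datatype in GRAPH_LITERAL_TYPES


def application_triples(triples: Iterable[Triple], reject: bool = False) -> List[Triple]:
    """Keep ordinary triples; ignore (or, if reject=True, refuse) graph-literal ones."""
    kept: List[Triple] = []
    for t in triples:
        if is_graph_literal(t[2]):
            if reject:
                raise ValueError(f"graph literal not wanted here: {t[0]}")
            continue
        kept.append(t)
    return kept


incoming = [
    ("<http://ex/u1>", "<http://www.w3.org/2002/07/owl#sameAs>",
     '"<http://ex/a> <http://ex/b> <http://ex/c> ."^^<http://example.org/rdf-syntaxes/turtle>'),
    ("<http://ex/a>", "<http://ex/b>", "<http://ex/c>"),
]
print(application_triples(incoming))  # [('<http://ex/a>', '<http://ex/b>', '<http://ex/c>')]
```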

> > Where this falls short, I think, is in ease-of-hand-authoring and in
> > not allowing bnodes to be shared between the graphs.  But a lot of
> > people don't want that anyway and may be happy to discourage it like
> > this.  Also, it's not as easy to process as n-quads, especially
> > for massive dumps, and some mechanism would need to be introduced for
> > signaling the default graph.  (Something like "<> eg:defaultGraph
> > <g1>.")
> > 
> > (Note re [2], Ivan, these are literals just like xs:integer, and don't
> > open up any new issues.  There's no more need for them to be subjects
> > than for integers to be subjects.  The value space is g-snaps, the
> > lexical space for the turtle one is the set of turtle g-texts, etc.)
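
To make the value/lexical distinction concrete, here is a toy Python
lexical-to-value mapping.  It assumes a drastically simplified lexical
space (absolute IRIs only, no prefixes, literals or bnodes) instead of
full Turtle.  Just as "1" and "01" are different lexical forms of the
same xsd:integer, two different g-texts can denote the same g-snap:

```python
import re
from typing import FrozenSet, Tuple

Triple = Tuple[str, str, str]

# Toy lexical space: statements of three absolute IRIs followed by "." (no
# prefixes, literals, or blank nodes). A real Turtle datatype needs a full parser.
_STATEMENT = re.compile(r"(<[^>]*>)\s*(<[^>]*>)\s*(<[^>]*>)\s*\.")


def value_of(g_text: str) -> FrozenSet[Triple]:
    """Lexical-to-value mapping: a g-text maps to the g-snap (set of triples) it writes down."""
    return frozenset(_STATEMENT.findall(g_text))


t1 = "<http://ex/a> <http://ex/b> <http://ex/c> . <http://ex/b> <http://ex/b> <http://ex/c> ."
t2 = ("<http://ex/b> <http://ex/b> <http://ex/c>.\n"
      "<http://ex/a> <http://ex/b> <http://ex/c>.\n"
      "<http://ex/a> <http://ex/b> <http://ex/c>.")

print(t1 == t2)                      # False: two different lexical forms
print(value_of(t1) == value_of(t2))  # True: both denote the same g-snap
```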
> 
> As long as the syntax does not _require_ a literal as subject, I have no problem talking about graph literals. We just have to be very clear in our minds that graph literals as subjects cannot be looked at in isolation, but only in relation to the much more general issue of literals as subjects...

Then it looks like we're okay on that front.

    -- Sandro

> Ivan
> >     -- Sandro
> > 
> > [1] http://www.w3.org/2000/10/swap/doc/Reach.html
> > [2] http://lists.w3.org/Archives/Public/public-rdf-wg/2011Feb/0127
> > 
> > 
> ----
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> PGP Key: http://www.ivan-herman.net/pgpkey.html
> FOAF: http://www.ivan-herman.net/foaf.rdf

Received on Friday, 4 March 2011 15:50:15 UTC