RDF Datasets - reflections from Andy Seaborne on 2011-09-30 (public-rdf-wg@w3.org from September 2011)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Fri, 30 Sep 2011 11:16:49 +0100
To: RDF-WG <public-rdf-wg@w3.org>
Message-ID: <4E859711.6050406@epimorphics.com>
The area of RDF datasets in the last SPARQL Working group (DAWG) was 
controversial and took a long time.  When writing about what happened, I 
am only saying what I remember of the debates.

The compromise position reached is probably something where everyone 
actively involved had to give up something they considered important. 
There was already a significant amount of prior implementation so people 
had a vested interest in the outcome.  This is not a bad thing.

Earlier in this (RDF) WG, we began talking about the URI being 
"associated" with the graph, and leaving that "association" open.

DanC came up with this text:
[[
The FROM NAMED syntax suggests that the IRI identifies the corresponding 
graph, but the relationship between an IRI and a graph in an RDF dataset 
is indirect. The IRI identifies a resource, and the resource is 
represented by a graph (or, more precisely: by a document that 
serializes a graph). For further details see [WEBARCH].
]]

The Named Graph paper (Carroll, Bizer, Hayes, Stickler), to my reading, 
says that a name refers to the g-snap.  But it then does not provide 
concrete examples of the naming.  All the graphs are ":G1" etc. and it 
does not give prefix definitions.

So what makes a good name?

Let <http://www.server.net/resource> be an IR whose representation is a 
serialization of an RDF graph.  It's a g-box.

That's not the g-snap so name that <http://example.org/a_graph> ... but 
that is a location on the web, and being HTTP, you should be able to GET 
from it.  The g-snap isn't "on the web", so maybe 
<http://example.org/ns#graph1> is better.

I find that using a non-resolvable URI is better here: I want to name 
the g-snap not put the g-snap on the web:

<uuid:2dc1a4c6-eb46-11e0-869e-485b397edc67>
<tag:example.org,2011-10:graph1>
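
A minimal sketch of minting such names, in Python. The helper names 
mint_uuid_name and mint_tag_name are hypothetical, and the sketch uses 
the registered "urn:uuid:" form (RFC 4122) rather than the bare "uuid:" 
written above; the tag form follows RFC 4151.

```python
import uuid
from datetime import date


def mint_uuid_name() -> str:
    """Return a fresh, non-resolvable name in the urn:uuid: form (RFC 4122)."""
    return f"urn:uuid:{uuid.uuid4()}"


def mint_tag_name(authority: str, when: date, specific: str) -> str:
    """Return a tag URI (RFC 4151): tag:<authority>,<YYYY-MM>:<specific>."""
    return f"tag:{authority},{when.year:04d}-{when.month:02d}:{specific}"
```

Neither kind of name suggests that a GET is possible, which is the 
point: the name identifies the g-snap without putting it on the web.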

RDF datasets can be used for detailed tracking of the state of part 
of the web by careful choice of the graph URIs.  In an RDF dataset the 
app can record the state of <http://www.server.net/resource> at 
different times using different URIs for different times when the app 
does a GET. This works nicely with the default graph including the 
manifest of the named graph - when they were read, from where, etc. 
While this is my description, it's my understanding of what 3Store was 
doing, except it used bNodes for the graph identification.
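
A rough sketch of this bookkeeping, in Python with plain sets and 
dicts rather than any particular RDF library. The EX vocabulary 
(retrievedFrom, retrievedAt) and the function names are assumptions 
made for illustration; they are not taken from 3Store or any spec.

```python
import uuid
from datetime import datetime, timezone

# Hypothetical vocabulary for the manifest; not a real namespace.
EX = "http://example.org/ns#"


def new_dataset():
    """An RDF dataset as a default graph plus a map of named graphs."""
    return {"default": set(), "named": {}}


def record_snapshot(dataset, source_url, triples, when):
    """Store one retrieved state (a g-snap) under a fresh graph name.

    The default graph acts as the manifest: it records where each
    named graph was read from and when.
    """
    name = f"urn:uuid:{uuid.uuid4()}"  # a new name for every GET
    dataset["named"][name] = frozenset(triples)
    dataset["default"].add((name, EX + "retrievedFrom", source_url))
    dataset["default"].add((name, EX + "retrievedAt", when.isoformat()))
    return name
```

Calling record_snapshot twice for <http://www.server.net/resource> 
yields two differently named graphs, with the manifest tying both back 
to the same g-box.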

A common UC is wanting a copy of a remote graph, not worrying about the 
fact it might change (e.g. only the latest matters or it is to be 
considered unchanging).  Making the URI associated with the graph the 
place it comes from is easily comprehensible.  Application writers 
understand this viewpoint.

One setup of an RDF dataset, making the default graph the union of the 
named graphs in the dataset, is a common usage - some systems only offer 
this mode of operation for RDF datasets.
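
As a small illustration of the default=union setup, assuming triples 
are plain tuples and the named graphs are held in a dict keyed by graph 
URI (both assumptions of this sketch; blank node handling is ignored):

```python
def union_default_graph(named_graphs):
    """Compute a default graph as the set union of all named graphs.

    named_graphs maps a graph name (URI string) to an iterable of
    (subject, predicate, object) tuples. A triple present in several
    named graphs appears once in the result.
    """
    merged = set()
    for triples in named_graphs.values():
        merged |= set(triples)
    return merged
```

Note that the union loses the information about which graph a triple 
came from; that stays available only through the named graphs.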

Yes - this use of the g-box URI for the graph URI is a shortcut.  But to 
argue for having to have the proper machinery that provides no perceived 
value to the app writer and adds to the cost/complexity is not an 
argument that is going to be won very often.  The NGs of the "Named graph" 
paper didn't quite make it to the general "named graphs" for many people.


A different point of view when DAWG was debating multi-graph was the 
idea that the 4th field was a "context" for a triple.  All the triples 
were part of the same graph, but the triples were labelled to group 
them.  The app could ask "where did this triple come from?" or "which 
triples came from X?".  While this is a different way of thinking, 
coming from a particular and important class of applications, it is 
covered by the default=union usage of RDF datasets if context is a URI. 
In N-Quads that's not required - it can be a literal or bnode:

context ::= uriref | nodeID | literal
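
The "context" view can be sketched as a store of labelled triples that 
answers the two questions above. This is an illustration only; the 
QuadStore class and its method names are hypothetical, and contexts are 
plain strings so that a URI, a bnode label or a literal can all be used.

```python
from collections import defaultdict


class QuadStore:
    """Triples of one graph, each labelled with one or more contexts."""

    def __init__(self):
        self._contexts = defaultdict(set)  # triple -> set of contexts
        self._triples = defaultdict(set)   # context -> set of triples

    def add(self, s, p, o, context):
        """Add the triple (s, p, o) with the given 4th field."""
        triple = (s, p, o)
        self._contexts[triple].add(context)
        self._triples[context].add(triple)

    def contexts_of(self, s, p, o):
        """Where did this triple come from?"""
        return set(self._contexts.get((s, p, o), ()))

    def triples_from(self, context):
        """Which triples came from this context?"""
        return set(self._triples.get(context, ()))

    def all_triples(self):
        """The single graph that all the labelled triples belong to."""
        return set(self._contexts)
```

Here all_triples() plays the role of the default=union graph, and 
triples_from(c) corresponds to the named graph for c when c is a URI.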

But the most important use case for SPARQL is querying one graph, no 
named graphs.  It's a progression from there - start with no GRAPH, add 
in other graphs with URIs.  It's not a single complex concept that every 
app writer has to understand before using SPARQL at all.


I don't think collections of graphs are the fundamental building block 
for the semantic web.  Graphs (and triples and URIs) are the 
building block.

We can either rework the semweb stack to make collections of named graphs 
fundamental, or introduce an intentionally secondary concept.

 Andy

Aside: even for graph literals, naming can be important.  Graph literals 
aren't small, so having to send the serialization around just to talk 
about it has practical implications.  And equality is akin to XMLLiterals.
Received on Friday, 30 September 2011 10:17:20 UTC