Web Semantics for Datasets

Here's a proposal for what the fourth column (the graph name in a quad)
should mean.  It's kind of obvious, and I think it's how many of us just
assumed Named Graphs were supposed to work.  But I don't think it's been
written down in a form we can use, so here it is, in a first draft.

I haven't really tried to motivate this, but one thing it does is allow
folks to refer to a graph using just one URI.  As [1] points out rather
painfully, as things stand now, you need multiple URIs just to identify
each g-box (and thus g-snap).  (That is, you need to say which SPARQL
endpoint you're talking about, and then which graph within its
dataset.)

My starting question was: what is the relationship between the IRI (the
"graph name") and its associated g-snap in an RDF Dataset?  This
applies to the dataset backing any SPARQL endpoint, as well as the
dataset serialized in any multigraph syntax, like TriG or N-Quads.
Another way to look at it: what does it mean to assert a TriG
document?  If you send me the TriG document "<a> { <s> <p> <o> }", and
I trust you, what do I now know?

Richard, I think, has been arguing for a minimalist position,
answering "nothing", or "it depends on out-of-band agreements".  This
"Web Semantics" proposal is an alternative.

=== Proposal

The idea here is to make the relationship between the URI and the
graph be the standard Web naming relationship, similar to what we all
use for Web pages.  When you dereference the URI, you get the graph.

This has the feature of being, to some extent, observable.  Just like
triples are claims about some domain of discourse, quads become claims
about idealized Web dereference behavior.

Specifically: Consider a "graph naming" to be the association of a
graph name N with a graph G.  For the graph naming to hold, every
successful dereference of N yielding an RDF graph must yield G.  For a
dataset D to hold, its default graph must hold (as normal in RDF) and
every graph naming pair in D must hold.
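
Here's a minimal Python sketch of that definition, just to make it
concrete.  It assumes graphs with no blank nodes, so comparing frozensets
of triples stands in for graph isomorphism.  The Observation record,
naming_holds, and dataset_holds are hypothetical names for illustration.
Truth of the default graph is ordinary RDF semantics, so it's simply
passed in as a flag.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Sequence, Tuple

# Ground triples only: comparing frozensets is a stand-in for graph
# isomorphism and is only correct when there are no blank nodes.
Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


@dataclass(frozen=True)
class Observation:
    """Hypothetical record of one attempt to dereference a URI."""
    uri: str
    graph: Optional[Graph]  # None: the dereference failed or yielded no RDF graph


def naming_holds(name: str, graph: Graph, observations: Sequence[Observation]) -> bool:
    """The naming (name, graph) holds unless some successful dereference of
    `name` yielded a graph other than `graph`; failed attempts are ignored."""
    return all(
        obs.graph == graph
        for obs in observations
        if obs.uri == name and obs.graph is not None
    )


def dataset_holds(default_graph_holds: bool,
                  namings: Dict[str, Graph],
                  observations: Sequence[Observation]) -> bool:
    """A dataset holds when its default graph holds (ordinary RDF semantics,
    supplied as a flag here) and every graph naming in it holds."""
    return default_graph_holds and all(
        naming_holds(name, graph, observations) for name, graph in namings.items()
    )
```

Names are compared as exact strings here; URI equivalence is a separate
question.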

Example 1:  This dataset

   <http://example.org> { <s> <p> <o>. }

means that if anyone is able to dereference "http://example.org"
and obtain an RDF graph serialization, the serialized graph will
consist of the single triple, <s> <p> <o>.  Failure to dereference
does not make the graph naming untrue, but a successful dereference
yielding a different graph does.
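
What counts as "obtaining an RDF graph serialization" has to be decided
somehow.  One possible reading, sketched in Python with only the
standard library: send an Accept header asking for RDF, treat any
network or HTTP error (including a 404) as a failure to dereference,
and treat a non-RDF media type as not yielding a graph.  The list of
RDF media types and the name dereference_rdf are assumptions, and
parsing the returned body into a graph (which needs an RDF parser) is
left out.

```python
import urllib.request
from typing import Callable, Optional, Tuple

# Which media types count as "an RDF graph serialization" is an assumption.
RDF_MEDIA_TYPES = ("text/turtle", "application/n-triples", "application/rdf+xml")


def dereference_rdf(uri: str,
                    opener: Callable = urllib.request.urlopen,
                    timeout: float = 10.0) -> Optional[Tuple[str, bytes]]:
    """Try to dereference `uri` as RDF.  Returns (media type, body) on
    success, or None when the attempt fails or yields something non-RDF."""
    request = urllib.request.Request(uri, headers={"Accept": ", ".join(RDF_MEDIA_TYPES)})
    try:
        with opener(request, timeout=timeout) as response:
            media_type = response.headers.get_content_type()
            if media_type not in RDF_MEDIA_TYPES:
                return None  # a non-RDF representation says nothing about the naming
            return media_type, response.read()
    except OSError:
        # URLError and HTTPError (e.g. a 404) are OSError subclasses: this is a
        # failed dereference, which does not make the graph naming untrue.
        return None
```

The opener parameter exists only so the function can be exercised
without a network connection.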

Example 2:  This dataset can never be true:

   <http://example.org> { <s> <p> 1. }
   <HTTP://example.org> { <s> <p> 2. }

... since RFC 3986 makes the scheme component case-insensitive: URIs
that differ only in the case of the scheme are equivalent, so
dereferencing them cannot yield different results.
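
A dataset consumer could catch this kind of contradiction mechanically.
The Python sketch below applies only the case normalization of RFC 3986
section 6.2.2.1 (lowercasing scheme and host); it deliberately skips
percent-encoding and scheme-based normalization, so it will miss some
equivalences.  The function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple
from urllib.parse import urlsplit, urlunsplit

Graph = FrozenSet[Tuple[str, str, str]]


def case_normalize(uri: str) -> str:
    """Lowercase the scheme and host, which RFC 3986 treats as
    case-insensitive; userinfo, path, query and fragment are untouched."""
    parts = urlsplit(uri)
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    return urlunsplit((parts.scheme.lower(), netloc, parts.path, parts.query, parts.fragment))


def conflicting_names(namings: Dict[str, Graph]) -> List[Tuple[str, str]]:
    """Pairs of graph names that are equivalent after case normalization
    but are paired with different graphs.  Any such pair means the dataset
    can never be true."""
    first_seen: Dict[str, Tuple[str, Graph]] = {}
    conflicts: List[Tuple[str, str]] = []
    for name, graph in namings.items():
        key = case_normalize(name)
        if key in first_seen:
            earlier_name, earlier_graph = first_seen[key]
            if earlier_graph != graph:
                conflicts.append((earlier_name, name))
        else:
            first_seen[key] = (name, graph)
    return conflicts
```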

Example 3:  This dataset:

   <tag:hawke.org,2010-10-06:eg1> { <s> <p> <o>. }

cannot be tested using Web protocols, since the "tag" URI scheme is
(by design) not dereferenceable.  Whether it is true or false cannot
be determined experimentally.
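
Since observation can only ever refute a graph naming, never prove it,
a checker might report three outcomes rather than true/false.  A rough
Python sketch, assuming only http and https names count as
dereferenceable (a simplification) and that the caller supplies the
graphs obtained from successful dereferences; Verdict and assess are
hypothetical names.

```python
from enum import Enum
from typing import FrozenSet, Iterable, Tuple
from urllib.parse import urlsplit

Graph = FrozenSet[Tuple[str, str, str]]

# Which schemes count as dereferenceable is an assumption of this sketch.
DEREFERENCEABLE_SCHEMES = {"http", "https"}


class Verdict(Enum):
    UNTESTABLE = "cannot be determined experimentally"
    REFUTED = "shown false by a dereference"
    NOT_REFUTED = "consistent with every dereference so far"


def assess(name: str, graph: Graph, dereferenced: Iterable[Graph]) -> Verdict:
    """`dereferenced` holds the graphs from successful dereferences of `name`."""
    if urlsplit(name).scheme.lower() not in DEREFERENCEABLE_SCHEMES:
        return Verdict.UNTESTABLE
    if any(g != graph for g in dereferenced):
        return Verdict.REFUTED
    return Verdict.NOT_REFUTED
```

NOT_REFUTED is deliberately not called TRUE: a later dereference could
still yield a different graph.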

==== Temporal Context

How can we say:

   <http://example.org> { <s> <p> <o>. }

if we suspect that "http://example.org" might serve some other content
tomorrow?

The answer is that datasets often need temporal qualification just
like RDF graphs do.  It's just like saying in RDF:

   <http://example.org/Alice> foaf:age 25.

One solution for foaf:age triples is to include a triple like:

   <> dc:temporal "2011-10-06"^^xs:date.

and that can be done in datasets as well, using the default graph.
More work is needed on this, but I'm pretty sure this proposal can use
whatever solution people come up with for RDF and doesn't make matters
much worse than they are already.
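
As one illustration only: if the temporal metadata boils down to a
single time at which the naming is asserted, plus a tolerance (the
"stated time lag" below), then only dereferences made near that time
can refute it.  That reduction is an assumption, not a worked-out
proposal; TimedObservation and timed_naming_holds are hypothetical, and
graphs are again compared as sets of ground triples.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import FrozenSet, Optional, Sequence, Tuple

Graph = FrozenSet[Tuple[str, str, str]]


@dataclass(frozen=True)
class TimedObservation:
    """Hypothetical record: what dereferencing `uri` yielded at time `at`."""
    uri: str
    at: datetime
    graph: Optional[Graph]  # None: the dereference failed or yielded no RDF graph


def timed_naming_holds(name: str, graph: Graph, asserted_at: datetime,
                       tolerance: timedelta,
                       observations: Sequence[TimedObservation]) -> bool:
    """A time-qualified naming: only successful dereferences made within
    `tolerance` of `asserted_at` can refute it."""
    return all(
        obs.graph == graph
        for obs in observations
        if obs.uri == name
        and obs.graph is not None
        and abs(obs.at - asserted_at) <= tolerance
    )
```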

==== Practical Deployment Choices

Any system which maintains a dataset (e.g., a SPARQL endpoint) or
generates multigraph documents like TriG has to do one (or more) of
the following:

1.  Use new non-dereferenceable graph names.  These could be tag or
    uuid URIs, or http URIs in your own namespace which you choose to
    leave returning 404.

2.  Use your own dereferenceable graph names, perhaps relative to the
    endpoint or TriG document URI.  If you do serve RDF content at
    those URIs, it MUST be the same content (give or take a stated
    time lag).

3.  Use someone else's graph names.  Here, the key thing is temporal
    metadata.  You have to decide what you want (copy once, or keep
    synchronized and with what accuracy) and (somehow) share that
    temporal metadata.
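
Options 1 and 2 are easy to mechanize.  A Python sketch, standard
library only: a fresh urn:uuid name for option 1, and RFC 3986
reference resolution (urllib.parse.urljoin) against the endpoint or
TriG document URI for option 2.  For option 2, one simple way to keep
the MUST is to serve HTTP responses and the dataset from the same
store, as the hypothetical GraphStore class does; all the names here
are illustrative.

```python
import uuid
from typing import Dict, FrozenSet, Iterable, Optional, Tuple
from urllib.parse import urljoin

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def opaque_graph_name() -> str:
    """Option 1: a fresh graph name that is never meant to be dereferenced."""
    return uuid.uuid4().urn  # "urn:uuid:" followed by a random UUID


def local_graph_name(base: str, reference: str) -> str:
    """Option 2: a graph name resolved against the endpoint or TriG document URI."""
    return urljoin(base, reference)


class GraphStore:
    """Hypothetical store backing both the dataset and HTTP responses, so the
    graph served at a name is, by construction, the graph in the dataset."""

    def __init__(self) -> None:
        self._graphs: Dict[str, Graph] = {}

    def put(self, name: str, triples: Iterable[Triple]) -> None:
        self._graphs[name] = frozenset(triples)

    def dataset(self) -> Dict[str, Graph]:
        return dict(self._graphs)

    def serve(self, name: str) -> Optional[Graph]:
        # None stands in for a 404 response.
        return self._graphs.get(name)
```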


...

Okay, that's enough for now.  Give me a +1 if you think this is headed
in a useful direction.

    -- Sandro

[1] http://www.w3.org/2011/prov/wiki/Using_named_graphs_to_model_Accounts

Received on Friday, 7 October 2011 02:04:51 UTC