Re: Web Semantics for Datasets from Steve Harris on 2011-10-07 (public-rdf-wg@w3.org from October 2011)

From: Steve Harris <steve.harris@garlik.com>
Date: Fri, 7 Oct 2011 11:02:45 +0100
To: Sandro Hawke <sandro@w3.org>
Cc: public-rdf-wg <public-rdf-wg@w3.org>
Message-Id: <2B56B402-1FEC-4BFC-BBD4-25DC061CDEC9@garlik.com>
This direction has some appeal to me personally.

I don't immediately see how you could use it to represent the data you get from a single URI at different points in time, in the same Dataset, but that's something that could be worked out.

- Steve

On 2011-10-07, at 03:04, Sandro Hawke wrote:

> Here's a proposal for what the fourth column should mean.  It's kind of
> obvious, and I think it's how many of us just assumed Named Graphs were
> supposed to work.    But I don't think it's been written down in a form
> we can use, so here it is, in a first draft.
> 
> I haven't really tried to motivate this, but one thing it does is allow
> folks to refer to a graph using just one URI.  As [1] points out rather
> painfully, as things stand now, you need multiple URIs just to identify
> each g-box (and thus g-snap).  (That is, you need to say which SPARQL
> endpoint you're talking about, and then which graph within its
> dataset.)
> 
> My starting question was: what is the relationship between the IRI (the
> "graph name") and its associated g-snap in an RDF Dataset.  This
> applies to the dataset backing any SPARQL endpoint, as well as the
> dataset serialized in any multigraph syntax, like TriG or N-Quads.
> Another way to look at it: what does it mean to assert a TriG
> document?  If you send me the TriG Document "<a> { <s> <p> <o> }", and
> I trust you, what do I now know?
> 
> Richard, I think, has been arguing for a minimalist position,
> answering "nothing", or "it depends on out-of-band agreements".  This
> "Web Semantics" proposal is an alternative.
> 
> === Proposal
> 
> The idea here is to make the relationship between the URI and the
> graph be the standard Web naming relationship, similar to what we all
> use for Web pages.  When you dereference the URI, you get the graph.
> 
> This has the feature of being, to some extent, observable.  Just like
> triples are claims about some domain of discourse, quads become claims
> about idealized Web dereference behavior.
> 
> Specifically: Consider a "graph naming" to be the association of a
> graph name N with a graph G.  For the graph naming to hold, every
> successful dereference of N yielding an RDF graph must yield G.  For a
> dataset D to hold, its default graph must hold (as normal in RDF) and
> every graph naming pair in D must hold.
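
A minimal sketch of the "holds" condition just described, not part of the original proposal. It assumes graphs are sets of ground triples (no blank-node isomorphism), that the default graph is taken to hold, and that dereference attempts are supplied as recorded observations rather than fetched over the network. The names naming_holds, dataset_holds and Observation are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]
# One dereference attempt: (graph name, graph obtained).
# None means the attempt failed or did not yield an RDF graph.
Observation = Tuple[str, Optional[Graph]]


def naming_holds(name: str, graph: Graph,
                 observations: Iterable[Observation]) -> bool:
    """A graph naming (name, graph) holds unless some successful
    dereference of name yielded a different graph."""
    for observed_name, observed_graph in observations:
        if observed_name != name or observed_graph is None:
            continue  # other names and failed dereferences prove nothing
        if observed_graph != graph:
            return False
    return True


def dataset_holds(named_graphs: Dict[str, Graph],
                  observations: Iterable[Observation]) -> bool:
    """Every graph naming in the dataset must hold (default graph assumed true)."""
    observations = list(observations)
    return all(naming_holds(n, g, observations) for n, g in named_graphs.items())


g = frozenset({("s", "p", "o")})
other = frozenset({("s", "p", "x")})
failed_only = naming_holds("http://example.org", g, [("http://example.org", None)])
contradicted = naming_holds("http://example.org", g, [("http://example.org", other)])
```

Here failed_only is True (a failed dereference does not falsify the naming) and contradicted is False (a successful dereference yielded a different graph).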
> 
> Example 1:  This dataset
> 
>   <http://example.org> { <s> <p> <o>. }
> 
> means that if anyone is able to dereference "http://example.org"
> and obtain an RDF graph serialization, the serialized graph will
> consist of the single triple, <s> <p> <o>.  Failure to dereference
> does not make the graph naming untrue, but a successful dereference
> yielding a different graph does.
> 
> Example 2:  This dataset can never be true:
> 
>   <http://example.org> { <s> <p> 1. }
>   <HTTP://example.org> { <s> <p> 2. }
> 
> ... since one cannot get different results dereferencing URIs that
> differ only in the case of the scheme component (as per RFC 3986).
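
A rough sketch of how Example 2 could be detected mechanically, not part of the original proposal. It assumes the only equivalence applied is lowercasing the scheme (RFC 3986 defines more), and that graph namings arrive as (name, graph) pairs. The helpers normalize_scheme and conflicting_names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_scheme(uri):
    """Return uri with its scheme component lowercased."""
    parts = urlsplit(uri)
    return urlunsplit(parts._replace(scheme=parts.scheme.lower()))


def conflicting_names(namings):
    """Return normalized names that are paired with more than one graph."""
    seen = {}
    conflicts = set()
    for name, graph in namings:
        key = normalize_scheme(name)
        if key in seen and seen[key] != graph:
            conflicts.add(key)
        seen.setdefault(key, graph)
    return conflicts


example_2 = [
    ("http://example.org", frozenset({("s", "p", "1")})),
    ("HTTP://example.org", frozenset({("s", "p", "2")})),
]
conflicts = conflicting_names(example_2)
```

For Example 2, conflicts contains the single normalized name "http://example.org", so under this proposal the dataset cannot be true.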
> 
> Example 3:  This dataset:
> 
>  <tag:hawke.org,2010-10-06:eg1> { <s> <p> <o>. }
> 
> cannot be tested using Web protocols, since the "tag" URI scheme is
> (by design) not dereferenceable.  Whether it is true or false cannot
> be determined experimentally.
> 
> ==== Temporal Context
> 
> How can we say:
> 
>   <http://example.org> { <s> <p> <o>. }
> 
> if we suspect that "http://example.org" might serve some other content
> tomorrow?
> 
> The answer is that datasets often need temporal qualification just
> like RDF graphs do.  It's just like saying in RDF:
> 
>   <http://example.org/Alice> foaf:age 25.
> 
> One solution for foaf:age triples is to include triples like:
>   <> dc:temporal "2011-10-06"^^xsd:date.
> 
> and that can be done in datasets as well, using the default graph.
> More work is needed on this, but I'm pretty sure this proposal can use
> whatever solution people come up with for RDF and doesn't make matters
> much worse than they are already.
> 
> ==== Practical Deployment Choices
> 
> Any system which maintains a dataset (e.g. a SPARQL endpoint) or
> generates multigraph documents like TriG has to do one (or more) of
> the following:
> 
> 1.  Use new non-dereferenceable graph names.  These could be tag or
>    uuid URIs, or http URIs in your own name space which you choose to
>    leave 404.
> 
> 2.  Use your own dereferenceable graph names, perhaps relative to the
>    endpoint or TriG document URI.  If you do serve RDF content at
>    those URIs, it MUST be the same content (give or take a stated
>    time lag).
> 
> 3.  Use someone else's graph names.  Here, the key thing is temporal
>    metadata.  You have to decide what you want (copy once vs
>    synchronize with what accuracy) and (somehow) share that temporal
>    metadata.
> 
> 
> ...
> 
> Okay, that's enough for now.  Give me a +1 if you think this is headed
> in a useful direction.
> 
>    -- Sandro
> 
> [1] http://www.w3.org/2011/prov/wiki/Using_named_graphs_to_model_Accounts
> 
> 

-- 
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
Received on Friday, 7 October 2011 10:03:33 UTC