Re: Web Semantics for Datasets from Andy Seaborne on 2011-10-07 (public-rdf-wg@w3.org from October 2011)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Fri, 07 Oct 2011 13:48:02 +0100
To: public-rdf-wg@w3.org
Message-ID: <4E8EF502.3030901@epimorphics.com>
 > Okay, that's enough for now.  Give me a +1 if you think this is headed
 > in a useful direction.

I like something like this as a pattern of good practice (well, 2 
patterns).  I don't agree with forcing the 4th column to have a specific 
meaning given all the other deployed uses we have now collected.


On one point:

I don't see why

<http://example.org>  { <s>  <p>  <o> . }

should mean the graph is ONLY that triple rather than that it CONTAINS
that triple.  If the data publisher wants to say "and that's all", they
should say so as an additional fact.  With the converse, "closed by
default", it is hard to see how to allow a graph to be open sometimes.

For a large graph where you only need to talk about a small subset,
closed-by-default causes deployment issues.  Consider DBpedia.

(I also want to see the same change in TriG for concatenation of files)
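The difference between the two readings can be made concrete.  A minimal
Python sketch, assuming graphs are modelled as sets of (subject, predicate,
object) strings and that `served` is whatever a dereference of the graph
name returned; the function names and the example triples are hypothetical.

```python
from typing import FrozenSet, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def holds_closed(stated: Graph, served: Graph) -> bool:
    """'ONLY' reading: the served graph must be exactly the stated triples."""
    return served == stated


def holds_open(stated: Graph, served: Graph) -> bool:
    """'CONTAINS' reading: the served graph must include the stated triples."""
    return stated <= served


# Hypothetical content served at a large graph's URI (think DBpedia).
served = frozenset({("s", "p", "o"), ("s", "p", "o2"), ("s", "q", "o3")})
part_a = frozenset({("s", "p", "o")})   # one file mentions this triple
part_b = frozenset({("s", "q", "o3")})  # another file mentions this one

print(holds_closed(part_a, served))         # False: served has more triples
print(holds_open(part_a, served))           # True
# Concatenating two files that use the same graph name = union of the blocks.
print(holds_open(part_a | part_b, served))  # True
# Under the closed reading the two blocks can never both hold.
print(holds_closed(part_a, served) and holds_closed(part_b, served))  # False
```

Under the open reading, a small subset of a large graph can be published
without being falsified by everything else the URI serves, and concatenated
files stay consistent; "and that's all" remains available as an extra fact.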

 Andy

On 07/10/11 03:04, Sandro Hawke wrote:
> Here's a proposal for what the fourth column should mean.  It's kind of
> obvious, and I think it's how many of us just assumed Named Graphs were
> supposed to work.    But I don't think it's been written down in a form
> we can use, so here it is, in a first draft.
>
> I haven't really tried to motivate this, but one thing it does is allow
> folks to refer to a graph using just one URI.  As [1] points out rather
> painfully, as things stand now, you need multiple URIs just to identify
> each g-box (and thus g-snap).  (That is, you need to say which SPARQL
> endpoint you're talking about, and then which graph within its
> dataset.)
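For comparison, one existing way to identify a graph held in a store is
indirect identification in the SPARQL 1.1 Graph Store HTTP Protocol, which
folds the store URL and the graph name into one request URL via a `graph`
query parameter.  A minimal Python sketch, assuming a hypothetical store at
`http://example.org/store` whose URL has no query string yet; the combined
URL still depends on two URIs, which is the situation the proposal aims to
avoid.

```python
from urllib.parse import urlencode


def graph_store_url(store: str, graph_name: str) -> str:
    """Combine a (hypothetical) graph store URL and a graph name into one
    request URL using the ?graph= parameter; urlencode percent-encodes the
    graph IRI so it survives as a single query value."""
    return store + "?" + urlencode({"graph": graph_name})


url = graph_store_url("http://example.org/store", "http://example.org/g1")
print(url)  # http://example.org/store?graph=http%3A%2F%2Fexample.org%2Fg1
```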
>
> My starting question was: what is the relationship between the IRI (the
> "graph name") and its associated g-snap in an RDF Dataset?  This
> applies to the dataset backing any SPARQL endpoint, as well as the
> dataset serialized in any multigraph syntax, like TriG or N-Quads.
> Another way to look at it: what does it mean to assert a TriG
> document?  If you send me the TriG document "<a>  {<s>  <p>  <o>  }", and
> I trust you, what do I now know?
>
> Richard, I think, has been arguing for a minimalist position,
> answering "nothing", or "it depends on out-of-band agreements".  This
> "Web Semantics" proposal is an alternative.
>
> === Proposal
>
> The idea here is to make the relationship between the URI and the
> graph be the standard Web naming relationship, similar to what we all
> use for Web pages.  When you dereference the URI, you get the graph.
>
> This has the feature of being, to some extent, observable.  Just like
> triples are claims about some domain of discourse, quads become claims
> about idealized Web dereference behavior.
>
> Specifically: Consider a "graph naming" to be the association of a
> graph name N with a graph G.  For the graph naming to hold, every
> successful dereference of N yielding an RDF graph must yield G.  For a
> dataset D to hold, its default graph must hold (as normal in RDF) and
> every graph naming pair in D must hold.
>
> Example 1:  This dataset
>
>     <http://example.org>  {<s>  <p>  <o>. }
>
> means that if anyone is able to dereference "http://example.org"
> and obtain an RDF graph serialization, the serialized graph will
> consist of the single triple <s>  <p>  <o>.  Failure to dereference
> does not make the graph naming untrue, but a successful dereference
> yielding a different graph does.
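The definition above, applied to Example 1, can be sketched in Python.  This
is an illustrative model, not part of the proposal: graphs are sets of
(s, p, o) string triples, the caller supplies the results of dereference
attempts (`None` for a failure or a non-RDF response), and the truth of the
default graph is passed in as a flag because ordinary RDF truth is out of
scope here.  All names are hypothetical.

```python
from typing import FrozenSet, Iterable, Optional, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def naming_holds(g: Graph, dereferences: Iterable[Optional[Graph]]) -> bool:
    # None stands for a failed dereference or a non-RDF response; only a
    # successful dereference yielding a different graph makes the naming false.
    return all(result == g for result in dereferences if result is not None)


def dataset_holds(default_graph_holds: bool,
                  namings: Iterable[Tuple[Graph, Iterable[Optional[Graph]]]]) -> bool:
    # The default graph must hold (ordinary RDF truth, supplied as a flag)
    # and every (graph, observed dereference results) pair must hold.
    return default_graph_holds and all(
        naming_holds(g, results) for g, results in namings)


# Example 1: <http://example.org> { <s> <p> <o> . }
g1 = frozenset({("s", "p", "o")})
print(naming_holds(g1, [None, g1]))                      # True
print(naming_holds(g1, [frozenset({("s", "p", "x")})]))  # False
```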
>
> Example 2:  This dataset can never be true:
>
>     <http://example.org>  {<s>  <p>  1. }
>     <HTTP://example.org>  {<s>  <p>  2. }
>
> ... since one cannot get different results dereferencing URIs that
> differ only in the case of the scheme component (as per RFC 3986).
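Example 2's reasoning can be checked mechanically: normalize each graph name
as RFC 3986 permits, then flag any name paired with two different graphs.  A
rough Python sketch under the same set-of-triples model; for brevity only the
scheme is lower-cased (matching the example), host case-folding and
percent-encoding normalization are left out, and literals are plain strings.
Function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple
from urllib.parse import urlsplit

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def normalize_scheme(uri: str) -> str:
    """Lower-case the scheme, which RFC 3986 defines as case-insensitive.
    (urlsplit already lower-cases it; the call is explicit for clarity.)"""
    parts = urlsplit(uri)
    return parts._replace(scheme=parts.scheme.lower()).geturl()


def impossible_names(dataset: Iterable[Tuple[str, Graph]]) -> List[str]:
    """Return normalized graph names paired with more than one distinct
    graph; a dataset containing any of them can never be true."""
    graphs_by_name: Dict[str, Set[Graph]] = defaultdict(set)
    for name, graph in dataset:
        graphs_by_name[normalize_scheme(name)].add(graph)
    return sorted(n for n, gs in graphs_by_name.items() if len(gs) > 1)


example2 = [
    ("http://example.org", frozenset({("s", "p", "1")})),
    ("HTTP://example.org", frozenset({("s", "p", "2")})),
]
print(impossible_names(example2))  # ['http://example.org']
```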
>
> Example 3:  This dataset:
>
>    <tag:hawke.org,2010-10-06:eg1>  {<s>  <p>  <o>. }
>
> cannot be tested using Web protocols, since the "tag" URI scheme is
> (by design) not dereferenceable.  Whether it is true or false cannot
> be determined experimentally.
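Whether a naming can be tested at all depends on the scheme of the graph
name.  A rough Python sketch, assuming (for illustration only) that `http`
and `https` are the only dereferenceable schemes; `tag` URIs (RFC 4151) and
`urn:uuid:` names fall outside that set.  The function and constant names
are hypothetical.

```python
from urllib.parse import urlsplit

# Assumption for this sketch: only these schemes are treated as
# dereferenceable.  "tag" is deliberately absent.
DEREFERENCEABLE_SCHEMES = {"http", "https"}


def is_testable(graph_name: str) -> bool:
    """True if a Web dereference of the graph name could confirm or refute
    a graph naming; False for schemes such as tag: or urn:uuid:."""
    return urlsplit(graph_name).scheme.lower() in DEREFERENCEABLE_SCHEMES


print(is_testable("http://example.org"))            # True
print(is_testable("tag:hawke.org,2010-10-06:eg1"))  # False
```

A name for which `is_testable` returns False can still be asserted; its
truth just cannot be settled experimentally.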
>
> ==== Temporal Context
>
> How can we say:
>
>     <http://example.org>  {<s>  <p>  <o>. }
>
> if we suspect that "http://example.org" might serve some other content
> tomorrow?
>
> The answer is that datasets often need temporal qualification just
> like RDF graphs do.  It's just like saying in RDF:
>
>     <http://example.org/Alice>  foaf:age 25.
>
> One solution for foaf:age triples is to include triples like:
>
>     <>  dc:temporal "2011-10-06"^^xs:date.
>
> and that can be done in datasets as well, using the default graph.
> More work is needed on this, but I'm pretty sure this proposal can use
> whatever solution people come up with for RDF and doesn't make matters
> much worse than they are already.
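One possible reading of a temporally qualified naming, sketched in Python
under loud assumptions: the default graph carries a single date for the
naming (as with the dc:temporal triple above), and the naming only
constrains dereferences made on that date.  The message explicitly leaves
the real mechanism open, so this is illustration only; names are
hypothetical.

```python
from datetime import date
from typing import FrozenSet, Iterable, Optional, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]
Observation = Tuple[date, Optional[Graph]]  # (when dereferenced, result or None)


def naming_holds_on(g: Graph, valid_on: date,
                    observations: Iterable[Observation]) -> bool:
    """Only successful dereferences made on `valid_on` must yield g;
    content served on other days does not falsify the naming."""
    return all(result == g
               for when, result in observations
               if when == valid_on and result is not None)


g = frozenset({("s", "p", "o")})
obs = [
    (date(2011, 10, 6), g),                              # matches that day
    (date(2011, 10, 7), frozenset({("s", "p", "o2")})),  # site changed later
]
print(naming_holds_on(g, date(2011, 10, 6), obs))  # True
print(naming_holds_on(g, date(2011, 10, 7), obs))  # False
```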
>
> ==== Practical Deployment Choices
>
> Any system which maintains a dataset (e.g. a SPARQL endpoint) or
> generates multigraph documents like TriG has to do one (or more) of
> the following:
>
> 1.  Use new non-dereferenceable graph names.  These could be tag or
>      uuid URIs, or http URIs in your own namespace at which you choose
>      to serve nothing (leaving them 404).
>
> 2.  Use your own dereferenceable graph names, perhaps relative to the
>      endpoint or TriG document URI.  If you do serve RDF content at
>      those URIs, it MUST be the same content (give or take stated time
>      lag).
>
> 3.  Use someone else's graph names.  Here, the key thing is temporal
>      metadata.  You have to decide what you want (copy once, or keep
>      synchronized and with what accuracy) and (somehow) share that
>      temporal metadata.
>
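Options 1 and 2 come down to how graph names are minted; option 3 mints
nothing and only needs the temporal metadata.  A minimal Python sketch using
the standard `uuid` and `urllib.parse` modules; the document URI
`http://example.org/data/dump.trig` and the function names are
hypothetical.

```python
import uuid
from urllib.parse import urljoin


def mint_opaque_name() -> str:
    """Option 1: a fresh, non-dereferenceable graph name (urn:uuid: URI)."""
    return uuid.uuid4().urn


def mint_local_name(document_uri: str, local: str) -> str:
    """Option 2: a graph name relative to the endpoint or TriG document
    URI.  If RDF is ever served there, it must be the same graph."""
    return urljoin(document_uri, local)


print(mint_opaque_name())  # a random urn:uuid:... value
print(mint_local_name("http://example.org/data/dump.trig", "graphs/g1"))
# http://example.org/data/graphs/g1
```

Under option 2, whatever is later served at the minted name must match the
graph named in the dataset, give or take a stated time lag.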
>
> ...
>
> Okay, that's enough for now.  Give me a +1 if you think this is headed
> in a useful direction.
>
>      -- Sandro
>
> [1] http://www.w3.org/2011/prov/wiki/Using_named_graphs_to_model_Accounts
>
>
Received on Friday, 7 October 2011 12:48:34 UTC