Re: Web Semantics for Datasets from Eric Prud'hommeaux on 2011-10-08 (public-rdf-wg@w3.org from October 2011)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Sat, 8 Oct 2011 18:02:16 -0400
To: Andy Seaborne <andy.seaborne@epimorphics.com>
Cc: public-rdf-wg@w3.org
Message-ID: <20111008220214.GA12800@w3.org>
* Andy Seaborne <andy.seaborne@epimorphics.com> [2011-10-08 17:21+0100]
> 
> 
> On 07/10/11 16:04, Eric Prud'hommeaux wrote:
> >* Sandro Hawke<sandro@w3.org>  [2011-10-07 10:35-0400]
> >>On Fri, 2011-10-07 at 13:48 +0100, Andy Seaborne wrote:
> >>>>Okay, that's enough for now.  Give me a +1 if you think this is headed
> >>>  >  in a useful direction.
> >>>
> >>>I like something like this as a pattern of good practice (well, 2
> >>>patterns).  I don't agree with forcing the 4th column to have a specific
> >>>meaning given all the other deployed uses we have now collected.
> >>
> >>Yeah....   There is a middle ground where some datasets use Web
> >>semantics and some don't.  I see your point that we can't just force
> >>people to change -- we can't say the things they've been saying now mean
> >>something else.
> >>
> >>Maybe we can have a way to flag which datasets are using Web semantics,
> >>and allow market pressures to work?    Like, where we do a new mime type
> >>for a multigraph syntax, we could add this.   And maybe it's something
> >>we can flag in SPARQL service description.
> >>
> >>>On one point:
> >>>
> >>>I don't see why
> >>>
> >>><http://example.org>   {<s>   <p>   <o>  . }
> >>>
> >>>should mean it is ONLY that triple rather than CONTAINS that triple.  If
> >>>the data publisher wants to say "and that's all" then they should say so
> >>>as an additional fact.  The converse of "it's closed by default" is
> >>>harder to see how to allow it to be open sometimes.
> >>>
> >>>For a large graph where you only need to talk about a small subset, there
> >>>are deployment issues.  Consider DBpedia.
> >>>
> >>>(I also want to see the same change in TriG for concatenation of files)
> >>
> >>It seems to me that it's easy to go from complete to incomplete, just
> >>using a subgraph predicate.   Let's say we want to say G1 is the graph
> >>with only <s> <p> <o> and G2 is a graph with that triple and maybe other
> >>stuff.   I'd say:
> >>
> >>     G1 {<s>  <p>  <o>. }
> >>     { G1 r:subgraphOf G2. }
> >>
> >>But I don't see how to communicate G1 the way you're talking about. How
> >>do you say "and that's all"?
> >
> >Imagining TriG used for both update and patch, I see it as specified
> >by the protocol. CONSTRUCT ?g { ?s ?p ?o } would give me the results
> >of a query substituted into a named graph pattern. A reply to a GET
> >would give me a complete resource ("and that's all"). A diff
> >propagation could look like:
> >   -<G1>  { _:s1 <p>  <o0>  }
> >   +<G1>  { _:s1 <p>  <o1>  }
> >which means there were already some <G1> triples and we've only
> >changed one of them. The use you want to define is, I believe,
> >characterized by GET <G1>, but I think the mapping of graph
> >names to sets of triples is useful in other places with other
> >presumptions of completeness.
> 
> SPARQL Update allows various ways of treating a change:
> 
> 
> 
> # if you want "replace", clear the destination first:
> CLEAR <G1> ;
> INSERT DATA { GRAPH <G1> { <s> <p> <o> } }
> 
> or a change:
> DELETE DATA { GRAPH <G1> { <s> <p> <o0> } }
> INSERT DATA { GRAPH <G1> { <s> <p> <o1> } }

Yeah, this could be thought of as a transfer of two datasets, and I have a hunch we'll see more clever uses if we don't demand that a "dataset" be a total state but treat it as a bag to be exploited by protocols. I don't have more than a hunch there, but I guess there's some history to say that over-specifying leads to re-negotiation.
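
To make the "two datasets" reading a bit more concrete, here is a rough Python sketch (not SPARQL, and not any real store's API). It assumes a dataset is nothing more than a mapping from graph name to a set of triples, and that the protocol, not the dataset, decides whether a payload is total state or a change. The names apply_replace and apply_change are made up for illustration.

```python
# Hypothetical model: a dataset maps a graph name to a set of (s, p, o) triples.
Triple = tuple[str, str, str]
Dataset = dict[str, set[Triple]]


def apply_replace(store: Dataset, payload: Dataset) -> Dataset:
    """Treat the payload as total state for each graph it names
    (roughly CLEAR followed by INSERT DATA)."""
    result = {name: set(triples) for name, triples in store.items()}
    for name, triples in payload.items():
        result[name] = set(triples)
    return result


def apply_change(store: Dataset, deletes: Dataset, inserts: Dataset) -> Dataset:
    """Treat the payload as two datasets: triples to remove, then triples to add
    (roughly DELETE DATA followed by INSERT DATA)."""
    result = {name: set(triples) for name, triples in store.items()}
    for name, triples in deletes.items():
        result.setdefault(name, set()).difference_update(triples)
    for name, triples in inserts.items():
        result.setdefault(name, set()).update(triples)
    return result


# <G1> already holds two triples before either protocol is applied.
store = {"G1": {("s", "p", "o0"), ("s", "q", "x")}}

# "Replace": afterwards <G1> holds only what the payload said.
replaced = apply_replace(store, {"G1": {("s", "p", "o")}})

# "Change": one triple is swapped, the unrelated triple survives.
changed = apply_change(
    store,
    deletes={"G1": {("s", "p", "o0")}},
    inserts={"G1": {("s", "p", "o1")}},
)
```

The same payload shape (graph name plus some triples) ends up meaning "and that's all" under one function and "at least this" under the other, which is the point about leaving completeness to the protocol.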


> 	Andy
> 
> >
> >
> >>     -- Sandro
> >>
> >>
> >>>	Andy
> >>>
> >>>On 07/10/11 03:04, Sandro Hawke wrote:
> >>>>Here's a proposal for what the fourth column should mean.  It's kind of
> >>>>obvious, and I think it's how many of us just assumed Named Graphs were
> >>>>supposed to work.    But I don't think it's been written down in a form
> >>>>we can use, so here it is, in a first draft.
> >>>>
> >>>>I haven't really tried to motivate this, but one thing it does is allow
> >>>>folks to refer to a graph using just one URI.  As [1] points out rather
> >>>>painfully, as things stand now, you need multiple URIs just to identify
> >>>>each g-box (and thus g-snap).  (That is, you need to say which sparql
> >>>>endpoint you're talking about, and then which graph within its
> >>>>dataset.)
> >>>>
> >>>>My starting question was: what is the relationship between the IRI (the
> >>>>"graph name") and its associated g-snap in an RDF Dataset.  This
> >>>>applies to the dataset backing any SPARQL end point, as well as the
> >>>>dataset serialized in any multigraph syntax, like TriG or N-Quads.
> >>>>Another way to look at it: what does it mean to assert a TriG
> >>>>document?  If you send me the TriG Document "<a>   {<s>   <p>   <o>   }", and
> >>>>I trust you, what do I now know?
> >>>>
> >>>>Richard, I think, has been arguing for a minimalist position,
> >>>>answering "nothing", or "it depends on out-of-band agreements".  This
> >>>>"Web Semantics" proposal is an alternative.
> >>>>
> >>>>=== Proposal
> >>>>
> >>>>The idea here is to make the relationship between the URI and the
> >>>>graph be the standard Web naming relationship, similar to what we all
> >>>>use for Web pages.  When you dereference the URI, you get the graph.
> >>>>
> >>>>This has the feature of being, to some extent, observable.  Just like
> >>>>triples are claims about some domain of discourse, quads become claims
> >>>>about idealized Web dereference behavior.
> >>>>
> >>>>Specifically: Consider a "graph naming" to be the association of a
> >>>>graph name N with a graph G.  For the graph naming to hold, every
> >>>>successful dereference of N yielding an RDF graph must yield G.  For a
> >>>>dataset D to hold, its default graph must hold (as normal in RDF) and
> >>>>every graph naming pair in D must hold.
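
As I read the definition above, it could be sketched like this in Python. This is only an illustration under my own assumptions: graphs are modelled as frozensets of triples (so blank nodes and graph isomorphism are ignored), a failed dereference is modelled as None, the default graph's truth is not modelled at all, and the function names are hypothetical.

```python
from typing import Optional

Triple = tuple[str, str, str]
Graph = frozenset[Triple]


def naming_holds(claimed: Graph, observations: list[Optional[Graph]]) -> bool:
    """A graph naming holds unless some successful dereference yielded a
    different graph. None stands for a dereference that failed or that
    did not yield an RDF graph, which does not falsify the naming."""
    return all(obs is None or obs == claimed for obs in observations)


def dataset_holds(
    namings: dict[str, Graph],
    observed: dict[str, list[Optional[Graph]]],
) -> bool:
    """Every (name, graph) pair in the dataset must hold."""
    return all(
        naming_holds(graph, observed.get(name, []))
        for name, graph in namings.items()
    )


g = frozenset({("s", "p", "o")})
other = frozenset({("s", "p", "o2")})
namings = {"http://example.org": g}

# One successful dereference matching the claim, one failure: still holds.
ok = dataset_holds(namings, {"http://example.org": [g, None]})

# Never dereferenced at all: nothing contradicts the claim.
never_fetched = dataset_holds(namings, {})

# A successful dereference yielding a different graph falsifies it.
bad = dataset_holds(namings, {"http://example.org": [g, other]})
```

Note that this can only ever refute a naming from observations; it cannot confirm "every successful dereference", which is why the claim is about idealized behavior.
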
> >>>>
> >>>>Example 1:  This dataset
> >>>>
> >>>>     <http://example.org>   {<s>   <p>   <o>. }
> >>>>
> >>>>means that if anyone is able to dereference "http://example.org"
> >>>>and obtain an RDF graph serialization, the serialized graph will
> >>>>consist of the single triple, <s> <p> <o>.  Failure to dereference
> >>>>does not make the graph naming untrue, but a successful dereference
> >>>>yielding a different graph does.
> >>>>
> >>>>Example 2:  This dataset can never be true:
> >>>>
> >>>>     <http://example.org>   {<s>   <p>   1. }
> >>>>     <HTTP://example.org>   {<s>   <p>   2. }
> >>>>
> >>>>... since one cannot get different results dereferencing URIs that
> >>>>differ only in the case of the scheme component (as per RFC 3986).
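
A tool could spot this kind of contradiction before dereferencing anything. Here is a minimal Python sketch, assuming absolute URIs and normalizing only the scheme (RFC 3986 has further equivalence rules that this ignores). The helper names are mine, not from any library.

```python
Triple = tuple[str, str, str]
Graph = frozenset[Triple]


def normalize_scheme(uri: str) -> str:
    """Lower-case the scheme component only; the rest is left untouched."""
    scheme, sep, rest = uri.partition(":")
    if not sep:
        return uri  # no scheme found; leave as-is
    return scheme.lower() + sep + rest


def conflicting_names(namings: list[tuple[str, Graph]]) -> set[str]:
    """Return normalized names that are paired with more than one graph."""
    seen: dict[str, Graph] = {}
    conflicts: set[str] = set()
    for name, graph in namings:
        key = normalize_scheme(name)
        if key in seen and seen[key] != graph:
            conflicts.add(key)
        seen.setdefault(key, graph)
    return conflicts


# The two namings from Example 2, with the literals written as strings.
example2 = [
    ("http://example.org", frozenset({("s", "p", "1")})),
    ("HTTP://example.org", frozenset({("s", "p", "2")})),
]
conflicts = conflicting_names(example2)
```

Under Web semantics a non-empty result means the dataset cannot hold, whatever the server actually returns.
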
> >>>>
> >>>>Example 3:  This dataset:
> >>>>
> >>>>    <tag:hawke.org,2010-10-06:eg1>   {<s>   <p>   <o>. }
> >>>>
> >>>>cannot be tested using Web protocols, since the "tag" URI scheme is
> >>>>(by design) not dereferenceable.  Whether it is true or false cannot
> >>>>be determined experimentally.
> >>>>
> >>>>==== Temporal Context
> >>>>
> >>>>How can we say:
> >>>>
> >>>>     <http://example.org>   {<s>   <p>   <o>. }
> >>>>
> >>>>if we suspect that "http://example.org" might serve some other content
> >>>>tomorrow?
> >>>>
> >>>>The answer is that datasets often need temporal qualification just
> >>>>like RDF graphs do.  It's just like saying in RDF:
> >>>>
> >>>>     <http://example.org/Alice>   foaf:age 25.
> >>>>
> >>>>One solution for foaf:age triples is to include triples like:
> >>>>     <>   dc:temporal "2011-10-06"^^xs:dateTime.
> >>>>
> >>>>and that can be done in datasets as well, using the default graph.
> >>>>More work is needed on this, but I'm pretty sure this proposal can use
> >>>>whatever solution people come up with for RDF and doesn't make matters
> >>>>much worse than they are already.
> >>>>
> >>>>==== Practical Deployment Choices
> >>>>
> >>>>Any system which maintains a dataset (eg a sparql endpoint) or
> >>>>generates multigraph documents like TriG has to do one (or more) of
> >>>>the following:
> >>>>
> >>>>1.  Use new non-dereferenceable graph names.  These could be tag or
> >>>>      uuid URIs, or http URIs in your own name space which you choose to
> >>>>      leave 404.
> >>>>
> >>>>2.  Use your own dereferenceable graph names, perhaps relative to the
> >>>>      endpoint or TriG document URI.  If you do serve RDF content at
> >>>>      those URIs, it MUST be the same content (give or take stated time
> >>>>      lag).
> >>>>
> >>>>3.  Use someone else's graph names.  Here, the key thing is temporal
> >>>>      metadata.  You have to decide what you want (copy once vs
> >>>>      synchronize with what accuracy) and (somehow) share that temporal
> >>>>      metadata.
> >>>>
> >>>>
> >>>>...
> >>>>
> >>>>Okay, that's enough for now.  Give me a +1 if you think this is headed
> >>>>in a useful direction.
> >>>>
> >>>>      -- Sandro
> >>>>
> >>>>[1] http://www.w3.org/2011/prov/wiki/Using_named_graphs_to_model_Accounts
> >>>>
> >>>>
> >>>
> >>>
> >>
> >>
> >>
> >
> 

-- 
-ericP
Received on Saturday, 8 October 2011 22:02:58 UTC