Re: Re: Web Semantics for Datasets

* Sandro Hawke <sandro@w3.org> [2011-10-07 10:35-0400]
> On Fri, 2011-10-07 at 13:48 +0100, Andy Seaborne wrote:
> > > Okay, that's enough for now.  Give me a +1 if you think this is headed
> > > in a useful direction.
> > 
> > I like something like this as a pattern of good practice (well, 2 
> > patterns).  I don't agree with forcing the 4th column to have a specific 
> > meaning given all the other deployed uses we have now collected.
> 
> Yeah....   There is a middle ground where some datasets use Web
> semantics and some don't.  I see your point that we can't just force
> people to change -- we can't say the things they've been saying now mean
> something else.
> 
> Maybe we can have a way to flag which datasets are using Web semantics,
> and allow market pressures to work?    Like, where we do a new mime type
> for a multigraph syntax, we could add this.   And maybe it's something
> we can flag in SPARQL service description.
> 
> > On one point:
> > 
> > I don't see why
> > 
> > <http://example.org>  { <s>  <p>  <o> . }
> > 
> > should mean it is ONLY that triple rather than CONTAINS that triple.  If 
> > the data publisher wants to say "and that's all" then they should say so 
> > as an additional fact.  With the converse, "it's closed by default", it is
> > harder to see how to allow a graph to be open sometimes.
> > 
> > For a large graph, where you only need to talk about a small subset, there
> > are deployment issues.  Consider dbpedia.
> > 
> > (I also want to see the same change in TriG for concatenation of files)
> 
> It seems to me that it's easy to go from complete to incomplete, just
> using a subgraph predicate.   Let's say we want to say G1 is the graph
> with only <s> <p> <o> and G2 is a graph with that triple and maybe other
> stuff.   I'd say:
> 
>     G1 { <s> <p> <o>. }
>     { G1 r:subgraphOf G2. }      
> 
> But I don't see how to communicate G1 the way you're talking about. How
> do you say "and that's all"?

Imagining TriG used for both update and patch, I see it as specified
by the protocol. CONSTRUCT ?g { ?s ?p ?o } would give me the results
of a query substituted into a named graph pattern. A reply to a GET
would give me a complete resource ("and that's all"). A diff
propagation could look like:
  - <G1> { _:s1 <p> <o0> }
  + <G1> { _:s1 <p> <o1> }
which means there were already some <G1> triples and we've only
changed one of them. The use you want to define is, I believe,
characterized by GET <G1>, but I think the mapping of graph
names to sets of triples is useful in other places with other
presumptions of completeness.
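
To make the diff reading concrete, here is a minimal Python sketch of
applying "-" and "+" lines to a mapping from graph names to sets of
triples. Everything in it is an assumption for illustration: the
apply_diff helper, the tuple encoding of triples, the extra
("_:s1", "q", "x") triple, and the idea that the blank node label _:s1
is stable on both sides of the diff.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Dataset = Dict[str, Set[Triple]]   # graph name -> set of triples
Change = Tuple[str, str, Triple]   # ("-" or "+", graph name, triple)


def apply_diff(dataset: Dataset, changes: List[Change]) -> Dataset:
    """Return a patched copy of the dataset; untouched triples are kept."""
    patched = {name: set(triples) for name, triples in dataset.items()}
    for op, graph, triple in changes:
        triples = patched.setdefault(graph, set())
        if op == "-":
            triples.discard(triple)   # removing an absent triple is a no-op
        elif op == "+":
            triples.add(triple)
        else:
            raise ValueError(f"unknown diff operation: {op!r}")
    return patched


# Hypothetical state of <G1> before the diff: the triple being changed
# plus one other triple that the diff never mentions.
before = {"G1": {("_:s1", "p", "o0"), ("_:s1", "q", "x")}}
diff = [
    ("-", "G1", ("_:s1", "p", "o0")),
    ("+", "G1", ("_:s1", "p", "o1")),
]
after = apply_diff(before, diff)
```

The unmentioned triple survives the patch, which is the sense in which
the graph pattern here carries no "and that's all" presumption.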


>     -- Sandro
> 
> 
> > 	Andy
> > 
> > On 07/10/11 03:04, Sandro Hawke wrote:
> > > Here's a proposal for what the fourth column should mean.  It's kind of
> > > obvious, and I think it's how many of us just assumed Named Graphs were
> > > supposed to work.    But I don't think it's been written down in a form
> > > we can use, so here it is, in a first draft.
> > >
> > > I haven't really tried to motivate this, but one thing it does is allow
> > > folks to refer to a graph using just one URI.  As [1] points out rather
> > > painfully, as things stand now, you need multiple URIs just to identify
> > > each g-box (and thus g-snap).  (That is, you need to say which sparql
> > > endpoint you're talking about, and then which graph within its
> > > dataset.)
> > >
> > > My starting question was: what is the relationship between the IRI (the
> > > "graph name") and its associated g-snap in an RDF Dataset.  This
> > > applies to the dataset backing any SPARQL end point, as well as the
> > > dataset serialized in any multigraph syntax, like TriG or N-Quads.
> > > Another way to look at it: what does it mean to assert a TriG
> > > document?  If you send me the TriG Document "<a>  {<s>  <p>  <o>  }", and
> > > I trust you, what do I now know?
> > >
> > > Richard, I think, has been arguing for a minimalist position,
> > > answering "nothing", or "it depends on out-of-band agreements".  This
> > > "Web Semantics" proposal is an alternative.
> > >
> > > === Proposal
> > >
> > > The idea here is to make the relationship between the URI and the
> > > graph be the standard Web naming relationship, similar to what we all
> > > use for Web pages.  When you dereference the URI, you get the graph.
> > >
> > > This has the feature of being, to some extent, observable.  Just like
> > > triples are claims about some domain of discourse, quads become claims
> > > about idealized Web dereference behavior.
> > >
> > > Specifically: Consider a "graph naming" to be the association of a
> > > graph name N with a graph G.  For the graph naming to hold, every
> > > successful dereference of N yielding an RDF graph must yield G.  For a
> > > dataset D to hold, its default graph must hold (as normal in RDF) and
> > > every graph naming pair in D must hold.
> > >
> > > Example 1:  This dataset
> > >
> > >     <http://example.org>  {<s>  <p>  <o>. }
> > >
> > > means that if anyone is able to dereference "http://example.org"
> > > and obtain an RDF graph serialization, the serialized graph will
> > > consist of the single triple, <s>  <p>  <o>.  Failure to dereference
> > > does not make the graph naming untrue, but a successful dereference
> > > yielding a different graph does.
> > >
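
A rough Python sketch of the "graph naming holds" condition as I read
it above. The names naming_holds and dataset_holds, the dereference
callback, and the encoding of graphs as frozensets of string triples
are all assumptions for illustration. Graphs are compared by plain set
equality, so blank nodes are out of scope, and the default graph is not
checked at all. One observation can only refute a naming, never prove
it, so True here means "not refuted by this dereference".

```python
from typing import Callable, Dict, FrozenSet, Optional, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]
# Returns the graph obtained from a name, or None if dereference failed.
Dereference = Callable[[str], Optional[Graph]]


def naming_holds(name: str, graph: Graph, dereference: Dereference) -> bool:
    """False only when a successful dereference yields a different graph."""
    observed = dereference(name)
    return observed is None or observed == graph


def dataset_holds(namings: Dict[str, Graph], dereference: Dereference) -> bool:
    """Every (name, graph) pair must survive the observation."""
    return all(naming_holds(n, g, dereference) for n, g in namings.items())


example_1 = {"http://example.org": frozenset({("s", "p", "o")})}

# Three stand-ins for the Web: same graph, a different graph, nothing there.
web_same = {"http://example.org": frozenset({("s", "p", "o")})}
web_other = {"http://example.org": frozenset({("s", "p", "o"), ("s", "p", "o2")})}
web_empty: Dict[str, Graph] = {}

not_refuted = dataset_holds(example_1, web_same.get)
refuted = not dataset_holds(example_1, web_other.get)
unreachable_ok = dataset_holds(example_1, web_empty.get)
```

Note that web_other contains the triple from Example 1 plus one more and
still refutes the naming: this is the closed ("ONLY that triple")
reading, not the "CONTAINS" one.
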
> > > Example 2:  This dataset can never be true:
> > >
> > >     <http://example.org>  {<s>  <p>  1. }
> > >     <HTTP://example.org>  {<s>  <p>  2. }
> > >
> > > ... since one cannot get different results dereferencing URIs that
> > > differ only in the case of the scheme component (as per RFC 3986).
> > >
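
Example 2 can be checked mechanically. The sketch below is only an
illustration: normalize and conflicting_names are made-up helper names,
and only the case-insensitive parts of the URI are folded. The scheme is
what the example relies on; folding the host as well is my addition,
based on RFC 3986 treating both as case-insensitive. It assumes the
names carry no userinfo component, since lowercasing the whole authority
would otherwise change case-sensitive data.

```python
from typing import FrozenSet, Iterable, List, Tuple
from urllib.parse import urlsplit, urlunsplit

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def normalize(name: str) -> str:
    """Lowercase the scheme and authority; leave path, query, fragment alone."""
    parts = urlsplit(name)
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(),
         parts.path, parts.query, parts.fragment)
    )


def conflicting_names(namings: Iterable[Tuple[str, Graph]]) -> List[str]:
    """Normalized names that are paired with more than one distinct graph."""
    seen = {}
    conflicts: List[str] = []
    for name, graph in namings:
        key = normalize(name)
        if key in seen and seen[key] != graph and key not in conflicts:
            conflicts.append(key)
        seen.setdefault(key, graph)
    return conflicts


example_2 = [
    ("http://example.org", frozenset({("s", "p", "1")})),
    ("HTTP://example.org", frozenset({("s", "p", "2")})),
]
clashes = conflicting_names(example_2)
```

Plain RDF compares the two graph names as different strings, so the
dataset is only contradictory under the Web reading, where both names
must dereference to the same thing.
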
> > > Example 3:  This dataset:
> > >
> > >    <tag:hawke.org,2010-10-06:eg1>  {<s>  <p>  <o>. }
> > >
> > > cannot be tested using Web protocols, since the "tag" URI scheme is
> > > (by design) not dereferenceable.  Whether it is true or false cannot
> > > be determined experimentally.
> > >
> > > ==== Temporal Context
> > >
> > > How can we say:
> > >
> > >     <http://example.org>  {<s>  <p>  <o>. }
> > >
> > > if we suspect that "http://example.org" might serve some other content
> > > tomorrow?
> > >
> > > The answer is that datasets often need temporal qualification just
> > > like RDF graphs do.  It's just like saying in RDF:
> > >
> > >     <http://example.org/Alice>  foaf:age 25.
> > >
> > > One solution for foaf:age triples is to include triples like:
> > >     <>  dc:temporal "2011-10-06"^^xs:date.
> > >
> > > and that can be done in datasets as well, using the default graph.
> > > More work is needed on this, but I'm pretty sure this proposal can use
> > > whatever solution people come up with for RDF and doesn't make matters
> > > much worse than they are already.
> > >
> > > ==== Practical Deployment Choices
> > >
> > > Any system which maintains a dataset (eg a sparql endpoint) or
> > > generates multigraph documents like TriG has to do one (or more) of
> > > the following:
> > >
> > > 1.  Use new non-dereferenceable graph names.  These could be tag or
> > >      uuid URIs, or http URIs in your own name space which you choose to
> > >      leave 404.
> > >
> > > 2.  Use your own dereferenceable graph names, perhaps relative to the
> > >      endpoint or TriG document URI.  If you do serve RDF content at
> > >      those URIs, it MUST be the same content (give or take stated time
> > >      lag).
> > >
> > > 3.  Use someone else's graph names.  Here, the key thing is temporal
> > >      metadata.  You have to decide what you want (copy once vs
> > >      synchronize with what accuracy) and (somehow) share that temporal
> > >      metadata.
> > >
> > >
> > > ...
> > >
> > > Okay, that's enough for now.  Give me a +1 if you think this is headed
> > > in a useful direction.
> > >
> > >      -- Sandro
> > >
> > > [1] http://www.w3.org/2011/prov/wiki/Using_named_graphs_to_model_Accounts
> > >
> > >
> > 
> > 
> 
> 
> 

-- 
-ericP

Received on Friday, 7 October 2011 15:05:19 UTC