Re: Web Semantics for Datasets from Sandro Hawke on 2011-10-10 (public-rdf-wg@w3.org from October 2011)

From: Sandro Hawke <sandro@w3.org>
Date: Sun, 09 Oct 2011 22:23:02 -0400
To: Andy Seaborne <andy.seaborne@epimorphics.com>
Cc: public-rdf-wg@w3.org
Message-ID: <1318213382.2111.26.camel@waldron>
On Sat, 2011-10-08 at 17:31 +0100, Andy Seaborne wrote:
> 
> On 07/10/11 15:35, Sandro Hawke wrote:
> > On Fri, 2011-10-07 at 13:48 +0100, Andy Seaborne wrote:
> >>> Okay, that's enough for now.  Give me a +1 if you think this is headed
> >>> in a useful direction.
> >>
> >> I like something like this as a pattern of good practice (well, 2
> >> patterns).  I don't agree with forcing the 4th column to have a specific
> >> meaning given all the other deployed uses we have now collected.
> >
> > Yeah....   There is a middle ground where some datasets use Web
> > semantics and some don't.  I see your point that we can't just force
> > people to change -- we can't say the things they've been saying now mean
> > something else.
> >
> > Maybe we can have a way to flag which datasets are using Web semantics,
> > and allow market pressures to work?    Like, where we do a new mime type
> > for a multigraph syntax, we could add this.   And maybe it's something
> > we can flag in SPARQL service description.
> >
> >> On one point:
> >>
> >> I don't see why
> >>
> >> <http://example.org>   {<s>   <p>   <o>  . }
> >>
> >> should mean it is ONLY that triple rather than CONTAINS that triple.  If
> >> the data publisher wants to say "and that's all" then they should say so
> >> as an additional fact.  The converse of "it's closed by default" is
> >> harder to see how to allow it to be open sometimes.
> >>
> >> For a large graph where you only need to talk about a small subset, there
> >> are deployment issues.  Consider DBpedia.
> >>
> >> (I also want to see the same change in TriG for concatenation of files)
> >
> > It seems to me that it's easy to go from complete to incomplete, just
> > using a subgraph predicate.   Let's say we want to say G1 is the graph
> > with only <s>  <p>  <o>  and G2 is a graph with that triple and maybe other
> > stuff.   I'd say:
> >
> >      G1 {<s>  <p>  <o>. }
> >      { G1 r:subgraphOf G2. }
> >
> > But I don't see how to communicate G1 the way you're talking about. How
> > do you say "and that's all"?
> 
> 
> 
>        G1 { <s>  <p>  <o>. }
>        { G1 r:representationOf G2. }

I don't understand.  Can you expand those out in English?    I read
that, with your proposed subgraph semantics as:

    G1 is a graph which contains at least the triple <s> <p> <o>.

    G1 is a representation of G2.

I don't know what "representation" means in this sense, but in any case,
how can we know from those statements that G2 contains only that one
triple?   The only connection between that triple and G2 is via G1, and
we've made that connection so loose that it can't serve this purpose.
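
To make the two readings concrete, here is a minimal Python sketch.  It
assumes a graph can be modelled as a frozenset of (s, p, o) tuples; the
names holds_exact, holds_contains, STATED and CANDIDATES are illustrative
only and come from no spec or proposal.

```python
from typing import FrozenSet, List, Tuple

# Assumption: a graph is modelled as a frozenset of (s, p, o) string tuples.
Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def holds_exact(stated: Graph, actual: Graph) -> bool:
    """'Complete' reading: the named graph is exactly the stated triples."""
    return actual == stated


def holds_contains(stated: Graph, actual: Graph) -> bool:
    """'Subgraph' reading: the named graph contains at least the stated triples."""
    return stated <= actual


# What the document says about G1.
STATED: Graph = frozenset({("s", "p", "o")})

# Two graphs that G1 might actually be.
CANDIDATES: List[Graph] = [
    frozenset({("s", "p", "o")}),
    frozenset({("s", "p", "o"), ("s", "p", "o2")}),
]

exact_matches = [g for g in CANDIDATES if holds_exact(STATED, g)]
contains_matches = [g for g in CANDIDATES if holds_contains(STATED, g)]

print(len(exact_matches), len(contains_matches))
```

Under the "contains" reading both candidates stay possible, so nothing said
about G1 can pin down a graph that is only reachable through G1.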

> Mindful of DanBri's comments on dereference being non-global

Which I disposed of.  The cases of dereference being non-global are not
useful to this purpose, so we're not using them (in my proposal).
(Note that non-global dereference, while useful for some things, is
anathema to much of the Web.   Search engines can't really handle it,
etc.)

>  maybe all that 
> can ever be said is "subgraph" so making the
> 
>      G1 {<s>  <p>  <o>. }
> 
> case the subgraph case may be where AWWW leads us. 

I'm quite confident it doesn't.

>  Stronger statements 
> need additional triples to make them and this reflects the fact that 
> additional knowledge over and above AWWW deref is being used.

No.  When I load a Web page over the Web, it's very clear to me, at a
protocol level, when I've gotten the full
"representation" (serialization) of the page, of an image, of a video,
of a stylesheet, etc.

Also, when I download an RDF/XML or Turtle file, I can tell when I'm
done.   We want to be able to support merging of graphs but that doesn't
mean we have to pretend the boundaries between the graphs, pre-merging,
don't exist.    And the whole utility of "named graphs" for some folks
(eg Tim Lebo) is that it lets you draw boundaries around graphs and
point to them.
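
As a rough illustration of keeping boundaries while still supporting merge,
here is a small Python sketch.  It assumes graphs without blank nodes, so
that a merge is a plain set union; the graph names and the merge() helper
are made up for the example.

```python
from typing import Dict, FrozenSet, Tuple

# Assumption: graphs are frozensets of (s, p, o) tuples with no blank nodes,
# so merging reduces to set union.
Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def merge(dataset: Dict[str, Graph]) -> Graph:
    """Union of all named graphs; the input dataset is left untouched."""
    merged = set()
    for graph in dataset.values():
        merged |= graph
    return frozenset(merged)


# Hypothetical graph names, each with a clearly bounded set of triples.
dataset: Dict[str, Graph] = {
    "http://example.org/g1": frozenset({("s", "p", "o")}),
    "http://example.org/g2": frozenset({("s", "p", "o"), ("s", "p", "o2")}),
}

merged = merge(dataset)

# The merged view exists alongside the per-name boundaries, not instead of them.
print(len(merged), len(dataset["http://example.org/g1"]))
```

The merged graph is a derived view; each name in the dataset still points at
exactly the triples it had before the merge.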

    -- Sandro


>  Andy
> 
> 
> >
> >      -- Sandro
> >
> >
> >>  Andy
> >>
> >> On 07/10/11 03:04, Sandro Hawke wrote:
> >>> Here's a proposal for what the fourth column should mean.  It's kind of
> >>> obvious, and I think it's how many of us just assumed Named Graphs were
> >>> supposed to work.    But I don't think it's been written down in a form
> >>> we can use, so here it is, in a first draft.
> >>>
> >>> I haven't really tried to motivate this, but one thing it does is allow
> >>> folks to refer to a graph using just one URI.  As [1] points out rather
> >>> painfully, as things stand now, you need multiple URIs just to identify
> >>> each g-box (and thus g-snap).  (That is, you need to say which sparql
> >>> endpoint you're talking about, and then which graph within its
> >>> dataset.)
> >>>
> >>> My starting question was: what is the relationship between the IRI (the
> >>> "graph name") and its associated g-snap in an RDF Dataset?  This
> >>> applies to the dataset backing any SPARQL end point, as well as the
> >>> dataset serialized in any multigraph syntax, like TriG or N-Quads.
> >>> Another way to look at it: what does it mean to assert a TriG
> >>> document?  If you send me the TriG Document "<a>   {<s>   <p>   <o>   }", and
> >>> I trust you, what do I now know?
> >>>
> >>> Richard, I think, has been arguing for a minimalist position,
> >>> answering "nothing", or "it depends on out-of-band agreements".  This
> >>> "Web Semantics" proposal is an alternative.
> >>>
> >>> === Proposal
> >>>
> >>> The idea here is to make the relationship between the URI and the
> >>> graph be the standard Web naming relationship, similar to what we all
> >>> use for Web pages.  When you dereference the URI, you get the graph.
> >>>
> >>> This has the feature of being, to some extent, observable.  Just like
> >>> triples are claims about some domain of discourse, quads become claims
> >>> about idealized Web dereference behavior.
> >>>
> >>> Specifically: Consider a "graph naming" to be the association of a
> >>> graph name N with a graph G.  For the graph naming to hold, every
> >>> successful dereference of N yielding an RDF graph must yield G.  For a
> >>> dataset D to hold, its default graph must hold (as normal in RDF) and
> >>> every graph naming pair in D must hold.
> >>>
> >>> Example 1:  This dataset
> >>>
> >>>      <http://example.org>   {<s>   <p>   <o>. }
> >>>
> >>> means that if anyone is able to dereference "http://example.org"
> >>> and obtain an RDF graph serialization, the serialized graph will
> >>> consist of the single triple, <s>   <p>   <o>.  Failure to dereference
> >>> does not make the graph naming untrue, but a successful dereference
> >>> yielding a different graph does.
> >>>
> >>> Example 2:  This dataset can never be true:
> >>>
> >>>      <http://example.org>   {<s>   <p>   1. }
> >>>      <HTTP://example.org>   {<s>   <p>   2. }
> >>>
> >>> ... since one cannot get different results dereferencing URIs that
> >>> differ only in the case of the scheme component (as per RFC 3986).
> >>>
> >>> Example 3:  This dataset:
> >>>
> >>>     <tag:hawke.org,2010-10-06:eg1>   {<s>   <p>   <o>. }
> >>>
> >>> cannot be tested using Web protocols, since the "tag" URI scheme is
> >>> (by design) not dereferenceable.  Whether it is true or false cannot
> >>> be determined experimentally.
> >>>
> >>> ==== Temporal Context
> >>>
> >>> How can we say:
> >>>
> >>>      <http://example.org>   {<s>   <p>   <o>. }
> >>>
> >>> if we suspect that "http://example.org" might serve some other content
> >>> tomorrow?
> >>>
> >>> The answer is that datasets often need temporal qualification just
> >>> like RDF graphs do.  It's just like saying in RDF:
> >>>
> >>>      <http://example.org/Alice>   foaf:age 25.
> >>>
> >>> One solution for foaf:age triples is to include triples like:
> >>>      <>   dc:temporal "2011-10-06"^^xs:date.
> >>>
> >>> and that can be done in datasets as well, using the default graph.
> >>> More work is needed on this, but I'm pretty sure this proposal can use
> >>> whatever solution people come up with for RDF and doesn't make matters
> >>> much worse than they are already.
> >>>
> >>> ==== Practical Deployment Choices
> >>>
> >>> Any system which maintains a dataset (eg a sparql endpoint) or
> >>> generates multigraph documents like TriG has to do one (or more) of
> >>> the following:
> >>>
> >>> 1.  Use new non-dereferenceable graph names.  These could be tag or
> >>>       uuid URIs, or http URIs in your own name space which you choose to
> >>>       leave 404.
> >>>
> >>> 2.  Use your own dereferenceable graph names, perhaps relative to the
> >>>       endpoint or TriG document URI.  If you do serve RDF content at
> >>>       those URIs, it MUST be the same content (give or take stated time
> >>>       lag).
> >>>
> >>> 3.  Use someone else's graph names.  Here, the key thing is temporal
> >>>       metadata.  You have to decide what you want (copy once vs
> >>>       synchronize with what accuracy) and (somehow) share that temporal
> >>>       metadata.
> >>>
> >>>
> >>> ...
> >>>
> >>> Okay, that's enough for now.  Give me a +1 if you think this is headed
> >>> in a useful direction.
> >>>
> >>>       -- Sandro
> >>>
> >>> [1] http://www.w3.org/2011/prov/wiki/Using_named_graphs_to_model_Accounts
> >>>
> >>>
> >>
> >>
> >
> >
>
Received on Monday, 10 October 2011 02:23:11 UTC