
Re: Web Semantics for Datasets

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Sat, 08 Oct 2011 17:21:51 +0100
Message-ID: <4E90789F.5030502@epimorphics.com>
To: public-rdf-wg@w3.org


On 07/10/11 16:04, Eric Prud'hommeaux wrote:
> * Sandro Hawke <sandro@w3.org> [2011-10-07 10:35-0400]
>> On Fri, 2011-10-07 at 13:48 +0100, Andy Seaborne wrote:
>>>> Okay, that's enough for now.  Give me a +1 if you think this is headed
>>>> in a useful direction.
>>>
>>> I like something like this as a pattern of good practice (well, 2
>>> patterns).  I don't agree with forcing the 4th column to have a specific
>>> meaning given all the other deployed uses we have now collected.
>>
>> Yeah....   There is a middle ground where some datasets use Web
>> semantics and some don't.  I see your point that we can't just force
>> people to change -- we can't say the things they've been saying now mean
>> something else.
>>
>> Maybe we can have a way to flag which datasets are using Web semantics,
>> and allow market pressures to work?    Like, where we do a new mime type
>> for a multigraph syntax, we could add this.   And maybe it's something
>> we can flag in SPARQL service description.
>>
>>> On one point:
>>>
>>> I don't see why
>>>
>>> <http://example.org> { <s> <p> <o> . }
>>>
>>> should mean it is ONLY that triple rather than CONTAINS that triple.  If
>>> the data publisher wants to say "and that's all" then they should say so
>>> as an additional fact.  The converse of "it's closed by default" is
>>> harder to see how to allow it to be open sometimes.
>>>
>>> For a large graph where you only need to talk about a small subset,
>>> requiring the complete graph raises deployment issues.  Consider DBpedia.
>>>
>>> (I also want to see the same change in TriG for concatenation of files)
>>
>> It seems to me that it's easy to go from complete to incomplete, just
>> using a subgraph predicate.   Let's say we want to say G1 is the graph
>> with only <s> <p> <o> and G2 is a graph with that triple and maybe other
>> stuff.   I'd say:
>>
>>      G1 { <s> <p> <o> . }
>>      { G1 r:subgraphOf G2. }
>>
>> But I don't see how to communicate G1 the way you're talking about. How
>> do you say "and that's all"?
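The two readings being debated here (a named graph CONTAINS the listed triples vs. is ONLY those triples) and the r:subgraphOf workaround can be contrasted with plain set operations. This is a minimal sketch, assuming graphs are modelled as Python sets of string triples with blank nodes ignored; the function names are hypothetical, and r:subgraphOf is approximated as the subset relation.

```python
# Illustrative model only: a graph is a set of (subject, predicate, object) strings.
Triple = tuple[str, str, str]


def satisfies_open(claimed: set[Triple], actual: set[Triple]) -> bool:
    """CONTAINS reading: every claimed triple appears in the real graph."""
    return claimed <= actual


def satisfies_closed(claimed: set[Triple], actual: set[Triple]) -> bool:
    """ONLY reading ("and that's all"): the real graph is exactly the claimed triples."""
    return claimed == actual


def subgraph_of(g1: set[Triple], g2: set[Triple]) -> bool:
    """Hypothetical stand-in for r:subgraphOf: all of g1's triples are in g2."""
    return g1 <= g2
```

Under the closed reading, G1 { <s> <p> <o> . } plus "G1 r:subgraphOf G2" recovers the open statement about G2; under the open reading there is no corresponding way to close a graph without an extra assertion.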
>
> Imagining TriG used for both update and patch, I see it as specified
> by the protocol. CONSTRUCT ?g { ?s ?p ?o } would give me the results
> of a query substituted into a named graph pattern. A reply to a GET
> would give me a complete resource ("and that's all"). A diff
> propagation could look like:
>    -<G1> { _:s1 <p> <o0> }
>    +<G1> { _:s1 <p> <o1> }
> which means there were already some <G1> triples and we've only
> changed one of them. The use you want to define is, I believe,
> characterized by GET <G1>, but I think the mapping of graph
> names to sets of triples is useful in other places with other
> presumptions of completeness.
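A diff in the "-<G1> { ... }" / "+<G1> { ... }" form above only touches the listed triples and leaves the rest of <G1> alone. The sketch below applies such a diff to an in-memory dataset. It is hypothetical: the one-triple-per-line format is inferred from the example, and blank node labels such as _:s1 are treated as plain strings, which only works if both sides of the diff agree on the labels.

```python
import re

# Hypothetical line format: "-<G1> { s p o }" removes, "+<G1> { s p o }" adds.
LINE = re.compile(r"^([+-])\s*<([^>]*)>\s*\{\s*(\S+)\s+(\S+)\s+(\S+)\s*\}$")

Dataset = dict[str, set[tuple[str, str, str]]]


def apply_diff(dataset: Dataset, diff: str) -> Dataset:
    """Return a copy of the dataset with the diff applied; unlisted triples are kept."""
    result = {name: set(triples) for name, triples in dataset.items()}
    for raw in diff.strip().splitlines():
        m = LINE.match(raw.strip())
        if m is None:
            raise ValueError(f"unrecognised diff line: {raw!r}")
        op, graph, s, p, o = m.groups()
        triples = result.setdefault(graph, set())
        if op == "-":
            triples.discard((s, p, o))
        else:
            triples.add((s, p, o))
    return result
```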

SPARQL Update allows various ways of treating a change:

# if you want "replace", clear the destination first:
CLEAR GRAPH <G1> ;
INSERT DATA { GRAPH <G1> { <s> <p> <o> } }

or, for an in-place change:
DELETE DATA { GRAPH <G1> { <s> <p> <o0> } } ;
INSERT DATA { GRAPH <G1> { <s> <p> <o1> } }
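The difference between the two update styles is what happens to triples that are not mentioned. A minimal sketch, assuming a dataset modelled as a dict from graph name to a set of triples; the helper names are hypothetical rough analogues of the SPARQL operations, not a SPARQL engine. Whether an emptied graph still "exists" after CLEAR is store-dependent; here it is kept as an empty set.

```python
# Illustrative model only: graph name -> set of (subject, predicate, object) triples.
Dataset = dict[str, set[tuple[str, str, str]]]


def clear_graph(ds: Dataset, g: str) -> None:
    """Rough analogue of CLEAR GRAPH <g>: drop every triple in g."""
    ds[g] = set()


def insert_data(ds: Dataset, g: str, triples: set) -> None:
    """Rough analogue of INSERT DATA { GRAPH <g> { ... } }: add triples, keep the rest."""
    ds.setdefault(g, set()).update(triples)


def delete_data(ds: Dataset, g: str, triples: set) -> None:
    """Rough analogue of DELETE DATA { GRAPH <g> { ... } }: remove only the listed triples."""
    if g in ds:
        ds[g].difference_update(triples)


# "Replace": clear the destination first, then insert.
replaced: Dataset = {"G1": {("s", "p", "o0"), ("s", "q", "x")}}
clear_graph(replaced, "G1")
insert_data(replaced, "G1", {("s", "p", "o")})

# "Change": delete the old triple, insert the new one; other triples survive.
changed: Dataset = {"G1": {("s", "p", "o0"), ("s", "q", "x")}}
delete_data(changed, "G1", {("s", "p", "o0")})
insert_data(changed, "G1", {("s", "p", "o1")})
```

After "replace", G1 holds only the inserted triple; after "change", the unrelated ("s", "q", "x") triple is still there.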

	Andy

>
>
>>      -- Sandro
>>
>>
>>> 	Andy
>>>
>>> On 07/10/11 03:04, Sandro Hawke wrote:
>>>> Here's a proposal for what the fourth column should mean.  It's kind of
>>>> obvious, and I think it's how many of us just assumed Named Graphs were
>>>> supposed to work.    But I don't think it's been written down in a form
>>>> we can use, so here it is, in a first draft.
>>>>
>>>> I haven't really tried to motivate this, but one thing it does is allow
>>>> folks to refer to a graph using just one URI.  As [1] points out rather
>>>> painfully, as things stand now, you need multiple URIs just to identify
>>>> each g-box (and thus g-snap).  (That is, you need to say which sparql
>>>> endpoint you're talking about, and then which graph within its
>>>> dataset.)
>>>>
>>>> My starting question was: what is the relationship between the IRI (the
>>>> "graph name") and its associated g-snap in an RDF Dataset.  This
>>>> applies to the dataset backing any SPARQL end point, as well as the
>>>> dataset serialized in any multigraph syntax, like TriG or N-Quads.
>>>> Another way to look at it: what does it mean to assert a TriG
>>>> document?  If you send me the TriG document "<a> { <s> <p> <o> }", and
>>>> I trust you, what do I now know?
>>>>
>>>> Richard, I think, has been arguing for a minimalist position,
>>>> answering "nothing", or "it depends on out-of-band agreements".  This
>>>> "Web Semantics" proposal is an alternative.
>>>>
>>>> === Proposal
>>>>
>>>> The idea here is to make the relationship between the URI and the
>>>> graph be the standard Web naming relationship, similar to what we all
>>>> use for Web pages.  When you dereference the URI, you get the graph.
>>>>
>>>> This has the feature of being, to some extent, observable.  Just like
>>>> triples are claims about some domain of discourse, quads become claims
>>>> about idealized Web dereference behavior.
>>>>
>>>> Specifically: Consider a "graph naming" to be the association of a
>>>> graph name N with a graph G.  For the graph naming to hold, every
>>>> successful dereference of N yielding an RDF graph must yield G.  For a
>>>> dataset D to hold, its default graph must hold (as normal in RDF) and
>>>> every graph naming pair in D must hold.
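The truth condition above can be phrased as a check over recorded dereference results. A minimal sketch, assuming each successful dereference is logged as the graph it returned (failures are simply not logged) and that set equality of string triples stands in for graph equality; real graphs with blank nodes would need an isomorphism test. The function names and the default-graph flag are hypothetical.

```python
from typing import Iterable, Mapping

# Illustrative model: a graph is a frozenset of (s, p, o) string triples.


def naming_holds(expected: frozenset, observed: Iterable[frozenset]) -> bool:
    """N -> G holds unless some successful dereference of N yielded a graph other than G.
    Failed dereferences are absent from `observed`, so they never falsify the naming."""
    return all(graph == expected for graph in observed)


def dataset_holds(named: Mapping[str, frozenset],
                  observations: Mapping[str, Iterable[frozenset]],
                  default_graph_holds: bool = True) -> bool:
    """The default graph must hold (judged elsewhere, passed in as a flag) and
    every naming must hold against the graphs observed for that name."""
    return default_graph_holds and all(
        naming_holds(graph, observations.get(name, ())) for name, graph in named.items()
    )
```

With no observations at all, the check returns True only in the sense of "not yet falsified", which matches the asymmetry in the proposal: failure to dereference does not make a naming untrue.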
>>>>
>>>> Example 1:  This dataset
>>>>
>>>>      <http://example.org> { <s> <p> <o> . }
>>>>
>>>> means that if anyone is able to dereference "http://example.org"
>>>> and obtain an RDF graph serialization, the serialized graph will
>>>> consist of the single triple, <s> <p> <o>.  Failure to dereference
>>>> does not make the graph naming untrue, but a successful dereference
>>>> yielding a different graph does.
>>>>
>>>> Example 2:  This dataset can never be true:
>>>>
>>>>      <http://example.org> { <s> <p> 1 . }
>>>>      <HTTP://example.org> { <s> <p> 2 . }
>>>>
>>>> ... since one cannot get different results dereferencing URIs that
>>>> differ only in the case of the scheme component (as per RFC 3986).
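A dataset like Example 2 can be flagged mechanically by normalizing case before comparing names. A minimal sketch using only the standard library: it lower-cases the scheme and host (both case-insensitive per RFC 3986, section 6.2.2.1) and reports names that collapse to the same URI but carry different graphs. It assumes plain host[:port] authorities (no userinfo or IPv6 literals) and does no other normalization; the function names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_case(uri: str) -> str:
    """Lower-case scheme and host only; path, query and fragment stay as written."""
    parts = urlsplit(uri)  # urlsplit already lower-cases the scheme
    netloc = parts.netloc
    if parts.hostname and "@" not in netloc and "[" not in netloc:
        # Assumption: plain host[:port] authority, so rebuilding it is safe.
        netloc = parts.hostname + (f":{parts.port}" if parts.port is not None else "")
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def contradictory_names(named: dict) -> list:
    """Groups of names that denote the same URI but are paired with different graphs."""
    groups: dict[str, dict] = {}
    for name, graph in named.items():
        groups.setdefault(normalize_case(name), {})[name] = frozenset(graph)
    return [sorted(members) for members in groups.values()
            if len(set(members.values())) > 1]
```

Any non-empty result means the dataset can never hold, whatever a server actually returns.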
>>>>
>>>> Example 3:  This dataset:
>>>>
>>>>     <tag:hawke.org,2010-10-06:eg1> { <s> <p> <o> . }
>>>>
>>>> cannot be tested using Web protocols, since the "tag" URI scheme is
>>>> (by design) not dereferenceable.  Whether it is true or false cannot
>>>> be determined experimentally.
>>>>
>>>> ==== Temporal Context
>>>>
>>>> How can we say:
>>>>
>>>>      <http://example.org> { <s> <p> <o> . }
>>>>
>>>> if we suspect that "http://example.org" might serve some other content
>>>> tomorrow?
>>>>
>>>> The answer is that datasets often need temporal qualification just
>>>> like RDF graphs do.  It's just like saying in RDF:
>>>>
>>>>      <http://example.org/Alice> foaf:age 25 .
>>>>
>>>> One solution for foaf:age triples is to include triples like:
>>>>      <> dc:temporal "2011-10-06"^^xs:date .
>>>>
>>>> and that can be done in datasets as well, using the default graph.
>>>> More work is needed on this, but I'm pretty sure this proposal can use
>>>> whatever solution people come up with for RDF and doesn't make matters
>>>> much worse than they are already.
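One possible reading of a temporally qualified naming is that only dereferences made during the stated period can falsify it. The sketch below is hypothetical: it assumes the dc:temporal value scopes the naming to a single calendar day, ignores time zones, and models graphs as frozensets of string triples. The proposal leaves the actual mechanism open, so this is an illustration, not the intended semantics.

```python
from datetime import date, datetime
from typing import Iterable


def naming_holds_on(expected: frozenset, day: date,
                    observations: Iterable[tuple[datetime, frozenset]]) -> bool:
    """observations: (timestamp, graph) pairs from successful dereferences.
    Only those made on `day` (naive local time, an assumption) can falsify the naming."""
    return all(graph == expected for ts, graph in observations if ts.date() == day)
```

A server that changes its content on 2011-10-07 then falsifies a naming dated 2011-10-07 but not one dated 2011-10-06.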
>>>>
>>>> ==== Practical Deployment Choices
>>>>
>>>> Any system which maintains a dataset (e.g. a SPARQL endpoint) or
>>>> generates multigraph documents like TriG has to do one (or more) of
>>>> the following:
>>>>
>>>> 1.  Use new non-dereferenceable graph names.  These could be tag or
>>>>       uuid URIs, or http URIs in your own name space which you choose to
>>>>       leave 404.
>>>>
>>>> 2.  Use your own dereferenceable graph names, perhaps relative to the
>>>>       endpoint or TriG document URI.  If you do serve RDF content at
>>>>       those URIs, it MUST be the same content (give or take stated time
>>>>       lag).
>>>>
>>>> 3.  Use someone else's graph names.  Here, the key thing is temporal
>>>>       metadata.  You have to decide what you want (copy once vs
>>>>       synchronize with what accuracy) and (somehow) share that temporal
>>>>       metadata.
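Choices 1 and 2 come down to how graph names are minted. A minimal sketch with the standard library: an opaque urn:uuid name for choice 1 (never dereferenceable, so it makes no testable Web claim), and a name resolved against the endpoint or TriG document URI for choice 2. The base URIs and local names are hypothetical examples.

```python
import uuid
from urllib.parse import urljoin


def mint_opaque_name() -> str:
    """Choice 1: a fresh, non-dereferenceable graph name (a urn:uuid URI)."""
    return uuid.uuid4().urn


def mint_local_name(base: str, local: str) -> str:
    """Choice 2: a name relative to the endpoint or TriG document URI.
    Any RDF later served at this URI would have to be the same graph."""
    return urljoin(base, local)


# Hypothetical bases: a SPARQL endpoint and a TriG document.
endpoint_name = mint_local_name("http://example.org/sparql", "graphs/g1")
document_name = mint_local_name("http://example.org/data/doc.trig", "#g1")
```

The fragment form ties the name to the document it appears in, while the path form gives each graph its own dereferenceable resource.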
>>>>
>>>>
>>>> ...
>>>>
>>>> Okay, that's enough for now.  Give me a +1 if you think this is headed
>>>> in a useful direction.
>>>>
>>>>       -- Sandro
>>>>
>>>> [1] http://www.w3.org/2011/prov/wiki/Using_named_graphs_to_model_Accounts
>>>>
>>>>
>>>
>>>
>>
>>
>>
>
Received on Saturday, 8 October 2011 16:22:34 GMT
