Re: Graph naming in Service Descriptions (ACTION-266; discussion of comment DS-1) from Sandro Hawke on 2010-07-13 (public-rdf-dawg@w3.org from July to September 2010)

From: Sandro Hawke <sandro@w3.org>
Date: Tue, 13 Jul 2010 09:45:49 -0400
To: Gregory Williams <greg@evilfunhouse.com>
Cc: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <1279028749.2927.42.camel@waldron>
On Mon, 2010-07-12 at 22:35 -0400, Gregory Williams wrote:
> Damian Steer recently wrote to the comments list[1] about the naming of named graphs in the service description document. This is the same issue that Sandro brought up (I think at F2F2). Briefly, the current SD document suggests modeling datasets and named graphs like this:
> 
> [] a sd:Dataset ;
>  sd:namedGraph [
>   sd:name <http://www.example/named-graph> ;
>   sd:graph [
>    # graph description goes here
>   ]
>  ]
> 
> Damian suggests that problems arise if
> 
> <http://www.example/named-graph> owl:sameAs <http://www.example/another-named-graph>
> 
> This would suggest the service description entails
> 
> ... sd:name <http://www.example/another-named-graph>
> 
> but the endpoint probably won't let you substitute the one graph name for the other in a query and still return the same results. 

Like some other people replying here, I'm not actually worried about
this particular problem.   If the graphs are really the same, then the
service descriptions of them will logically be the same, and that's
okay.  The fact that the endpoint doesn't work right with one of the
URIs is nothing new in the Linked Data world -- even though two URIs
might be known to denote the same thing, that doesn't mean dereferencing
them will give you the same results and behavior.     

The obvious example would be:
        http://www.w3.org/People/Berners-Lee/card#i
        http://dbpedia.org/resource/Tim_Berners-Lee
        
Those are absolutely and perfectly owl:sameAs each other, but if you put
them into your browser, you'll get different results.  That's by design,
and as it should be.   It can be a challenge to deal with, but it's an
appropriate challenge: it's a degree of freedom we'll need to make the
system actually work.

> Sandro had suggested that the graph name should really be described using an xsd:anyURI-typed literal.

My immediate motivation for this was the use of the property sd:name.
It's like fingernails on a blackboard to me, saying that a named graph
has a "name" which is the graph itself.  It's exactly like me
introducing you to my friend Matt, and telling you that his name is
actually himself, Matt.  No, not the text "Matt" -- his name is his
*self*.  His name doesn't start with an M, his name is actually six feet
tall and drives a Honda.   This is clearly a very odd, counter-intuitive
(or even nonsensical) notion of "name".

> Andy had a nice use/test case[2] for graph names in SDs that I think is compelling, but breaks (or becomes much harder) if names are literals and not resources. Basically Andy showed several queries over a dataset comprised of both the service descriptions and the named graphs which the SD describes.

I find Andy's use case compelling as well, so I withdraw my suggestion
to use a literal.   Instead I suggest we at least change the term
"name", and possibly other aspects of the modeling.

> My gut feeling about this is that the owl:sameAs problem seems only to be a problem if you are actually working with OWL entailment. If the endpoint claims to support OWL entailment *and* asserts the owl:sameAs relationship between the graph names, then I would expect Damian's logic to be true and for there not to be any problem. On the other hand, if the endpoint doesn't use OWL entailment, or if it doesn't assert the graph name equivalence, I wouldn't expect to be able to use the graph names interchangeably. Is this a reasonable distinction to make? I'm not sure how we'd expect an endpoint to know about and reflect an external (and possibly erroneous) owl:sameAs statement, but maybe I've missed something?
> 
> I'm interested in hearing opinions on this issue (particularly from Sandro and Birte) as it's come up twice and makes me nervous about getting the modeling right.

When you and I first talked about this, I suggested you switch to this
modeling instead:

    # "Direct-style" naming in service description
        [] a sd:Dataset ;
           sd:namedGraph <http://www.example/named-graph>.
        
        <http://www.example/named-graph> ... # whatever about it
        
This still seems reasonable to me.   At the time, you pointed out a
problem, namely that the same graph might be available via two different
end points, with different entailment regimes.  Specifically, g1 at
endpoint e1 might not do any inference, and g1 at endpoint e2 might do
OWL 2 DL inference.  In that situation, a client might download both
service descriptions and query over them to find the endpoint which does
what it wants.   With this "direct-style" naming, they'd get confused,
because the service descriptions would merge.   I agreed this was a
problem and was left with suggesting using a literal name.
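
To make that concrete, here is roughly what the merge would look like.
The ex:entailment property and the ex: values are placeholders I'm
making up for illustration, not proposed vocabulary, and prefix
declarations are omitted as in the snippets above:

        # from e1's service description
        [] a sd:Dataset ;
           sd:namedGraph <http://www.example/g1> .
        <http://www.example/g1> ex:entailment ex:None .

        # from e2's service description
        [] a sd:Dataset ;
           sd:namedGraph <http://www.example/g1> .
        <http://www.example/g1> ex:entailment ex:OWL2DL .

Once the two documents are merged, g1 carries both ex:entailment
values, and nothing says which value goes with which endpoint.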

After some more thought, though, I think the problem is with that
modeling of reasoning endpoints, and that direct-style naming is fine.

I suggest that, when things are working properly, and modulo propagation
issues, every endpoint ought to provide exactly the same triples for any
given named graph G1.   I think that's our only hope for sanity in the
SPARQL world.

If you want to provide an endpoint that offers G1 merged with its
entailments under some entailment regime E, then call it G2.   And in
the service description, say G2 sd:derivedFrom G1.   You should
definitely offer that sd:derivedFrom triple in the service description
for G2, and it would be nice if you could put it in the service
description for G1 as well (along with a few more details, like *how*
it is derived, linking G2 to E).
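
A rough sketch of what I mean, using the sd:derivedFrom property
suggested above.  The ex:entailment property, the ex:OWL2DL value and
the g1/g2 URIs are placeholders made up for illustration, not real
vocabulary:

        # service description for the endpoint offering the entailed view
        [] a sd:Dataset ;
           sd:namedGraph <http://www.example/g2> .

        <http://www.example/g2>
           sd:derivedFrom <http://www.example/g1> ;
           ex:entailment ex:OWL2DL .   # how G2 is derived, linking it to E

Merging this with a service description of G1 should then be harmless,
since each graph name stands for one set of triples.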

Doesn't that seem a lot cleaner?   Does it have any problems?

To rephrase it slightly: if you offer two different views, derived from
the same input, give them different names.  Clients should use the name
of the view they want, not the name of the input it was derived from.

(oops, I spent too long on this message, and will be a few minutes late
to the telecon now, sorry.)

      -- Sandro


> thanks,
> .greg
> 
> 
> [1] http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2010Jun/0011.html
> [2] http://lists.w3.org/Archives/Public/public-rdf-dawg/2010AprJun/0088.html
> 
> 
>
Received on Tuesday, 13 July 2010 13:45:58 UTC