Re: The Entailment bit (was Re: thoughts from Tuesday telecon) from Seaborne, Andy on 2005-09-27 (public-rdf-dawg@w3.org from July to September 2005)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Tue, 27 Sep 2005 10:28:29 +0100
To: Bijan Parsia <bparsia@isr.umd.edu>
CC: Jim Hendler <hendler@cs.umd.edu>, RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <433910BD.2030801@hp.com>

Bijan Parsia wrote:
> On Sep 26, 2005, at 8:41 AM, Seaborne, Andy wrote:
> 
> 
>>Bijan Parsia wrote:
>>
>>>On Sep 22, 2005, at 6:03 PM, Pat Hayes wrote:
>>
>>. . .
>>
>>>>2a. The question arose: if SPARQL can be used (perhaps in the 
>>>>future, but let us look ahead) with various notions of entailment, 
>>>>should a query be able, or be required, to specify the kind of 
>>>>entailment intended? Although this was discussed only briefly, there 
>>>>was a consensus that it would be acceptable for the entailment in 
>>>>use to be specified as part of the 'service', identified by the URI 
>>>>currently used to name the target graph.
>>>
>>>While I think that is a reasonable point in the design space, I talked 
>>>with Jim and he flat out rejected it. So there's still play here. (I 
>>>myself don't think it's the best design but do think it's workable.)
>>
>>[Bijan-
>>I'm assuming that Jim's issue is wanting to query the same graph under 
>>different semantics as discussed below.  If there are other concerns, 
>>could you say what they are?
>>]
> 
> 
> As I understand it, the idea is that reasoning level is set on a 
> per-endpoint basis. Endpoints provide access to one or more graphs 
> (i.e., a dataset).

The protocol allows for multiple datasets per endpoint via the default and 
named graph URI parameters.  For a service that dynamically assembles 
datasets, this can mean reading them in but equally a service may go "yep - 
I've got that dataset" or "no - I don't - request rejected".
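
A minimal sketch of the dataset parameters mentioned above, assuming the
parameter names used in the published SPARQL Protocol (`query`,
`default-graph-uri`, `named-graph-uri`). The endpoint and graph URIs are
hypothetical, and `build_query_url` is an illustrative helper, not part of
any library:

```python
from urllib.parse import urlencode

def build_query_url(endpoint, query, default_graphs=(), named_graphs=()):
    """Build a GET URL asking a service to run `query` over a given dataset."""
    params = [("query", query)]
    # Each parameter may be repeated; together they describe one dataset.
    params += [("default-graph-uri", g) for g in default_graphs]
    params += [("named-graph-uri", g) for g in named_graphs]
    return endpoint + "?" + urlencode(params)

# Hypothetical endpoint and graphs, for illustration only.
url = build_query_url(
    "http://example.org/sparql",
    "SELECT * WHERE { ?s ?p ?o }",
    default_graphs=["http://example.org/data/base"],
    named_graphs=["http://example.org/data/a", "http://example.org/data/b"],
)
```

Whether the service reads those graphs in on demand or simply accepts or
rejects the request is up to the service; the request looks the same.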

> Each endpoint has its own URI. If I understand the 
> proposal, this would necessitate a distinct endpoint for each semantics 
> offered by the service and the semantics must be(?) uniform over the 
> dataset. I don't think that's the only useful configuration (e.g., 
> I might want the background graph to be with RDFS semantics containing 
> info about the possible semantics offered by component graphs; a 
> different issue would be being willing to drop down in the presence of 
> a timeout, 

[I can't tell here if you are pointing out that there are other configurations 
or actively arguing for the DAWG design to include these cases.]

In earlier WG discussions, DanC pointed out that a graph and its RDFS closure 
are not the same and would be expected to have different URIs as graphs to 
query.  Similarly, the same base graph can be put into a dataset with 
different entailment semantics under different URIs.  Given we have a 
mechanism in SPARQL [+] already, having a specialised one for just entailment 
specification when there may be all sorts of other characteristics that matter, 
seems confusing.  (Other characteristics might be data coverage, up-to-date-ness 
- nothing to do with entailment settings.)

This would allow the case you describe and put the onus on the description of 
the URI to state its entailment status.

As I see it, the contrast is between designs that put some kind of parameter in the 
client request (query/protocol somewhere) and designs that make this part of 
the description of the service.  I don't see that one class of designs can 
express something the other can't - in a service-centric design [*], it seems 
more natural to do it as part of the descriptions of service and datasets.  As 
descriptions are going to be vocabulary-based they are more naturally extensible.

[+] And that "SPARQL" is both QL and P.

[*] which, in Boston, I didn't argue for, I admit :-)

> e.g., I accept OWL-DL results, then RDFS but prefer OWL-DL
> (similarly to connegging); if you timeout or are in danger of timing
> out with OWL-DL, just send the RDFS).

The timeout recovery policy should not be part of the protocol design but 
instead it should be the client application decision as to what to do after 
receiving the timeout.  There are too many choices and variations to encode 
into a protocol specification (small "s") whereas communicating the 
information back to the client for an application-level decision makes the 
problem space tractable.  It might even be that the client can only make the 
decision after the request is sent (e.g. knowing the time to fulfil its 
contract for, say, putting a web page up under load).
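
A minimal sketch of keeping that policy in the client, assuming one
endpoint per entailment level. The `run_query` callable and the endpoint
URIs are hypothetical stand-ins for whatever HTTP client and services an
application actually uses:

```python
def query_with_fallback(run_query, endpoints, query, budget_seconds):
    """Try endpoints in order of preference; move on when one times out.

    `run_query(endpoint, query, timeout=...)` is supplied by the caller and
    is expected to raise TimeoutError if the service does not answer in time.
    """
    for endpoint in endpoints:
        try:
            return endpoint, run_query(endpoint, query, timeout=budget_seconds)
        except TimeoutError:
            continue  # the application's policy here: try the next level
    raise TimeoutError("no endpoint answered within the time budget")

# Stand-in for a real client: the OWL-DL service is too slow, RDFS answers.
def fake_run(endpoint, query, timeout):
    if endpoint.endswith("/owl-dl"):
        raise TimeoutError(endpoint)
    return [{"s": "http://example.org/x"}]

used, rows = query_with_fallback(
    fake_run,
    ["http://example.org/sparql/owl-dl", "http://example.org/sparql/rdfs"],
    "SELECT ?s WHERE { ?s ?p ?o }",
    budget_seconds=2.0,
)
```

The order of the list and the time budget are application decisions, which
is the point: nothing about them needs to be fixed in the protocol.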

> 
> 
>>>>We are indebted to Enrico for making this point vividly clear with 
>>>>the 'little-house' (aka 'worker') example.
>>>>
>>>>(I would suggest, and this is purely my own personal view, that we 
>>>>can adopt a compromise here, in which SPARQL in its current release 
>>>>will refer to simple entailment; the issue is pointed out; the 
>>>>actual spec. refers to virtual graphs identified by URIs, and refers 
>>>>to the RDF and RDFS closure lemmas; and the possibility of using 
>>>>URIs to identify services which offer other kinds of entailment is 
>>>>pointed out as a future extension path.
>>>
>>>Hmm. Doesn't this bias things against Jim's desire for one and the 
>>>same URI identified graph to be queried under different semantics? In 
>>>other words, does this close discussion on that protocol design 
>>>decision before alternatives have been considered?
>>
>>This seems to be a different matter.
>>
>>The protocol paradigm is service-centric, not graph-centric (this was 
>>after some debate so I think we have considered it, maybe not exactly 
>>as described).
> 
> 
> Yes, sorry. I mean how I, a server, offer different semantics for 
> the graphs I query over.

One way would be with different URIs for the graphs under different semantics 
as above. You can even then mix-and-match semantic levels in a single query 
and know which variables resulted from which entailment level.  It seems to me 
to be the least change - the alternative of, e.g., new keywords in the QL will 
result in a query being about as long syntactically, but now there is a 
non-uniform treatment of one query characteristic.

(I'm assuming here that we want to enable static query analysis to reject or 
accept a query so the entailment level is a URI, not a variable of the query 
itself).
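
A minimal sketch of the mix-and-match idea, assuming a service that
publishes the same base data under two hypothetical graph URIs, one per
entailment level. The static check is a naive regular expression over the
query text, not a real SPARQL parser, and `graph_targets` is an
illustrative helper:

```python
import re

# Hypothetical graph URIs: same base data, one URI per entailment level.
SIMPLE = "http://example.org/graphs/people/simple"
RDFS = "http://example.org/graphs/people/rdfs"

# Each variable is bound under exactly one entailment level.
QUERY = f"""
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?asserted ?inferred
WHERE {{
  {{ GRAPH <{SIMPLE}> {{ ?asserted a foaf:Agent }} }}
  UNION
  {{ GRAPH <{RDFS}> {{ ?inferred a foaf:Agent }} }}
}}
"""

GRAPH_TARGET = re.compile(r"\bGRAPH\s+(<[^>]*>|[?$]\w+)")

def graph_targets(query, offered):
    """Accept a query only if every GRAPH target is a URI the service offers."""
    uris = []
    for target in GRAPH_TARGET.findall(query):
        if not target.startswith("<"):
            raise ValueError(f"graph name is a variable: {target}")
        uri = target[1:-1]  # strip the surrounding angle brackets
        if uri not in offered:
            raise ValueError(f"graph not offered: {uri}")
        uris.append(uri)
    return uris

levels = graph_targets(QUERY, {SIMPLE, RDFS})
```

Because the graph names are URIs rather than variables, the service can
decide before evaluation whether it supports every entailment level the
query asks for.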

	Andy

> 
>>It is the combination of graph (by parameter) + service + query that 
>>gives the results.  Querying the same graph under different semantics 
>>is asking a different service unless we change the service-centric 
>>emphasis of the protocol.
> 
> 
> Yes, so the only question I see at the moment is whether one wants to 
> force a distinct call (to a distinct endpoint) in the following 
> situations:
> 
> A server knows it can reasonably handle with OWL-DL Ontologies A and B, 
> but it can't deal with C and D except with RDFS (even though they 
> species validate as OWL-DL). I cannot query over these services with 
> the maximal semantics the server handles in one call. (I have to query 
> C and D on the rdfs endpoint and A and B on the OWL-DL endpoint).
> 
> Actually, all the other situations are variants of this. I don't know 
> if this is what Jim has in mind.
> 
> 
>>A hybrid would be to have a protocol argument (or query clause) which 
>>is influencing the service.
> 
> 
> Yes. So I might like on a per query basis the pattern of semantics for 
> graphs offered by the service.
> 
> 
>> Some might say this is getting away from the service-centric protocol.
> 
> 
> Doesn't seem more so than the various graph selection operations.
> 
> 
>>I would not like to enumerate all the possible values of this argument 
>>(mentioning some well-known cases is OK) in rq23 or the protocol doc 
>>because of uses of subsets of OWL/RDFS entailment for tuned 
>>performance.  [This would also go for rules].
> 
> 
> I would like distinguished designators for the current set of defined 
> entailment relations with extensibility (for new variants). There are 
> certainly a billion subsets and extensions of OWL alone which people 
> might want to indicate. However, I think having the gross distinctions 
> of: simple, rdf, rdfs, owl-lite, owl-dl, and owl-full is a useful base, 
> matches the existing specs, and maps to behavior of e.g., webont (for 
> their test cases) and software (species validators). (Swoop, for 
> example distinguishes species validation (values supported by the w3c) 
> and expressivity (which is more fine grained)).
> [snip]
> 
>>>There is a charter prohibition, but I would propose altering that as 
>>>it all comes out so nice.
>>>One question worth answering is whether there will be implementor 
>>>support at this time. I believe I can pledge that Pellet will support 
>>>SPARQL over OWL DL. Indeed, if Jena's SPARQL implementation separates 
>>>the graph matching and the rest of the algebra, I believe it's a tiny 
>>>hookup for us.
>>
>>Indeed it does.
>>
>>ARQ works over graphs so you can have your own graph implementation 
>>but here it would be better to override the ARQ implementation of 
>>basic pattern matching to add your own.  This has been done for 
>>writing queries to legacy SQL databases so has been tried out.
> 
> 
> Great!
> 
> Cheers,
> Bijan.
>
Received on Tuesday, 27 September 2005 09:31:31 UTC