Re: Final text for Basic Graph Patterns from Pat Hayes on 2006-01-18 (public-rdf-dawg@w3.org from January to March 2006)

From: Pat Hayes <phayes@ihmc.us>
Date: Tue, 17 Jan 2006 20:44:02 -0600
To: Enrico Franconi <franconi@inf.unibz.it>
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>, bparsia@isr.umd.edu
Message-Id: <p06230900bff34068fe29@[10.100.0.23]>
>A quick reaction:
>
>On 17 Jan 2006, at 17:42, Pat Hayes wrote:
>
>>Enrico Franconi wrote:
>>
>>>The new proposal of Pat 
>>><http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JanMar/0061.html> 
>>>does not work for any approach where bnodes are implicit, and this 
>>>happens not only for OWL-Lite entailment, as we already pointed 
>>>out in 
>>><http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JanMar/0064.html>, 
>>>but also for RDF entailment. For example, given the graph
>>>     :john :age "25"^^xsd:decimal .
>>>the query
>>>     ASK { _:b rdf:type rdf:XMLLiteral }
>>>should clearly return YES if RDF entailment is considered.
>>
>>I agree, and see below.
>>
>>>However,
>>>according to the latest proposal from Pat the answer to the query 
>>>would be NO regardless of the type of entailment considered. The 
>>>reason is that Pat considers bnodes in the query restricted to be 
>>>bound only to URIs and bnodes explicitly appearing in the graph.
>>
>>This restriction is to terms occurring in the scoping graph, not in 
>>the original graph. For the SPARQL case of simple entailment, the 
>>scoping graph is defined to be equivalent to the dataset graph, but 
>>for RDF or RDFS entailment, the appropriate adjustment would be to 
>>define the scoping graph G' to be the RDF or RDFS closure of the 
>>data set graph G.
>
>Oh, no. We are not going through the graph closure discussion again.

No, no. Relax. The basic condition on an answer is still that the 
answer instance is appropriately entailed by the original graph. The 
discussion is only about how best to restrict the vocabulary of legal 
answer bindings: for this purpose, the finiteness of the closure is 
not important. I think this can be done even for OWL, simply by 
defining the relevant closure relative to the comprehension 
conditions described in the OWL/RDF semantics: the closure doesn't 
have to entail the answer, only provide a name for it.

But forget about closures: I agree with your quick reaction that 
appealing to closures is kind of tacky, so let me make an alternative 
suggestion, detailed below, which tries to separate out three 
different roles in the definition of answer and put constraints on 
them.

>The decision to have entailment was exactly to replace the closure 
>of the graph.
>Your proposal does not scale up: the closure of an RDF (or RDFS) 
>graph is infinite, and the closure of an OWL-DL graph is not unique 
>(OWL-disjunction).
>Moreover, if you let the scoping graph G' in the definition of basic 
>graph pattern matching, the algebraic expressions involving more 
>than one BGP wouldn't work properly (since you cannot control 
>whether there are distinct scoping graphs Gi').

Of course we can control this: we simply impose it by fiat in the 
definitions. There must be a single scoping graph used for all 
answers to a query. We say this explicitly.

>Full stop. Unacceptable by FUB.

Please try to be more collegial. I believe I have answered your 
technical objections.

>As we already pointed out, if you really want a scoping graph G', 
>this should appear once forever outside the definition of basic 
>graph pattern matching, most likely at the beginning of any 
>processing of the original graph G by the SPARQL server.

Seems to me that you are here confusing definition scope with process 
scope. I agree that the definitions must be phrased so that all 
answers to a single query are defined with respect to a single 
scoping graph. (There is no need for a server to actually generate 
the scoping graph, of course: it plays a purely definitional role, so 
processing sequence questions seem irrelevant here (?)).

-----

Here is the suggestion, which sticks closely to your construction and 
wording, and so does not assimilate bnodes to variables. (I still 
think that alternative is better, on both semantic and pragmatic 
grounds, and do not accept your distinction between variables and 
"true existentials", but I will not press that particular point any 
further, in the interests of coming to an agreement. It makes no 
difference up to RDFS, other than keeping the definitions more 
elegant and clarifying the semantics.)

Comments to the WG written <<thus>>.

<<Same definitions of OrderedMerge, term, basic graph pattern, etc.>>

First we give a general form for the definition of "basic query 
answer" which applies to any entailment regime. SPARQL itself uses 
the simplest case of this.
<<It might make more sense in the document to do this LAST, in a 
final section which is clearly demarcated from the rest of the 
document, and can refer back to everything else, hence making the 
rather oblique reference to 'SPARQL syntax and protocols' less 
enigmatic. Also, we would then have explained the rationale for all 
the oddities in the definition.>>

Given an entailment regime E, a basic graph pattern, BGP, E-matches 
with pattern solution S on graph G with respect to a scoping graph G' 
and a scoping set B, just when the following three conditions all 
hold:

(1)  S(G' OrderedMerge BGP) is an appropriately well-formed RDF graph 
for E-entailment
(2)  G E-entails S(G' OrderedMerge BGP)
(3)  the identifiers introduced by S all occur in B.

Several conditions must be met by the scoping graph and scoping set. 
The scoping graph and scoping set must be identical for all answers 
to a query; the scoping graph G' must be graph-equivalent to G; and B 
must contain every term in G'.

Any querying language which uses the SPARQL syntax and protocols and 
satisfies all these conditions [may] be called a SPARQL extension, 
and be referred to by the use of an appropriate prefix, such as 
RDF-SPARQL. SPARQL extensions [may] impose further conditions on 
scoping sets, and [may] further restrict the set of answers, for 
example to avoid redundancy in answer sets; but any answer [must] 
correspond to a pattern solution which satisfies the three conditions 
above. Any such extra conditions [must] be stated so as to apply 
uniformly to all datasets.

<<This idea of 'extension' is intended to be in alignment with that 
of RDF semantic extension, which I guess is kind of obvious. Seems to 
me that this kind of "restriction-in-advance" style fits well with 
our charter requirement 1.6. >>

For example, it might be considered appropriate for an RDFS extension 
to SPARQL to require that all scoping sets contain all the URI 
references in the rdf: and rdfs: namespaces. This would allow 
'tautological' answers to queries against the empty dataset, and 
would correspond exactly to SPARQL queries posed against the RDFS 
closure of the dataset graph. Omitting such vocabulary from B would 
prohibit such answers, corresponding to a regime in which only 
identifiers in the dataset graph could be used in query answers.
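
The effect of the choice of B can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the
ScopingSet class and the in_scope function are hypothetical names,
terms are plain strings, and the candidate solution is assumed to
have already passed conditions (1) and (2) under RDFS entailment
(rdf:type rdf:type rdf:Property is an RDF axiomatic triple, so it is
entailed even by the empty graph). Because the rdf: namespace
contains infinitely many URI references (rdf:_1, rdf:_2, ...), B is
represented by a membership test rather than an enumerated set.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


class ScopingSet:
    """Hypothetical scoping set B: terms of G' plus optional namespaces."""

    def __init__(self, g_prime, namespaces=()):
        # Every term occurring in the scoping graph G' belongs to B.
        self.terms = {term for triple in g_prime for term in triple}
        # An extension may add whole namespaces, e.g. rdf: and rdfs:.
        self.namespaces = tuple(namespaces)

    def __contains__(self, term):
        return term in self.terms or term.startswith(self.namespaces)


def in_scope(solution, b):
    """Condition (3): every identifier introduced by S occurs in B."""
    return all(value in b for value in solution.values())


# Query { ?p rdf:type rdf:Property } against the empty dataset.
# Assumption: this binding already satisfies conditions (1) and (2).
EMPTY_GRAPH = set()
CANDIDATE = {"?p": RDF + "type"}

B_WITH_VOCAB = ScopingSet(EMPTY_GRAPH, [RDF, RDFS])
B_DATASET_ONLY = ScopingSet(EMPTY_GRAPH)

print(in_scope(CANDIDATE, B_WITH_VOCAB))    # 'tautological' answer allowed
print(in_scope(CANDIDATE, B_DATASET_ONLY))  # answer prohibited
```

With the rdf: and rdfs: vocabulary in B the 'tautological' binding is
admitted; with B limited to the terms of G' it is rejected, although
the entailment itself is unchanged.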

<< We could actually define RDF- and RDFS-SPARQL ourselves, with B 
being all the terms in G' plus respectively the rdf: or the rdf: + 
rdfs: namespaces. And we could also define datatyped versions. I 
would however suggest, and not merely in a spirit of being 
obstructive, that we do not actually *define* OWL-DL-SPARQL. The only 
point we are leaving open, really, is exactly how to define the 
scoping vocabulary B for OWL. I remain concerned that this may have 
to be allowed to contain enough vocabulary to construct OWL syntax 
using RDF collections.>>

For SPARQL as defined in this document, the conditions are simplified 
so as to apply appropriately to simple entailment. Here the scoping 
set B is simply the set of terms in the scoping graph G', and the 
third condition need only be stated for bnodes.

A basic graph pattern, BGP, matches with pattern solution S on graph 
G with respect to a scoping graph G' just when the following three 
conditions all hold:

(1)  S(G' OrderedMerge BGP) is an RDF graph
(2)  G simply entails S(G' OrderedMerge BGP)
(3)  the bnodes introduced by S all occur in G'

Moreover, we require that any scoping graph G' must be 
graph-equivalent to G, and that a single scoping graph must be used 
for all answers to a single query.
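
To make the three conditions concrete, here is a minimal Python
sketch of the simple-entailment case. It is an illustration under
stated assumptions, not a normative algorithm: terms are plain
strings ("?x" for variables, "_:x" for bnodes, a leading double quote
for literals, anything else a URI reference); OrderedMerge is read as
a union that leaves G' untouched and renames any clashing BGP bnodes
apart; and simple entailment is tested by searching for a bnode
mapping that turns the candidate graph into a subgraph of G. The
helper names (ordered_merge, simply_entails, matches) are
hypothetical.

```python
from itertools import count


def is_var(term):
    return term.startswith("?")


def is_bnode(term):
    return term.startswith("_:")


def is_literal(term):
    return term.startswith('"')


def terms(graph):
    """All terms occurring in a set of (subject, predicate, object) triples."""
    return {term for triple in graph for term in triple}


def ordered_merge(g_prime, bgp):
    """Assumed reading of OrderedMerge: keep G' as is, rename clashing BGP bnodes."""
    taken = {t for t in terms(g_prime) | terms(bgp) if is_bnode(t)}
    fresh = (f"_:m{i}" for i in count())
    renaming = {}
    for b in sorted(t for t in terms(bgp) if is_bnode(t) and t in terms(g_prime)):
        new = next(n for n in fresh if n not in taken)
        taken.add(new)
        renaming[b] = new
    return set(g_prime) | {tuple(renaming.get(t, t) for t in tr) for tr in bgp}


def is_rdf_graph(graph):
    """Condition (1): no variables left, no literal subject, URI predicate."""
    for s, p, o in graph:
        if any(is_var(t) for t in (s, p, o)):
            return False
        if is_literal(s) or is_literal(p) or is_bnode(p):
            return False
    return True


def simply_entails(g, h):
    """Condition (2): some mapping of h's bnodes makes h a subgraph of g."""
    g = set(g)

    def extend(todo, mapping):
        if not todo:
            return True
        first, rest = todo[0], todo[1:]
        for candidate in g:
            trial = dict(mapping)
            ok = True
            for pattern, value in zip(first, candidate):
                if is_bnode(pattern):
                    if trial.setdefault(pattern, value) != value:
                        ok = False
                        break
                elif pattern != value:
                    ok = False
                    break
            if ok and extend(rest, trial):
                return True
        return False

    return extend(list(h), {})


def matches(bgp, solution, g, g_prime):
    """True when all three conditions hold for pattern solution S."""
    merged = ordered_merge(g_prime, bgp)
    # S is applied after the merge, so bnodes it introduces co-refer with G'.
    instance = {tuple(solution.get(t, t) for t in tr) for tr in merged}
    cond3 = all(v in terms(g_prime) for v in solution.values() if is_bnode(v))
    return is_rdf_graph(instance) and simply_entails(g, instance) and cond3


G = {(":a", ":p", "_:x"), ("_:y", ":q", ":c")}
G_PRIME = {(":a", ":p", "_:g1"), ("_:g2", ":q", ":c")}  # G with bnodes renamed
BGP = {("?v", ":q", ":c")}
```

Under these assumptions, matches(BGP, {"?v": "_:g2"}, G, G_PRIME)
holds. Binding ?v to _:g1 fails condition (2), because G' sits in the
consequent and G has no single node that is both the :p-value of :a
and the subject of :q :c. Binding ?v to a bnode that is not in G'
passes condition (2) but fails condition (3).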

<<Note, this last point will need to be firmed up by text elsewhere 
in the document.>>

<< This is Enrico's definition modified with the scoping graph, which 
I suggest is necessary to avoid requiring that engines deliver the 
actual bnodeIDs from the dataset in all answer bindings. Without 
this, we are in effect defining things so that told-bnodes are 
automatic. But in any case, I think it is intuitive, as the actual role 
of G' isn't anything to do with entailment: it is semantically 
transparent. It is only to keep the bnodes properly in line in answer 
sets. >>

<< I think the document should have some prose explaining this 
definition, by the way, like why it needs the G' included in the 
consequent, what condition 3 is for anyway, and why S has to be 
applied 'after' the merge. This can all be illustrated with a single 
example. These are all unconventional, and the reasons for them are 
not obvious at first blush, and all have to do with bnodes, so will 
be new thinking for many database-savvy readers. I can try drafting 
this if nobody else wants to. >>

<< There are also a bunch of easy lemmas we could give, such as that 
this is equivalent to the instance/subgraph definition, and that RDF- 
and RDFS-SPARQL are equivalent to SPARQL applied to the closure 
graphs, which gives a nice opportunity also to point out why that 
won't work for OWL. Or maybe all this should be in a 'user tutorial' 
document :-) >>

Pat


-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Wednesday, 18 January 2006 02:44:12 UTC