Re: Final text for Basic Graph Patterns from Pat Hayes on 2006-01-18 (public-rdf-dawg@w3.org from January to March 2006)

From: Pat Hayes <phayes@ihmc.us>
Date: Wed, 18 Jan 2006 17:28:21 -0600
To: Enrico Franconi <franconi@inf.unibz.it>
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <p0623090fbff44b969d54@[10.100.0.23]>
>On 18 Jan 2006, at 03:44, Pat Hayes wrote:
>
>>Given an entailment regime E, a basic graph pattern, BGP, E-matches 
>>with pattern solution S on graph G with respect to a scoping graph 
>>G' and a scoping set B, just when the following three conditions 
>>all hold:
>>
>>(1)  S(G' OrderedMerge BGP) is an appropriately well-formed RDF 
>>graph for E-entailment
>>(2)  G E-entails S(G' OrderedMerge BGP)
>>(3)  the identifiers introduced by S all occur in B.
>>
>>Several conditions must be met by the scoping graph and scoping 
>>set. The scoping graph and scoping set must be identical for all 
>>answers to a query; the scoping graph G' must be graph-equivalent 
>>to G; and B must contain every term in G'.
>
>OK, *now* you are starting to converge to us :-)
>
>Still, as I said, I would leave G outside the above semantic definition,

Did you mean G'?

>since the above is *completely* equivalent if we replace (2) with:
>(2)  G' E-entails S(G' OrderedMerge BGP)
>
>As I already said in 
><http://lists.w3.org/Archives/Public/public-rdf-dawg/2006JanMar/0137>, 
>and you also said:
>
>>Moreover, we require that any scoping graph G' must be 
>>graph-equivalent to G, and that a single scoping graph must be used 
>>for all answers to a single query.
>>
>><<Note, this last point will need to be firmed up by text elsewhere 
>>in the document.>>
>
>the definition of G' wrt G should be stated at the beginning of any 
>processing of the SPARQL server, just to be sure that G' is the same 
>for any kind of processing within the server.

The order in which the definitions are stated in the document is best 
decided on grounds of clarity, and has no implications for the order 
in which anything is processed.

>And also *very* important for us, B should be left free to *not* 
>include any bnode name.

Yes, I now understand this point, and agree that this should be a 
viable option. This requires slightly relaxing the "B must contain 
every term in G'" condition. In fact I suggest that this should be omitted 
altogether in the general definition, so that there are no general 
conditions on B that [must] be satisfied by every extension, but text 
added to indicate that it would be normal for B to contain every 
non-bnode term in G.
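
To make the relaxed condition concrete, here is a minimal sketch of 
condition (3) with B restricted to the non-bnode terms of G. It is only 
an illustration under stated assumptions: triples are tuples of strings, 
bnode labels start with "_:", a pattern solution is a dict from variable 
names to terms, and the helper names non_bnode_terms and within_scope 
are hypothetical, not taken from any SPARQL text.

```python
def non_bnode_terms(graph):
    """Collect every term of the graph that is not a bnode label.

    Assumption: bnodes are written as strings starting with "_:".
    """
    return {term for triple in graph for term in triple
            if not term.startswith("_:")}


def within_scope(solution, scoping_set):
    """Condition (3): every identifier introduced by S occurs in B."""
    return all(value in scoping_set for value in solution.values())


# Hypothetical data: one URI-only triple and one triple with a bnode subject.
G = {(":a", ":p", ":b"), ("_:b1", ":p", ":b")}
B = non_bnode_terms(G)

print(within_scope({"?x": ":a"}, B))    # binding to a URI that is in B
print(within_scope({"?x": "_:b1"}, B))  # binding to a bnode label, not in B
```

With this choice of B, a solution that binds a variable to a bnode label 
fails condition (3); an extension that adds bnode names to B would 
accept it.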

>This would cover the only current formal understanding of OWL-DL 
>SPARQL, where answers do not include bnodes, and the syntax of the 
>BGPs is restricted in a way analogous to how OWL-DL expressions are 
>(i.e., what we have defined as non-higher-order RDF graphs in [1]). Note that 
>our current text also says something about the necessary 
>restrictions on BGPs:
>
>"where the syntactic restrictions in OWL-DL or OWL-Lite should be 
>reflected in suitable syntactic restrictions on the form of basic 
>graph patterns"

Yes, this is what inspired the wording in condition (1) above. We 
could also keep your text as an exposition of the same point.

>><< The only point we are leaving open, really, is exactly how to 
>>define the scoping vocabulary B for OWL. I remain concerned that 
>>this may have to be allowed to contain enough vocabulary to 
>>construct OWL syntax using RDF collections.>>
>
>As you can understand, this is exactly what we don't want to leave 
>open. And your current proposal still wouldn't allow for it, but I 
>like the idea of having the scoping set B. So, we could have several 
>meaningful kinds of Bs

That is what I had in mind, yes.

>, which may include an arbitrary combination of the following sets 
>(as I see it now):
>- all URIs excluding RDF/RDFS/OWL vocabularies
>- the RDF/RDFS/OWL vocabularies
>- the bnode names
>- the terms in G'
>
>For simple entailment, you have that B is just the terms in G'.
>If you want told bnode simple entailment, you have in addition that 
>G is equal to G'.
>For standard OWL-DL entailment, B contains only all URIs excluding 
>RDF/RDFS/OWL vocabularies (and a restriction on the syntax of BGPs).

Surely it can contain literals in G also, no? The wellformedness 
conditions in (1) should keep the instances DL-safe.

>><< This is Enrico's definition modified with the scoping graph, 
>>which I suggest is necessary to avoid requiring that engines 
>>deliver the actual bnodeIDs from the dataset in all answer 
>>bindings. Without this, we are in effect defining things so that 
>>told-bnodes are automatic. But in any case, I think it is intuitive, as 
>>the actual role of G' isn't anything to do with entailment: it is 
>>semantically transparent. It is only to keep the bnodes properly in 
>>line in answer sets. >>
>
>I understand this, with the proviso that we move away the 
>introduction of G' from the semantic definition.

I'm not sure what you mean. The definition of answer binding 
certainly needs to refer to the scoping graph. It might make 
pedagogic sense to introduce the idea earlier in the document, but 
that is an editorial decision.

>And I *insist* that in the normative standard we have the 
>equivalence to subgraph matching by default: this happens if G=G'.

We cannot have "defaults" in the definitions. The issue is whether or 
not we require answer bnode IDs to be drawn from the actual dataset 
graph, which is what one gets if G'=G. Given that we have decided 
against allowing told bnodes, I strongly suggest we do not do this.

>So, if you insist on having separate G and G', I insist that the 
>default should be having G=G' (which guarantees the equivalence of 
>the semantic definition with the implementations using subgraph 
>matching).

Well, I don't think that there can be a default in the SPARQL spec, 
speaking strictly, so the issue is whether we should require that 
G=G', and I feel strongly that we should not require it.

>  SPARQL may be extended (by enhancing the protocol) to allow servers 
>to declare that they do not guarantee G=G'.

Tinkering with the protocol is not acceptable: in any case, we have 
already decided that the scope of bnodes in answers is to the answer 
set for the query, not larger. Requiring G'=G would effectively put 
all such answer bnodes in any answer set from a given dataset into 
the same scope (that of the dataset), imposing told-bnode behavior as 
a requirement (or default), which is not appropriate. More generally, 
it seems sensible that servers should be free to use generated 
bnodeIDs in answers.
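
A rough sketch of what "generated bnodeIDs" could look like: a scoping 
graph G' built from G by consistently renaming bnodes, so that G' is 
graph-equivalent to G but uses fresh labels. The representation (tuples 
of strings, "_:" prefix for bnodes), the function name 
fresh_scoping_graph, and the "_:gen" label prefix are all assumptions 
made for illustration; the sketch also assumes G itself uses no labels 
with that prefix.

```python
from itertools import count


def fresh_scoping_graph(graph, prefix="_:gen"):
    """Return a copy of graph with every bnode consistently relabelled.

    Each distinct bnode label gets exactly one fresh label, so the
    renaming is one-to-one and the result is graph-equivalent to the input.
    """
    counter = count()
    renaming = {}

    def rename(term):
        if not term.startswith("_:"):
            return term  # URIs and literals are left untouched
        if term not in renaming:
            renaming[term] = f"{prefix}{next(counter)}"
        return renaming[term]

    # Iterate in sorted order so the generated labels are deterministic.
    return {tuple(rename(t) for t in triple) for triple in sorted(graph)}


# Hypothetical dataset graph with two bnodes.
G = {("_:b1", ":p", ":a"), ("_:b1", ":q", "_:b2")}
G_prime = fresh_scoping_graph(G)
print(sorted(G_prime))
```

Answers drawn from G_prime then carry only the generated labels, so 
nothing in an answer set exposes the dataset's own bnodeIDs.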

>
>So, the new proposed central part of the text would be now:
>
>"""
>Definition: Scoping Set.
>A scoping set B is a set of terms formed by a combination of:
>     - all URIs excluding RDF/RDFS/OWL vocabularies;
>     - the RDF/RDFS/OWL vocabularies;
>     - the bnode names.
>
>Definition: Basic Graph Pattern matching.
>A Basic Graph Pattern is a set of triple patterns.
>Given an entailment regime E, a basic graph pattern, BGP, E-matches 
>with pattern solution S on graph G with respect to a scoping set B, 
>if:
>     - S(G OrderedMerge BGP) is an appropriately well-formed RDF 
>graph for E-entailment;
>     - G  E-entails  S(G OrderedMerge BGP);
>     - the identifiers introduced by S all occur in B.
>
>The default normative choices in SPARQL are:
>(a) E-entailment is simple entailment (as defined in [RDF-MT]);
>(b) B is restricted to the terms in G.
>
>These default choices allow for the basic operation of querying the 
>"syntax" of RDF graphs, completely neglecting its semantics.

I do not want to have this wording in the document. Simple entailment 
does not 'completely neglect' semantics: it is just a very simple, 
basic, form of semantic entailment.

>  In this way, the basic option for SPARQL is to manipulate graphs, 
>rather than involving reasoning on knowledge bases

Checking the relationship of {:a :p :b} to {_:x :p :b} is a form of 
reasoning, albeit a very simple one.  All of this business of bnode 
scoping is to do with (a very simple kind of) reasoning. I think the 
contrast you imply here is misleading. In any case, the spec does not 
need this kind of justification text.
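
For illustration, that check can be sketched as a brute-force test for 
simple entailment: G simply entails H when some mapping of H's bnodes to 
terms of G turns H into a subgraph of G. The encoding is an assumption 
(triples as tuples of strings, bnode labels starting with "_:"), the 
function name simply_entails is hypothetical, and the exhaustive search 
is exponential in the number of bnodes, so this is a sketch and not a 
proposal for an implementation.

```python
from itertools import product


def is_bnode(term):
    # Assumption: bnode labels are strings starting with "_:".
    return term.startswith("_:")


def simply_entails(g, h):
    """True if some instance of h (bnodes mapped to terms of g) is a subgraph of g."""
    bnodes = sorted({t for triple in h for t in triple if is_bnode(t)})
    terms = sorted({t for triple in g for t in triple})
    # Try every assignment of g's terms to h's bnodes.
    for image in product(terms, repeat=len(bnodes)):
        mapping = dict(zip(bnodes, image))
        instance = {tuple(mapping.get(t, t) for t in triple) for triple in h}
        if instance <= g:
            return True
    return False


ground = {(":a", ":p", ":b")}
existential = {("_:x", ":p", ":b")}

print(simply_entails(ground, existential))  # the ground triple entails the bnode triple
print(simply_entails(existential, ground))  # the converse does not hold
```

Even this smallest case needs a search for a bnode mapping, which is the 
sense in which it is reasoning and not purely syntactic comparison.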

>; the latter may be possible by choosing another form of 
>E-entailment. In fact SPARQL may be extended to provide a way to 
>override the default

I don't think we should talk of "defaults" and "overriding". This 
does not read like proper specification language, which should be as 
unequivocal as possible. SPARQL is defined to be an actual language 
in the spec, and its definition should be clear and unambiguous. 
These other, closely related, options are not SPARQL, they are 
possible extensions to SPARQL, and are distinct from SPARQL itself, 
which is clearly defined and unambiguous. There are no defaults in 
this picture, any more than RDF is a default for RDFS.

>"simple entailment" with "RDF entailment", "RDFS entailment" (as 
>defined in [RDF-MT]) by releasing the restriction on B, or with "OWL 
>entailment" (as defined in [OWL-Semantics]) where B should be 
>restricted to include only URIs and the syntactic restrictions in 
>OWL-DL or OWL-Lite should be reflected in suitable syntactic 
>restrictions on the form of basic graph patterns.
>"""
>
>Is this to your satisfaction? :-)

Not quite, see above. :-)

Pat

>
>cheers
>--e.
>
>[1] Jos de Bruijn, Enrico Franconi, Sergio Tessaris (2005). Logical 
>Reconstruction of normative RDF. Proc. of the Workshop on OWL 
>Experiences and Directions (OWLED 2005), Galway, Ireland, November 
>2005. <http://www.inf.unibz.it/~franconi/papers/owled-05.pdf>


-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Wednesday, 18 January 2006 23:28:44 UTC