Wording change (was : Re: Final text for Basic Graph Patterns) from Pat Hayes on 2006-01-18 (public-rdf-dawg@w3.org from January to March 2006)

From: Pat Hayes <phayes@ihmc.us>
Date: Wed, 18 Jan 2006 17:15:27 -0600
To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <p06230906bff4383c1442@[10.100.0.23]>
Here is a wording change to make the basic definitions clearer, and 
closer to an implementational specification. It doesn't change any of 
the actual content, only the way it is defined.

I'm not meaning to be deliberately obstructive, this is based on the 
observation which just occurred to me this morning (while having my 
teeth worked on: you have to think about SOMETHING) that we have some 
leeway with specifying the scoping graph which we can utilize, in 
particular we can require that the scoping graph be standardized 
apart from the BGPs. So suppose it is. Then we can replace

S(G' OrderedMerge BGP)

by

(G' union S(BGP))

in the definitions, which is clearer and more intuitive.

This works because S can only insert bnodes from G', which cannot 
(under the above requirement) 'accidentally' collide with any bnodes 
in BGP: and it was only to avoid the possibility of such accidental 
bnode collisions (between, in that version, BGP and G) that we had a 
merge rather than a union in the first place; and then the "ordered 
merge" complication was needed to ensure that the merging didn't 
accidentally change any bnodeIDs in G', since we want the same G' to 
apply to all answers. Phew. So if make sure - as we can, by 
stipulation - that BGP and G' are standardized apart, then we don't 
need to merge, so we don't need to impose an ordering on it.

Also, this makes it clear (by the use of union instead of merge) that 
the role of G' is to define the common scope of any bnodes introduced 
by S, so that all bnodes in answer bindings share a common scope.

Altogether this seems to be all-round a better way to cast the 
definitions (the actual answer being defined is exactly the same, 
note.)  So here they are, rewritten this way, with an initial 
expository note about the scoping graph and bnode scoping more 
generally:

------

When considering queries, we must pay special attention to the scope 
of blank nodes. Blank nodes may occur in the dataset graph, in a 
query pattern or patterns, and may also be delivered as bindings to 
variables in the query answer; these are called answer bnodes. These 
are three different, distinct, scopes for bnodeIDs, which must be 
kept separate from one another. Bnodes in the dataset are of course 
scoped to the dataset graph; bnodes in a query pattern are scoped to 
the pattern; bnodes occurring in answer bindings are scoped to the 
answer document which contains all the answers. In the formal 
definitions, we introduce a scoping graph to define this common scope 
for answer bnodes. Intuitively, this can be viewed as a 'virtual 
copy' of the dataset in which the bnodes have been replaced by new 
bnodes scoped to the answer document, which is required to not 
overlap any of the graph pattern scopes. Defining the notion of 
answer binding relative to a single scoping graph, which is 
understood to be held constant for all answers to a given SPARQL 
query, ensures that the answer bnodeIDs are used coherently relative 
to one another and to the IRIs and literals from the dataset graph, 
while allowing them to be scoped apart from bnodes in the dataset 
itself.

<<An illustrative example would help here.>>

Formally, a scoping graph for a dataset graph G and a query pattern Q 
is an RDF graph which is graph-equivalent to G and shares no bnodes 
with Q. Since we require that a single scoping graph be used to 
define all answers to a given query, it must share no bnodes with any 
part of any query pattern involved in the query. In practice, this 
can be achieved by 'standardizing apart' bnodes in query patterns 
from those in the dataset, or by a systematic re-writing of bnodeIDs. 
It is not usually necessary to actually construct a scoping graph.

A basic graph pattern, BGP, matches with pattern solution S on graph 
G with respect to a scoping graph G' just when the following three 
conditions all hold:

(1)  G' union S(BGP) is an RDF graph
(2)  G simply entails (G' union S(BGP))
(3)  the bnodes introduced by S all occur in G'

Moreover, we require that a single scoping graph must be used for all 
answers to a single query.

--------

<<That is for SPARQL itself. For the SPARQL extension prose, which I 
now think ought to come later in the document, probably right at the 
end, so can refer back to earlier discussions of scoping, etc., the 
following might work. The OWL-DL-Abox mods suggested by Enrico should 
be added to this:>>

Given an entailment regime E, a basic graph pattern, BGP, E-matches 
with pattern solution S on graph G with respect to a scoping graph G' 
and a scoping set B, just when the following three conditions all 
hold:

(1)  (G' union S(BGP)) is an appropriately well-formed RDF graph for 
E-entailment
(2)  G E-entails (G' union S(BGP))
(3)  the identifiers introduced by S all occur in B

The scoping graph and scoping set must be identical for all answers 
to a query, and the scoping graph G' must be graph-equivalent to G 
and contain no bnodes in common with any basic graph pattern which is 
part of the query.

Any querying language which uses the SPARQL syntax and protocols and 
satisfies all these conditions [may] be called a SPARQL extension. 
The specification of such an extension [must] provide conditions 
which determine the scoping set B so as to make the definition of 
E-matching unambiguous, and [must] be stated so as to apply uniformly 
to all datasets. SPARQL extensions [may] impose further conditions on 
scoping sets, and [may] further restrict the set of answers, for 
example to avoid redundancy in answer sets; but any answer [must] 
correspond to a pattern solution which satisfies the three conditions 
above. Any such extra conditions [must] be stated so as to apply 
uniformly to all datasets.

The definition of the scoping set B is the major influence on how 
extensive and complex answers are allowed to be. It would be natural 
to require that B must contain at least every non-bnode term in G'. 
Restricting B to not contain bnodes means that the query language 
will properly handle queries over datasets represented by ground 
graphs. <<Enrico, is that right, or can you do better?>> Allowing B 
to contain vocabulary from fixed namespaces, in addition to that in 
G', can allow 'tautological' answers to queries. For example, it 
would be natural to require that B contain the vocabulary in the rdf: 
namespace when E is RDF-entailement, and similarly for the union of 
the rdf: and rdfs: namespaces when E is RDFS-entailment. <<This all 
gets a bit blurry, maybe better to just go ahead and define the RDF, 
RDFS and OWL_Abox cases and be done with it.>>

<<This might be a good place to put a paragraph about the 
little-house example and OWL (?)>>

Anyway, see what y'all think of that. We have until next Tuesday to 
argue about it :-)

Pat

PS. Bertrand Russell: "The method of "postulating" what we want has 
many advantages; they are the same as the advantages of theft over 
honest toil."


-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Wednesday, 18 January 2006 23:15:41 UTC