- From: Pat Hayes <phayes@ihmc.us>
- Date: Wed, 18 Jan 2006 17:15:27 -0600
- To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Here is a wording change to make the basic definitions clearer, and closer to an implementational specification. It doesn't change any of the actual content, only the way it is defined. I'm not meaning to be deliberately obstructive, this is based on the observation which just occurred to me this morning (while having my teeth worked on: you have to think about SOMETHING) that we have some leeway with specifying the scoping graph which we can utilize, in particular we can require that the scoping graph be standardized apart from the BGPs. So suppose it is. Then we can replace S(G' OrderedMerge BGP) by (G' union S(BGP)) in the definitions, which is clearer and more intuitive. This works because S can only insert bnodes from G', which cannot (under the above requirement) 'accidentally' collide with any bnodes in BGP: and it was only to avoid the possibility of such accidental bnode collisions (between, in that version, BGP and G) that we had a merge rather than a union in the first place; and then the "ordered merge" complication was needed to ensure that the merging didn't accidentally change any bnodeIDs in G', since we want the same G' to apply to all answers. Phew. So if make sure - as we can, by stipulation - that BGP and G' are standardized apart, then we don't need to merge, so we don't need to impose an ordering on it. Also, this makes it clear (by the use of union instead of merge) that the role of G' is to define the common scope of any bnodes introduced by S, so that all bnodes in answer bindings share a common scope. Altogether this seems to be all-round a better way to cast the definitions (the actual answer being defined is exactly the same, note.) So here they are, rewritten this way, with an initial expository note about the scoping graph and bnode scoping more generally: ------ When considering queries, we must pay special attention to the scope of blank nodes. Blank nodes may occur in the dataset graph, in a query pattern or patterns, and may also be delivered as bindings to variables in the query answer; these are called answer bnodes. These are three different, distinct, scopes for bnodeIDs, which must be kept separate from one another. Bnodes in the dataset are of course scoped to the dataset graph; bnodes in a query pattern are scoped to the pattern; bnodes occurring in answer bindings are scoped to the answer document which contains all the answers. In the formal definitions, we introduce a scoping graph to define this common scope for answer bnodes. Intuitively, this can be viewed as a 'virtual copy' of the dataset in which the bnodes have been replaced by new bnodes scoped to the answer document, which is required to not overlap any of the graph pattern scopes. Defining the notion of answer binding relative to a single scoping graph, which is understood to be held constant for all answers to a given SPARQL query, ensures that the answer bnodeIDs are used coherently relative to one another and to the IRIs and literals from the dataset graph, while allowing them to be scoped apart from bnodes in the dataset itself. <<An illustrative example would help here.>> Formally, a scoping graph for a dataset graph G and a query pattern Q is an RDF graph which is graph-equivalent to G and shares no bnodes with Q. Since we require that a single scoping graph be used to define all answers to a given query, it must share no bnodes with any part of any query pattern involved in the query. In practice, this can be achieved by 'standardizing apart' bnodes in query patterns from those in the dataset, or by a systematic re-writing of bnodeIDs. It is not usually necessary to actually construct a scoping graph. A basic graph pattern, BGP, matches with pattern solution S on graph G with respect to a scoping graph G' just when the following three conditions all hold: (1) G' union S(BGP) is an RDF graph (2) G simply entails (G' union S(BGP)) (3) the bnodes introduced by S all occur in G' Moreover, we require that a single scoping graph must be used for all answers to a single query. -------- <<That is for SPARQL itself. For the SPARQL extension prose, which I now think ought to come later in the document, probably right at the end, so can refer back to earlier discussions of scoping, etc., the following might work. The OWL-DL-Abox mods suggested by Enrico should be added to this:>> Given an entailment regime E, a basic graph pattern, BGP, E-matches with pattern solution S on graph G with respect to a scoping graph G' and a scoping set B, just when the following three conditions all hold: (1) (G' union S(BGP)) is an appropriately well-formed RDF graph for E-entailment (2) G E-entails (G' union S(BGP)) (3) the identifiers introduced by S all occur in B The scoping graph and scoping set must be identical for all answers to a query, and the scoping graph G' must be graph-equivalent to G and contain no bnodes in common with any basic graph pattern which is part of the query. Any querying language which uses the SPARQL syntax and protocols and satisfies all these conditions [may] be called a SPARQL extension. The specification of such an extension [must] provide conditions which determine the scoping set B so as to make the definition of E-matching unambiguous, and [must] be stated so as to apply uniformly to all datasets. SPARQL extensions [may] impose further conditions on scoping sets, and [may] further restrict the set of answers, for example to avoid redundancy in answer sets; but any answer [must] correspond to a pattern solution which satisfies the three conditions above. Any such extra conditions [must] be stated so as to apply uniformly to all datasets. The definition of the scoping set B is the major influence on how extensive and complex answers are allowed to be. It would be natural to require that B must contain at least every non-bnode term in G'. Restricting B to not contain bnodes means that the query language will properly handle queries over datasets represented by ground graphs. <<Enrico, is that right, or can you do better?>> Allowing B to contain vocabulary from fixed namespaces, in addition to that in G', can allow 'tautological' answers to queries. For example, it would be natural to require that B contain the vocabulary in the rdf: namespace when E is RDF-entailement, and similarly for the union of the rdf: and rdfs: namespaces when E is RDFS-entailment. <<This all gets a bit blurry, maybe better to just go ahead and define the RDF, RDFS and OWL_Abox cases and be done with it.>> <<This might be a good place to put a paragraph about the little-house example and OWL (?)>> Anyway, see what y'all think of that. We have until next Tuesday to argue about it :-) Pat PS. Bertrand Russell: "The method of "postulating" what we want has many advantages; they are the same as the advantages of theft over honest toil." -- --------------------------------------------------------------------- IHMC (850)434 8903 or (650)494 3973 home 40 South Alcaniz St. (850)202 4416 office Pensacola (850)202 4440 fax FL 32502 (850)291 0667 cell phayesAT-SIGNihmc.us http://www.ihmc.us/users/phayes
Received on Wednesday, 18 January 2006 23:15:41 UTC