- From: Pat Hayes <phayes@ihmc.us>
- Date: Tue, 8 Nov 2005 01:17:41 -0600
- To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
This is an attempt to summarize the issues/alternatives currently in play regarding what is called the 'RDF semantics' issue. In fact, this is turning out to not really be about semantics at all, but about the scope of bnodes. We could decide that bnodes are all locally scoped to graphs and to queries, that a bnode in a query has the exact same semantics as a pattern variable but does not produce an answer binding (or alternatively, that bnodes are simply not used in patterns, only variables), and that any bnodes bound to variables in an answer are understood to be existentials scoped to that answer only, and have no relationship to any bnodes in other queries, in the graph, or in other answers. Call this the local option. This is what the spec says now. Some comments have regretted this restriction: http://lists.w3.org/Archives/Public/public-swbp-wg/2005Nov/0036 With the local option, we can define an answer as a substitution S such that G entails Q[S] where 'entails' is a parameter. Putting [entails] = [simply entails] gives us the current design, but we can replace this with any other entailment and the basic definitions should still work; and this is so simple that we can make all the obvious points about RDFS entailment being simple entailment form RDFS closures, etc.. And just phrasing the definition in this way provides a smooth segway to future extensions. All the interest, however, lies in finding a way to allow bnodes to be re-used between queries while keeping the intuitive clarity of this 'semantic' definition. Some users seem to want another option, call it the ID option. This is where a bnode is supplied as an answer binding, and a subsequent query is allowed to ask for more information about that thing, using the same identifier. Example: Graph: {:Arthur a Person :Arthur :siblings _:l34 _:l34 type collection _:l34 first _:P55 _:l34 rest _:l35 _:l35 first :Susan _:l35 rest _:l36 _:l36 first :Bill _:l36 rest nil _:P55 a Person _:P55 :gender male} Query 1, to discover whether Arthur has any siblings: :Arthur :siblings ?L ?L first ?V In the local option should give, say, L/_:a, V/_:b where the nodeIDs are local to the answer. We can take this no further: it amounts to the answer "yes", with no further details. To get more, we have to compose a more elaborate new query. In the ID option, the server can choose to provide meaningful bnodeIDs as answer bindings: L/_:l34, V/_:P55 allowing a subsequent query to ask for more information about these things, eg Query 2: _:P55 :gender ?S _:l34 rest _:x _:x first ?W giving S/male, W/:Susan. In order to do this, it seems we need to provide a way for the query to distinguish bnodes which are intended to co-refer with bnodes in earlier query bindings (and earlier queries) and those, like _:x in query 2, which are not; for if the same understanding were applied to _:x as to _:P55, then there would be no answer bindings for query 2. Enrico's term is that _:l34 and _:P55 here are told-bnodes, while _:x is just a bnode. If we want to support some such mechanism, we need a way to distinguish told-bnodes from mere bnodes. Several options have been discussed. (A) some universally agreed format is decided for bnodeIDs in answer bindings which are intended to be useable as told-bnodes, and queries distinguish told-bnodes by re-using the supplied bnodeID; any other bnodes are simple bnodes. So for example, if the convention for a told-bnode ID is that is has the prefix _:**, then the above example would look like: Query 1 :Arthur :siblings ?L ?L first ?V answer L/_:**l34, V/_:**P55 Query 2 _:**P55 :gender ?S _:**l34 rest _:x _:x first ?W answer S/male, W/:Susan but for example Query 3 _:**l34 rest _:**x _:**x first ?V answer <none> because there is no bnodeID '_:**x' in G (B) a told-bnode is indicated by a query pattern variable with a bnodeID 'attached' to it, indicating that the variable must be replaced by that particular bnodeID. Other bnodes are simply bnodes. Answer bindings are not supplied for these variables, as they would be redundant. Then the example might be Query 1 :Arthur :siblings ?L ?L first ?V answer L/_:l34, V/_:P55 Query 2 ?X[_:P55] :gender ?S ?Z[_:l34] rest _:x _:x first ?W answer S/male, W/:Susan Notational variations are possible, eg a query variable whose name is a bnodeID, so ?_:P55 rather than ?X[_:P55] to indicate a told-bnode. (C) told-bnodes are indicated by a special kind of answer binding: any told-bnode supplied by such a binding is a told-bnode in a subsequent query, other bnodes aren't. So the example looks like this, where [* *] is how the special answer bindings are indicated (not a serious notational suggestion): Query 1 :Arthur :siblings ?L ?L first ?V answer L/[*_:l34*], V/[*_:P55*] Query 2 _:P55 :gender ?S _:l34 rest _:x _:x first ?W answer S/male, W/:Susan The trouble with all of these is that they require the simple semantic condition to be re-stated in a more complicated way. We have to define (or tweak definitions so that things work out this way) a new operation on graphs, something half-way between a merge and a union. Merging graphs keeps all the bnodes separate: unioning them does no separation at all. What we want is to union on the told-bnodes but merge on the other bnodes. Call that semi-merging, or smerging. Formally, G smerge H is the graph (G union H[S]), where S is a 1:1 bnode-to-bnode mapping whose domain is the non-told-bnodes in H and whose range is disjoint from the set of bnodes in G. Then the semantic condition is G entails (G smerge Q[S]) Of course this only makes sense when we have a clear definition of what a told bnode is. If Q has no told-bnodes in it, then this is the same as G entails (G merge Q[S]) which is equivalent to the simple condition for the local option. You can tweak things so as to make things work out without making this new definition. Enrico has done it (several times :-), but it seems complicated, however you phrase it. One way is to describe it in terms of first replacing the bnodeIDs with URIrefs, ie skolemizing the told-bnodes in the graph and query, then applying entailment, then de-skolemizing the answer bindings by replacing the skolemized URIrefs by bnodes again. Blech. Another way is to re-state the basic condition as in Enrico's document, as the condition G entails (G merge Q)[S] which means, when you unpack it, that if you take the bnodes in the query, standardize them apart from those in the graph, and THEN apply the answer binding (which leaves any bnodes that get introduced at that stage alone, so they don't get standardized apart from the bnodes in the graph) then what you get is entailed by the graph you started with. This only works for the query style (B) above: you have to tweak the definition differently for any other way of indicating told bnodes. But, I have to say, all this seems like overkill. The same effect can be got by sticking to the local option, and requiring the server to supplying a special URIref as an answer binding to indicate an 'anonymous' entity, along the lines suggested by Andy in http://lists.w3.org/Archives/Public/public-rdf-dawg/2005OctDec/0159.html. This corresponds directly to Enrico's original description of this as a form of Skolemization, but doesnt require the ugly final step of de-skolemizing to get back to a bnode. Then we don't need to invent a new graph operation, we don't need any other new bnode conventions, and we can stick with the simple, direct (and kind of obvious) semantic definition of answer. Since this is all about allowing names to be used with a wider scope than a graph, and since the RDF model has until now consistently kept bnode IDs as graph-local identifiers, the only sensible option seems to be to provide a new kind of name. Logically there is very little to choose between a name and an existential variable, i.e. a bnode, in any case. The only difference is the scope. So inventing a URI namespace to be the names of 'nameless things' seems like quite a clever way around the difficulty, especially if some easy-to-implement convention can be provided to map these IDs to and from bnodeIDs. This option has the great merit of semantic simplicity and not requiring any special re-interpretation of bnodes or making any special effort to re-define the notion of an answer binding. Also it puts the control on the server side, which is where it belongs IMO, since it doesn't make sense for a query to be able to use a bnodeID as a told bnode in an initial query. On balance, I think this is the best way to proceed at present. It means that semantically we can stick to the local option, which keeps the description simple. So, my vote is to reply to points like those expressed in http://lists.w3.org/Archives/Public/public-swbp-wg/2005Nov/0036 by suggesting that 'fake URIs' be used to communicate between server and client in cases like this, rather than 'fake variables' in (B) above. Real live systems can adopt their own conventions for faking the URIs, and still be considered conformant. They can even use real bnodeIDs, I would suggest, but if they do then it is up to them to keep track of the 'toldness' of these IDs somehow: they can't be treated as normal bnodes, but must act 'locally' in such a query regime as though they were URIrefs. They have to be 'frozen' into a temporary URIref status as far as determining appropriate answers is concerned. So as far as the semantics is concerned, we still only have two classes of name: those with scope in a graph, and those with a wide scope. Any scheme which requires distinguishing two 'kinds' of bnode in a query amounts to having three categories of identifier in an RDF graph rather than two, all with different scoping rules. This muddles the semantics and will likely have knock-on effects throughout the algebra. On mature reflection, I think that we do not want to go there. If we agree to this, then Enrico's ingenious work will have been wasted, but those are the WG breaks :-). On the plus side, the RDF semantics issue will then be trivially solvable since semantically the entire transaction will fit within the local option. Sorry this has taken so long. It needs a WG decision, however, before we can move forwards. The semantics issue is hostage to the told-bnode decision. Pat -- --------------------------------------------------------------------- IHMC (850)434 8903 or (650)494 3973 home 40 South Alcaniz St. (850)202 4416 office Pensacola (850)202 4440 fax FL 32502 (850)291 0667 cell phayesAT-SIGNihmc.us http://www.ihmc.us/users/phayes
Received on Tuesday, 8 November 2005 07:18:21 UTC