- From: Pat Hayes <phayes@ihmc.us>
- Date: Tue, 8 Nov 2005 01:17:41 -0600
- To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
This is an attempt to summarize the issues/alternatives currently in
play regarding what is called the 'RDF semantics' issue. In fact,
this is turning out to not really be about semantics at all, but
about the scope of bnodes.
We could decide that bnodes are all locally scoped to graphs and to
queries, that a bnode in a query has the exact same semantics as a
pattern variable but does not produce an answer binding (or
alternatively, that bnodes are simply not used in patterns, only
variables), and that any bnodes bound to variables in an answer are
understood to be existentials scoped to that answer only, and have no
relationship to any bnodes in other queries, in the graph, or in
other answers. Call this the local option. This is what the spec says
now. Some comments have regretted this restriction:
http://lists.w3.org/Archives/Public/public-swbp-wg/2005Nov/0036
With the local option, we can define an answer as a substitution S such that
G entails Q[S]
where 'entails' is a parameter. Putting [entails] = [simply entails]
gives us the current design, but we can replace this with any other
entailment and the basic definitions should still work; and this is
so simple that we can make all the obvious points about RDFS
entailment being simple entailment from RDFS closures, etc. And
just phrasing the definition in this way provides a smooth segue to
future extensions.
All the interest, however, lies in finding a way to allow bnodes to
be re-used between queries while keeping the intuitive clarity of
this 'semantic' definition. Some users seem to want another option,
call it the ID option. This is where a bnode is supplied as an answer
binding, and a subsequent query is allowed to ask for more
information about that thing, using the same identifier. Example:
Graph:
{:Arthur a Person
:Arthur :siblings _:l34
_:l34 type collection
_:l34 first _:P55
_:l34 rest _:l35
_:l35 first :Susan
_:l35 rest _:l36
_:l36 first :Bill
_:l36 rest nil
_:P55 a Person
_:P55 :gender male}
Query 1, to discover whether Arthur has any siblings:
:Arthur :siblings ?L
?L first ?V
In the local option, this should give, say,
L/_:a, V/_:b
where the nodeIDs are local to the answer. We can take this no
further: it amounts to the answer "yes", with no further details. To
get more, we have to compose a more elaborate new query.
In the ID option, the server can choose to provide meaningful
bnodeIDs as answer bindings:
L/_:l34, V/_:P55
allowing a subsequent query to ask for more information about these things, eg
Query 2:
_:P55 :gender ?S
_:l34 rest _:x
_:x first ?W
giving S/male, W/:Susan.
In order to do this, it seems we need to provide a way for the query
to distinguish bnodes which are intended to co-refer with bnodes in
earlier query bindings (and earlier queries) and those, like _:x in
query 2, which are not; for if the same understanding were applied to
_:x as to _:P55, then there would be no answer bindings for query 2.
Enrico's term is that _:l34 and _:P55 here are told-bnodes, while _:x
is just a bnode.
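To make the told-bnode/bnode distinction concrete, here is a minimal sketch of a pattern matcher in Python. It is only an illustration under stated assumptions, not a proposal for the spec: triples are plain 3-tuples of strings copied from the example above, and the function name solve and its told parameter are hypothetical. A query bnode listed in told must match the graph bnode carrying the same label; any other query bnode behaves like a variable that produces no answer binding.

```python
def is_variable(term):
    return term.startswith("?")


def is_bnode(term):
    return term.startswith("_:")


def solve(pattern, graph, told=frozenset()):
    """Return the answer bindings for the ?variables in `pattern`.

    Query bnodes in `told` are matched as constants against the graph's own
    bnode labels; all other query bnodes are unreported existentials.
    """
    def is_open(term):
        return is_variable(term) or (is_bnode(term) and term not in told)

    def extend(triples, binding):
        if not triples:
            yield binding
            return
        head, tail = triples[0], triples[1:]
        for fact in graph:
            candidate = dict(binding)
            for q, g in zip(head, fact):
                if is_open(q):
                    # Bind on first use, otherwise require the same value.
                    if candidate.setdefault(q, g) != g:
                        break
                elif q != g:
                    break
            else:
                yield from extend(tail, candidate)

    answers = []
    for binding in extend(list(pattern), {}):
        answer = {k: v for k, v in binding.items() if is_variable(k)}
        if answer not in answers:
            answers.append(answer)
    return answers


# The example graph, transcribed token for token.
GRAPH = [
    (":Arthur", "a", "Person"),
    (":Arthur", ":siblings", "_:l34"),
    ("_:l34", "type", "collection"),
    ("_:l34", "first", "_:P55"),
    ("_:l34", "rest", "_:l35"),
    ("_:l35", "first", ":Susan"),
    ("_:l35", "rest", "_:l36"),
    ("_:l36", "first", ":Bill"),
    ("_:l36", "rest", "nil"),
    ("_:P55", "a", "Person"),
    ("_:P55", ":gender", "male"),
]

QUERY_2 = [
    ("_:P55", ":gender", "?S"),
    ("_:l34", "rest", "_:x"),
    ("_:x", "first", "?W"),
]

# _:P55 and _:l34 are told-bnodes, _:x is just a bnode.
print(solve(QUERY_2, GRAPH, told={"_:P55", "_:l34"}))
# Treating _:x as told as well leaves no answer bindings.
print(solve(QUERY_2, GRAPH, told={"_:P55", "_:l34", "_:x"}))
```

In this sketch, calling solve with an empty told set gives the purely local reading, where _:l34 is just another existential, so the same pattern also matches further down the list and yields W/:Bill as a second answer.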
If we want to support some such mechanism, we need a way to
distinguish told-bnodes from mere bnodes. Several options have been
discussed.
(A) some universally agreed format is decided for bnodeIDs in answer
bindings which are intended to be useable as told-bnodes, and queries
distinguish told-bnodes by re-using the supplied bnodeID; any other
bnodes are simple bnodes. So for example, if the convention for a
told-bnode ID is that it has the prefix _:**, then the above example
would look like:
Query 1
:Arthur :siblings ?L
?L first ?V
answer
L/_:**l34, V/_:**P55
Query 2
_:**P55 :gender ?S
_:**l34 rest _:x
_:x first ?W
answer
S/male, W/:Susan
but for example
Query 3
_:**l34 rest _:**x
_:**x first ?V
answer
<none>
because there is no bnodeID '_:**x' in G
(B) a told-bnode is indicated by a query pattern variable with a
bnodeID 'attached' to it, indicating that the variable must be
replaced by that particular bnodeID. Other bnodes are simply bnodes.
Answer bindings are not supplied for these variables, as they would
be redundant. Then the example might be
Query 1
:Arthur :siblings ?L
?L first ?V
answer
L/_:l34, V/_:P55
Query 2
?X[_:P55] :gender ?S
?Z[_:l34] rest _:x
_:x first ?W
answer
S/male, W/:Susan
Notational variations are possible, eg a query variable whose name is
a bnodeID, so ?_:P55 rather than ?X[_:P55] to indicate a told-bnode.
(C) told-bnodes are indicated by a special kind of answer binding:
any told-bnode supplied by such a binding is a told-bnode in a
subsequent query, other bnodes aren't. So the example looks like
this, where [* *] is how the special answer bindings are indicated
(not a serious notational suggestion):
Query 1
:Arthur :siblings ?L
?L first ?V
answer
L/[*_:l34*], V/[*_:P55*]
Query 2
_:P55 :gender ?S
_:l34 rest _:x
_:x first ?W
answer
S/male, W/:Susan
The trouble with all of these is that they require the simple
semantic condition to be re-stated in a more complicated way. We have
to define (or tweak definitions so that things work out this way) a
new operation on graphs, something half-way between a merge and a
union. Merging graphs keeps all the bnodes separate: unioning them
does no separation at all. What we want is to union on the
told-bnodes but merge on the other bnodes. Call that semi-merging, or
smerging. Formally, G smerge H is the graph (G union H[S]), where S
is a 1:1 bnode-to-bnode mapping whose domain is the non-told-bnodes
in H and whose range is disjoint from the set of bnodes in G. Then
the semantic condition is
G entails (G smerge Q[S])
Of course this only makes sense when we have a clear definition of
what a told bnode is. If Q has no told-bnodes in it, then this is the
same as
G entails (G merge Q[S])
which is equivalent to the simple condition for the local option.
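The smerge operation itself is easy to sketch in Python. This is a rough illustration only, assuming graphs are sets of 3-tuples of strings, bnodes are the terms starting with "_:", and the set of told-bnodes is handed in explicitly; the names smerge, bnodes and the fresh label scheme _:m0, _:m1, ... are all hypothetical.

```python
from itertools import count


def bnodes(graph):
    """Collect every bnode label used in a graph."""
    return {t for triple in graph for t in triple if t.startswith("_:")}


def smerge(g, h, told):
    """Union g and h on the told-bnodes, merge them on all other bnodes.

    The non-told bnodes of h are renamed 1:1 to fresh labels that do not
    occur in g (or in h), which plays the role of the mapping S in the text.
    """
    taken = bnodes(g) | bnodes(h)
    fresh = (f"_:m{i}" for i in count())
    renaming = {}
    for label in sorted(bnodes(h) - set(told)):
        new_label = next(fresh)
        while new_label in taken:
            new_label = next(fresh)
        renaming[label] = new_label
        taken.add(new_label)
    renamed = {tuple(renaming.get(t, t) for t in triple) for triple in h}
    return set(g) | renamed


G_EXAMPLE = {("_:l34", "rest", "_:l35"), ("_:x", "first", ":Bill")}
H_EXAMPLE = {("_:l34", "rest", "_:x")}

# _:l34 is told, so it is shared; _:x is standardized apart from G's _:x.
print(smerge(G_EXAMPLE, H_EXAMPLE, told={"_:l34"}))
```

With an empty told set this degenerates to an ordinary merge, and with every bnode of H told it degenerates to a plain union, which matches the two extremes described above.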
You can tweak things so as to make things work out without making
this new definition.
Enrico has done it (several times :-), but it seems complicated,
however you phrase it. One way is to describe it in terms of first
replacing the bnodeIDs with URIrefs, ie skolemizing the told-bnodes
in the graph and query, then applying entailment, then de-skolemizing
the answer bindings by replacing the skolemized URIrefs by bnodes
again. Blech. Another way is to re-state the basic condition as in
Enrico's document, as the condition
G entails (G merge Q)[S]
which means, when you unpack it, that if you take the bnodes in the
query, standardize them apart from those in the graph, and THEN apply
the answer binding (which leaves any bnodes that get introduced at
that stage alone, so they don't get standardized apart from the
bnodes in the graph) then what you get is entailed by the graph you
started with. This only works for the query style (B) above: you have
to tweak the definition differently for any other way of indicating
told bnodes.
But, I have to say, all this seems like overkill. The same effect can
be got by sticking to the local option, and requiring the server to
supply a special URIref as an answer binding to indicate an
'anonymous' entity, along the lines suggested by Andy in
http://lists.w3.org/Archives/Public/public-rdf-dawg/2005OctDec/0159.html.
This corresponds directly to Enrico's original description of this as
a form of Skolemization, but doesn't require the ugly final step of
de-skolemizing to get back to a bnode. Then we don't need to invent a
new graph operation, we don't need any other new bnode conventions,
and we can stick with the simple, direct (and kind of obvious)
semantic definition of answer.
Since this is all about allowing names to be used with a wider scope
than a graph, and since the RDF model has until now consistently kept
bnode IDs as graph-local identifiers, the only sensible option seems
to be to provide a new kind of name. Logically there is very little
to choose between a name and an existential variable, i.e. a bnode,
in any case. The only difference is the scope. So inventing a URI
namespace to be the names of 'nameless things' seems like quite a
clever way around the difficulty, especially if some
easy-to-implement convention can be provided to map these IDs to and
from bnodeIDs.
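One possible convention for mapping between bnodeIDs and such names can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the namespace http://example.org/anon# is made up, as are the function names, and a real server would be free to choose any scheme it liked.

```python
from urllib.parse import quote, unquote

# Hypothetical namespace for the names of 'nameless things'.
ANON_NS = "http://example.org/anon#"


def bnode_to_uri(label):
    """Turn a server-side bnodeID such as '_:l34' into a wide-scope name."""
    if not label.startswith("_:"):
        raise ValueError(f"not a bnodeID: {label!r}")
    return ANON_NS + quote(label[2:], safe="")


def uri_to_bnode(uri):
    """Recover the server-side bnodeID from a name minted by bnode_to_uri."""
    if not uri.startswith(ANON_NS):
        raise ValueError(f"not in the anonymous namespace: {uri!r}")
    return "_:" + unquote(uri[len(ANON_NS):])


def skolemize(graph):
    """Replace every bnode in a list of triples by its URI form."""
    return [
        tuple(bnode_to_uri(t) if t.startswith("_:") else t for t in triple)
        for triple in graph
    ]


def export_binding(binding):
    """Rewrite the bnodes in one answer binding before sending it out."""
    return {
        var: bnode_to_uri(val) if val.startswith("_:") else val
        for var, val in binding.items()
    }


print(export_binding({"?L": "_:l34", "?V": "_:P55"}))
```

A client that receives these names can put them straight into a later query, where they are matched like any other URIref; any bnode the client writes itself stays an ordinary, locally scoped bnode.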
This option has the great merit of semantic simplicity and not
requiring any special re-interpretation of bnodes or making any
special effort to re-define the notion of an answer binding. Also it
puts the control on the server side, which is where it belongs IMO,
since it doesn't make sense for a query to be able to use a bnodeID
as a told bnode in an initial query. On balance, I think this is the
best way to proceed at present. It means that semantically we can
stick to the local option, which keeps the description simple.
So, my vote is to reply to points like those expressed in
http://lists.w3.org/Archives/Public/public-swbp-wg/2005Nov/0036
by suggesting that 'fake URIs' be used to communicate between server
and client in cases like this, rather than 'fake variables' in (B)
above. Real live systems can adopt their own conventions for faking
the URIs, and still be considered conformant. They can even use real
bnodeIDs, I would suggest, but if they do then it is up to them to
keep track of the 'toldness' of these IDs somehow: they can't be
treated as normal bnodes, but must act 'locally' in such a query
regime as though they were URIrefs. They have to be 'frozen' into a
temporary URIref status as far as determining appropriate answers is
concerned. So as far as the semantics is concerned, we still only
have two classes of name: those with scope in a graph, and those with
a wide scope. Any scheme which requires distinguishing two 'kinds' of
bnode in a query amounts to having three categories of identifier in
an RDF graph rather than two, all with different scoping rules. This
muddles the semantics and will likely have knock-on effects
throughout the algebra. On mature reflection, I think that we do not
want to go there.
If we agree to this, then Enrico's ingenious work will have been
wasted, but those are the WG breaks :-). On the plus side, the RDF
semantics issue will then be trivially solvable since semantically
the entire transaction will fit within the local option.
Sorry this has taken so long. It needs a WG decision, however, before
we can move forwards. The semantics issue is hostage to the
told-bnode decision.
Pat
--
---------------------------------------------------------------------
IHMC (850)434 8903 or (650)494 3973 home
40 South Alcaniz St. (850)202 4416 office
Pensacola (850)202 4440 fax
FL 32502 (850)291 0667 cell
phayesAT-SIGNihmc.us http://www.ihmc.us/users/phayes
Received on Tuesday, 8 November 2005 07:18:21 UTC