Optional triples and may-bind variables from Rob Shearer on 2004-05-11 (public-rdf-dawg@w3.org from April to June 2004)

From: Rob Shearer <Rob.Shearer@networkinference.com>
Date: Tue, 11 May 2004 10:33:01 -0700
To: "RDF Data Access Working Group" <public-rdf-dawg@w3.org>
Cc: "Pat Hayes" <phayes@ihmc.us>
Message-ID: <CFE388CECDDB1E43AB1F60136BEB49730280B6@rome.ad.networkinference.com>

There has been some discussion of so-called "may-bind" variables, and
this discussion has been distilled into candidate requirement 3.6. My
understanding all along was that it was Pay Hayes' implementation
experience which was driving this, but it became clear to me during
today's teleconference that what he has implemented and what this
requirement addresses are two very very different things.

Several logic-based query languages have the notion of "distinguished"
and "undistinguished" variables. Distinguished variables are those for
which a binding to a concrete name in the underlying data set must be
established, while undistinguished variables don't necessarily need a
concrete binding. Undistinguished variables, however, still must be
known to exist: their relationships with the distinguished variables are
still predicates which need to be fulfilled. My understanding of the DQL
implementation is that "may-bind" variables simply allow the system to
return a concrete binding for such an undistinguished variable if it
happens to come across one.

In plain RDF the details are quite straightforward: "undistinguished"
variables may be matched against bnodes or regular nodes, while
"distinguished" variables may only match against regular nodes. A
"may-bind" variable would be a variable which can match against either
bnodes or regular nodes, but which returns a binding in the result set
if the match is against a regular node. Note that in this case the
existence of bnodes very much affects the results of a query.

The way requirement 3.6 is written suggests that one may encode parts of
the query which may be completely ignored if necessary--they don't
impose any constraints at all. I read that requirement as the ability to
say "get me all the people, and their email addresses if they have one,
but it's all right if they don't have one". That is quite contrary to my
understanding of how Pat's implementation works, and I'm curious whether
there is any implementation experience for such a feature.

The feature effectively amounts to saying "P AND (Q OR true)", with the
condition that you're not allowed to optimize Q away despite it being
irrelevent.
The implications of this feature for both the query language and
implementations of that language are unclear and a bit worrying.

Received on Tuesday, 11 May 2004 13:34:48 UTC