RE: Optional triples and may-bind variables from Seaborne, Andy on 2004-05-13 (public-rdf-dawg@w3.org from April to June 2004)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Thu, 13 May 2004 11:25:17 +0100
To: "'Rob Shearer'" <Rob.Shearer@networkinference.com>, "'RDF Data Access Working Group'" <public-rdf-dawg@w3.org>
Cc: "'Pat Hayes'" <phayes@ihmc.us>
Message-ID: <000f01c438d4$97340ed0$0a01a8c0@atlas>

-------- Original Message --------
> From: public-rdf-dawg-request@w3.org <>
> Date: 11 May 2004 18:33
> 
> There has been some discussion of so-called "may-bind"
> variables, and this discussion has been distilled into
> candidate requirement 3.6. My understanding all along was
> that it was Pay Hayes' implementation experience which was
> driving this, but it became clear to me during today's
> teleconference that what he has implemented and what this
> requirement addresses are two very very different things.

I agree - Pat's talk in the telecon brough home to me that they are
different.  Optional triples give rise to variables that may be unbound but
for different reasons.

> 
> Several logic-based query languages have the notion of
> "distinguished" and "undistinguished" variables.
> Distinguished variables are those for which a binding to a
> concrete name in the underlying data set must be established,
> while undistinguished variables don't necessarily need a
> concrete binding. Undistinguished variables, however, still
> must be known to exist: their relationships with the
> distinguished variables are still predicates which need to be
> fulfilled. My understanding of the DQL implementation is that
> "may-bind" variables simply allow the system to return a
> concrete binding for such an undistinguished variable if it happens to
> come across one. 
> 
> In plain RDF the details are quite straightforward:
> "undistinguished" variables may be matched against bnodes or
> regular nodes, while "distinguished" variables may only match
> against regular nodes. A "may-bind" variable would be a
> variable which can match against either bnodes or regular
> nodes, but which returns a binding in the result set if the
> match is against a regular node. Note that in this case the
> existence of bnodes very much affects the results of a query.
> 
> The way requirement 3.6 is written suggests that one may
> encode parts of the query which may be completely ignored if
> necessary--they don't impose any constraints at all. I read
> that requirement as the ability to say "get me all the
> people, and their email addresses if they have one, but it's
> all right if they don't have one".

Good example - that's exactly the sort of thing that is requested.   It is
one of the top three feature requests from users that I get.  The top 3 are:
optional triples, nested queries, distinct/sort/groups results (unordered
list); I would suggest these are expectations created by analogy with SQL.

>  That is quite contrary to
> my understanding of how Pat's implementation works, and I'm
> curious whether there is any implementation experience for
> such a feature.

Yes, there is.  SeRQL [1] implements optional triples.  The semantics are
that the mandatory matching is done first; then any optional path
expresssions are applied.  The approach copudl be added to RDQL or pattern
matching scheme.  The RDQL implementers (noying that Sesame imp,ents both
RDQL and SeRQL) had a email discussion about it awhile ago.

[[It makes a difference as to the evaluation order because the case of just
not even trying to match an optional triple is not allowed so the eval order
changes the flow of bound variable from one mandatory component to another.
There is also a test case was produced by Jeremy Carroll where two optional
triples share a variable which can't be bound consistently.]]

Outside the RDF world, SQL has OUTER joins which are similar although not
RDF. 

> 
> The feature effectively amounts to saying "P AND (Q OR
> true)", with the condition that you're not allowed to
> optimize Q away despite it being irrelevent. The implications
> of this feature for both the query language and
> implementations of that language are unclear and a bit worrying.

[1] http://www.openrdf.org/doc/users/ch05.html

	Andy


PS I have found the SQL tutorials at (there are many others)

http://www.1keydata.com/sql/
http://www.w3schools.com/sql/

very useful in recalling terminology and what SQL actually does.

Received on Thursday, 13 May 2004 06:29:19 UTC