Blank node identifiers in FILTER clauses from Fred Zemke on 2006-06-28 (public-rdf-dawg@w3.org from April to June 2006)

From: Fred Zemke <fred.zemke@oracle.com>
Date: Tue, 27 Jun 2006 18:12:07 -0700
To: public-rdf-dawg@w3.org
Message-ID: <44A1D767.50709@oracle.com>

The scope of blank node identifiers is not clearly specified.
However, as I have understood conversations in email and
telecon, the definition of basic graph pattern E-matching in
2.5.1 "General framework" provides the only definition for the
semantics of blank node identifiers, and therefore
the scope of a blank node identifier
is a basic graph pattern.  My question is whether the scope
can also extend into a Constraint in a FilteredBasicGraphPattern.

For example, consider the data set with these three triples:

<s1> <v> <o1> .
<s2> <v> <o2a> .
<s2> <v> <o2b> .

The user wants to find those subjects which are related via the
verb <v> to at least two objects.  The desired solution
sequence is { <s2> }.  The user writes his query this way:

SELECT ?x
WHERE { ?x <v> _:a . ?x <v> _:b . FILTER (_:a != _:b) }

Does this do what the user wants?

It seems that the definitions in 2.5 "Basic graph patterns"
only explain how to solve the basic graph pattern

?x <v> _:a . ?x <v> _:b .

The solutions of this basic graph pattern are ?x = <s1>
and ?x = <s2>.  In the case of ?x = <s1>, this is because
the dataset entails the addition of these triples:

<s1> <v> _:a .
<s1> <v> _:b .

or in predicate calculus terms, it is possible to conclude
from the dataset that

(exists _:a, _:b) [ <s1> <v> _:a . <s1> <v> _:b . ]

Or using the mapping technique for simple entailment, map
?x -> <s1>, _:a -> <o1>, _:b -> <o1> and then restrict to
just the mapping of ?x.
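
The mapping technique can be sketched in a few lines of Python.
This is only an illustration, not text from the specification:
the names DATA, PATTERN, bgp_mappings and solutions are hypothetical,
IRIs are abbreviated to plain strings, and variables and blank node
identifiers are both treated as placeholders to be mapped.

```python
from itertools import product

# Dataset from the example above, as (subject, predicate, object) tuples.
DATA = {
    ("s1", "v", "o1"),
    ("s2", "v", "o2a"),
    ("s2", "v", "o2b"),
}

# Basic graph pattern: ?x <v> _:a . ?x <v> _:b .
PATTERN = [("?x", "v", "_:a"), ("?x", "v", "_:b")]


def is_placeholder(term):
    """Variables (?x) and blank node identifiers (_:a) both get mapped."""
    return term.startswith("?") or term.startswith("_:")


def bgp_mappings(pattern, data):
    """Yield each mapping of placeholders to dataset terms under which
    every triple of the pattern occurs in the data."""
    names = sorted({t for triple in pattern for t in triple if is_placeholder(t)})
    terms = sorted({t for triple in data for t in triple})
    for values in product(terms, repeat=len(names)):
        m = dict(zip(names, values))
        if all(tuple(m.get(t, t) for t in triple) in data for triple in pattern):
            yield m


def solutions(pattern, data):
    """Restrict each mapping to the variables, dropping blank node bindings."""
    return {
        frozenset((k, v) for k, v in m.items() if k.startswith("?"))
        for m in bgp_mappings(pattern, data)
    }


result = sorted(dict(s)["?x"] for s in solutions(PATTERN, DATA))
print(result)  # ['s1', 's2']
```

The mapping ?x -> s1, _:a -> o1, _:b -> o1 is accepted because nothing
in the pattern forces _:a and _:b to differ, and the restriction step
discards the blank node bindings before any later stage could see them.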

Note that the definitions of section 2.5, using either
entailment or mapping, do not provide for evaluating a
Constraint during the process of finding solutions to a
basic graph pattern.

So both solutions ?x -> <s1> and ?x -> <s2> come to the
FILTER clause, and the FILTER clause is unaware of any bindings
to _:a or _:b.  I do not know whether the result of
FILTER (_:a != _:b) is true, false or error, but whatever
the semantics of the FILTER clause is, it appears that it will
treat the two solutions identically.  If the result is true, then
both <s1> and <s2> are solutions; if it is false or an error, then
neither is.  Thus the solution set appears to be either
{ <s1>, <s2> } or the empty set.  Not what was desired!

I see four possible resolutions:

1. (My preference) the scope of a blank node identifier is
an entire FilteredBasicGraphPattern, not just a basic graph
pattern.  To do this, we need to extend the definitions in
section 2.5 so that they define the solutions of a
FilteredBasicGraphPattern rather than just the solutions of a
basic graph pattern.  I can see how to do this with the
simple entailment mapping definition; I don't see how to do
this with the general E-entailment definition.

2. We prohibit blank node identifiers in FILTER clauses as
inherently meaningless or deceptive syntax. 

3. We allow blank node identifiers in FILTER clauses, but they
always raise an error, so that such FILTERs always fail.
But in that case, why did we permit the syntax?

4. We allow blank node identifiers in FILTER clauses, and
they reference distinct blank nodes, distinct from all blank
nodes in the dataset.  Thus _:a = _:b is false, and _:a != _:b
is true.
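
To make the difference between these options concrete, here is a
rough Python sketch under stated assumptions.  The function names
are hypothetical, IRIs are abbreviated to plain strings, and the
filter_result argument simply stands in for whatever value the
FILTER is defined to produce when it cannot see _:a and _:b
(True as in option 4, False, or None for an error as in option 3).

```python
# Dataset from the example above, as (subject, predicate, object) tuples.
DATA = {
    ("s1", "v", "o1"),
    ("s2", "v", "o2a"),
    ("s2", "v", "o2b"),
}


def scope_is_filtered_pattern(data):
    """Option 1: the filter _:a != _:b is checked while the blank node
    identifiers are still mapped, so it can reject individual mappings."""
    return {
        s1
        for (s1, p1, a) in data
        for (s2, p2, b) in data
        if p1 == "v" and p2 == "v" and s1 == s2 and a != b
    }


def scope_is_basic_pattern(data, filter_result):
    """Options 3 and 4: the pattern is solved first and only ?x survives,
    so the filter yields the same fixed value for every solution."""
    subjects = {s for (s, p, _o) in data if p == "v"}
    return subjects if filter_result is True else set()


print(sorted(scope_is_filtered_pattern(DATA)))       # ['s2']
print(sorted(scope_is_basic_pattern(DATA, True)))    # ['s1', 's2']
print(sorted(scope_is_basic_pattern(DATA, None)))    # []
```

Only the first function returns the desired { <s2> }; the second can
return everything or nothing, depending on the value chosen for the
filter.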

Fred
Received on Wednesday, 28 June 2006 01:12:20 UTC