Re: Blank node identifiers in FILTER clauses from Eric Prud'hommeaux on 2006-07-02 (public-rdf-dawg@w3.org from July to September 2006)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Sun, 2 Jul 2006 14:01:17 +0200
To: Fred Zemke <fred.zemke@oracle.com>
Cc: public-rdf-dawg@w3.org
Message-ID: <20060702120117.GB14356@w3.org>
In order to make this even more concrete for readers, I have grounded
this use case in FOAF. (It helps me keep the variables and bindings
less arbitrary.)

On Tue, Jun 27, 2006 at 06:12:07PM -0700, Fred Zemke wrote:
> 
> The scope of blank node identifiers is not clearly specified.
> However, as I have understood conversations in email and
> telecon, the definition of basic graph pattern E-matching in
> 2.5.1 "General framework" provides the only definition for the
> semantics of blank node identifiers, and therefore
> the scope of a blank node identifier
> is a basic graph pattern.  My question is whether the scope
> can also extend into a Constraint in a FilteredBasicGraphPattern.
> 
> For example, consider the data set with these three triples:
> 
> <s1> <v> <o1> .
> <s2> <v> <o2a> .
> <s2> <v> <o2b> .

<EricP> foaf:knows <SteveH> .
<FredZ> foaf:knows <AndyS> .
<FredZ> foaf:knows <PatH> .

> The user wants to find those subjects which are related via the
> verb <v> to at least two objects.  The desired solution
> sequence is { <s2> }.  The user writes his query this way:
> 
> SELECT ?x
> WHERE { ?x <v> _:a . ?x <v> _:b . FILTER (_:a != _:b) }

Find everyone who knows two distinct people (or at least, symbols for
people):

SELECT ?who
WHERE { ?who foaf:knows _:known1 . 
	?who foaf:knows _:known2 .
	FILTER (_:known1 != _:known2) }

> Does this do what the user wants?
> 
> It seems that the definitions in 2.5 "Basic graph patterns"
> only explain how to solve the basic graph pattern
> 
> ?x <v> _:a . ?x <v> _:b .

?who foaf:knows _:known1 . ?who foaf:knows _:known2 .

> The solutions of this basic graph pattern are ?x = <s1>
> and ?x = <s2>.  In the case of ?x = <s1>, this is because
> the dataset entails the addition of these triples:
> 
> <s1> <v> _:a .
> <s1> <v> _:b .

In fact, before the FILTER, I think we end up with a bigger
SolutionSet (new word to signify a pre-projection ancestor of the
ResultSet).

# |  ?who   | _:known1 | _:known2 |
1 | <EricP> | <SteveH> | <SteveH> |
2 | <FredZ> |  <AndyS> |  <AndyS> |
3 | <FredZ> |  <AndyS> |   <PatH> |   * proves result { <FredZ> }
4 | <FredZ> |   <PatH> |  <AndyS> |   * proves result { <FredZ> }
5 | <FredZ> |   <PatH> |   <PatH> |

where rows 1,2,5 are eliminated in the evaluation of the encompassing
FilteredBasicGraphPattern:
  <http://www.w3.org/TR/rdf-sparql-query/#rFilteredBasicGraphPattern>

> or in predicate calculus terms, it is possible to conclude
> from the dataset that
> 
> (exists _:a, _:b) [ <s1> <v> _:a . <s1> <v> _b . ]

(exists _:known1, _:known2) [ <FredZ> foaf:knows _:known1 . 
			      <FredZ> foaf:knows _:knonw2 . ]

(You probably intend that we have, by now, executed the FILTER; from
 just the Basic Graph Pattern,

 (exists _:known1, _:known2) [ <EricP> foaf:knows _:known1 . 
			       <EricP> foaf:knows _:known2 . ]
 is also true.
)

> Or using the mapping technique for simple entailment, map
> ?x -> <s1>, _:a -> <o1>, _:b -> <o1> and then restrict to
> just the mapping of ?x.
> 
> Note that the definitions of section 2.5, using either
> entailment or mapping, do not provide for evaluating a
> Constraint during the process of finding solutions to a
> basic graph pattern.
> 
> So both solutions ?x -> <s1> and ?x -> <s2> come to the
> FILTER clause, and the FILTER clause is unaware of any bindings
> to _:a or _:b.  I do not know whether the result of
> FILTER (_:a != _:b) is true, false or error, but whatever
> the semantics of the FILTER clause is, it appears that it will
> treat the two solutions identically.  If true, then both
> <s1> and <s2> are solutions; if false or error, then neither
> are.  Thus the solution set appears to be either { <s1>, <s2> }
> or the empty set.  Not what was desired!

Here you have exceeded my moth-like attention span. Could you point
out how this is different from the same query with variables instead
of bNodes?

SELECT ?who
WHERE { ?who foaf:knows ?known1 . 
	?who foaf:knows ?known2 .
	FILTER (?known1 != ?known2) }

Specifically, why are the bindings to undistinguished variables out of
scope in the FILTER while the bindings to distinguished variables are
not? Is it that 2.5.4 says that the bNodes are scoped to the basic
graph pattern:

[[ <http://www.w3.org/TR/rdf-sparql-query/#BGPsyntax>
with the scope of the blank node label being the basic graph pattern.
]]

while the label is scoped to the query:

[[ <http://www.w3.org/TR/rdf-sparql-query/#blankNodes?
Blank nodes have labels which are scoped to the query.
]]

(as are variables:

[[ <http://www.w3.org/TR/rdf-sparql-query/#QSynVariables>
Variables in SPARQL queries have global scope;
]]

)? (Acutally, I suspect that your issue comes from the definitions as
you tend to look for trouble there.)

> I see four possible resolutions:
> 
> 1. (My preference) the scope of a blank node identifier is
> an entire FilteredBasicGraphPattern, not just a basic graph
> pattern.  To do this, we need to extend the definitions in
> section 2.5 so that they define the solutions of a
> FilteredBasicGraphPattern rather than just the solutions of a
> basic graph pattern.  I can see how to do this with the
> simple entailment mapping definition; I don't see how to do
> this with the general E-entailment definition.

I think the scope of a bNode identifier is already the entire query
(obviating the need for text like "scoped to the *outermost*
FilteredBasicGraphPattern for situations like:
  WHERE { ?who foaf:knows ?known1 . 
	  OPTIONAL { ?who foaf:knows ?known2 } .
	  FILTER (?known1 != ?known2) }
).

I hope we can resolve this by updating 2.5.4 to be
[[
with the scope of the blank node label being the *query*.
]]
and that the semantics of RDF will say that
  { GRAPH <foo> { _:s ?p1 ?o1 } .
    GRAPH <bar> { _:s ?p2 ?o2 } }
tell us that symbol space for bNodes in <foo> is distinct from that in
<bar>, thus yielding no solutions.

> 2. We prohibit blank node identifiers in FILTER clauses as
> inherently meaningless or deceptive syntax. 
> 
> 3. We allow blank node identifiers in FILTER clauses, but they
> always raise an error, so that such FILTERs always fail.
> But in that case, why did we permit the syntax?

I think the difference between 2 and 3 is simply whether we complicate
the grammar or put extra text in that says "It is an error to place a
bNode in a FILTER expression". I propose that 2 and 3 are basically
the same for purposes of this decision and that the execution is a
matter editorial taste.

> 4. We allow blank node identifiers in FILTER clauses, and
> they reference distinct blank nodes, distinct from all blank
> nodes in the dataset.  Thus _:a = _:b is false, and _:a != _:b
> is true.

This seems like a non-starter, but possibly just because I don't get
it. I would expect this insular bNode scope would have the most value
in extension functions where you want to pass an *existable* as an
argument. For the simple cases you describe above, I would interpret
the expressions as
  _:a = _:b => "There is at least one thing in the domain of
	        discourse" which I true except for the dataset { }.
  _:a != _:b => "There is are least two things in the domain of
	        discourse"
This use of bNodes seems theoretically entertaining, but not useful.

-- 
-eric

home-office: +1.617.395.1213 (usually 900-2300 CET)
cell:        +81.90.6533.3882

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.
Received on Sunday, 2 July 2006 12:00:00 UTC