Re: Blank node identifiers in FILTER clauses from Seaborne, Andy on 2006-07-05 (public-rdf-dawg@w3.org from July to September 2006)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Wed, 05 Jul 2006 14:55:00 +0100
To: Fred Zemke <fred.zemke@oracle.com>
CC: public-rdf-dawg@w3.org
Message-ID: <44ABC4B4.7050000@hp.com>

Fred Zemke wrote:
> The scope of blank node identifiers is not clearly specified.

True - generally the text in 2.5 is the right one and text elsewhere (e.g.
2.8.3) reflects earlier work.

It's worth explicitly talking about identifier scope in the syntax section on
blank nodes.  I think I can do that under the banner of "editorial".

> However, as I have understood conversations in email and
> telecon, the definition of basic graph pattern E-matching in
> 2.5.1 "General framework" provides the only definition for the
> semantics of blank node identifiers, and therefore
> the scope of a blank node identifier
> is a basic graph pattern.  My question is whether the scope
> can also extend into a Constraint in a FilteredBasicGraphPattern.
> 
> For example, consider the data set with these three triples:
> 
> <s1> <v> <o1> .
> <s2> <v> <o2a> .
> <s2> <v> <o2b> .
> 
> The user wants to find those subjects which are related via the
> verb <v> to at least two objects.  The desired solution
> sequence is { <s2> }.  The user writes his query this way:
> 
> SELECT ?x
> WHERE { ?x <v> _:a . ?x <v> _:b . FILTER (_:a != _:b) }

<o2a> and <o2b> may be names for the same object in the domain of discourse.
In general, it isn't possible to conclude anything about numbers of things in
RDF.  It is in OWL.

We have blank nodes in queries for two reasons:

1/ Syntax - because of the [] forms which are blank nodes in N3/Turtle.

2/ Handling OWL-disjunction (as the prototypical case).

From (1), we need a treatment for blank nodes.  Some members of the working
group were interested in leaving (2) open and made a proposal.

(We could have deviated from N3/Turtle and made [] be anonymous named
variables, if you'll forgive the slight contradiction in terminology:
variables with a name but hidden from the user.)

Named variables in solutions are bound to some RDF term if needed; blank
nodes are handled by entailment, so (with OWL disjunction) there are cases
where they are known to have a value, but not what that value is.  (Bijan's
undistinguished variables.)

This does not occur for RDFS entailments (or any entailment with a logical
closure, and so any rule-based entailment regime).

This is important because solutions (bindings of named variables and RDF
terms) flow through the graph operators.  Entailment only happens within a
basic graph pattern.  Only conjunctive triple patterns have any meaning for
entailment.

I find the alternative of relying on the presence or absence of a named
variable in the SELECT clause a very confusing way of going about it - one
part of the syntax indirectly affects another part of the query.  It also does
not extend to queries with more than one BGP in them.

> 
> Does this do what the user wants?
> 
> It seems that the definitions in 2.5 "Basic graph patterns"
> only explain how to solve the basic graph pattern
> 
> ?x <v> _:a . ?x <v> _:b .
> 
> The solutions of this basic graph pattern are ?x = <s1>
> and ?x = <s2>.  In the case of ?x = <s1>, this is because
> the dataset entails the addition of these triples:
> 
> <s1> <v> _:a .
> <s1> <v> _:b .
> 
> or in predicate calculus terms, it is possible to conclude
> from the dataset that
> 
> (exists _:a, _:b) [ <s1> <v> _:a . <s1> <v> _:b . ]
> 
> Or using the mapping technique for simple entailment, map
> ?x -> <s1>, _:a -> <o1>, _:b -> <o1> and then restrict to
> just the mapping of ?x.
> 
> Note that the definitions of section 2.5, using either
> entailment or mapping, do not provide for evaluating a
> Constraint during the process of finding solutions to a
> basic graph pattern.
> 
> So both solutions ?x -> <s1> and ?x -> <s2> come to the
> FILTER clause, and the FILTER clause is unaware of any bindings
> to _:a or _:b.  I do not know whether the result of
> FILTER (_:a != _:b) is true, false or error, but whatever
> the semantics of the FILTER clause is, it appears that it will
> treat the two solutions identically.  If true, then both
> <s1> and <s2> are solutions; if false or error, then neither
> are.  Thus the solution set appears to be either { <s1>, <s2> }
> or the empty set.  Not what was desired!
> 
> I see four possible resolutions:
> 
> 1. (My preference) the scope of a blank node identifier is
> an entire FilteredBasicGraphPattern, not just a basic graph
> pattern.  To do this, we need to extend the definitions in
> section 2.5 so that they define the solutions of a
> FilteredBasicGraphPattern rather than just the solutions of a
> basic graph pattern.  I can see how to do this with the
> simple entailment mapping definition; I don't see how to do
> this with the general E-entailment definition.

My preference as well.
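
To make the difference concrete, here is a minimal sketch in Python of the
simple entailment mapping applied to the example data.  It is illustrative
only and not taken from any SPARQL implementation: the names mappings,
bgp_only and filtered_bgp are made up for this example, terms are plain
strings, and the constraint is an ordinary Python function standing in for
FILTER (_:a != _:b).

```python
from itertools import product

# The three triples from the example data set.
DATA = {
    ("s1", "v", "o1"),
    ("s2", "v", "o2a"),
    ("s2", "v", "o2b"),
}

# Basic graph pattern: ?x <v> _:a . ?x <v> _:b .
BGP = [("?x", "v", "_:a"), ("?x", "v", "_:b")]


def is_placeholder(term):
    """True for named variables (?x) and blank node identifiers (_:a)."""
    return term.startswith("?") or term.startswith("_:")


def mappings(bgp, data):
    """Yield every mapping of placeholders to data terms under which all
    triple patterns become triples of the data (simple entailment)."""
    names = sorted({t for tp in bgp for t in tp if is_placeholder(t)})
    terms = sorted({t for triple in data for t in triple})
    for values in product(terms, repeat=len(names)):
        m = dict(zip(names, values))
        if all(tuple(m.get(t, t) for t in tp) in data for tp in bgp):
            yield m


def bgp_only(bgp, data):
    """Scope ends at the BGP: the mapping is restricted to ?x, so a later
    FILTER has no bindings for _:a or _:b to look at."""
    return {m["?x"] for m in mappings(bgp, data)}


def filtered_bgp(bgp, data, constraint):
    """Resolution (1): the constraint is checked against the full mapping,
    blank node identifiers included, before restricting to ?x."""
    return {m["?x"] for m in mappings(bgp, data) if constraint(m)}


print(sorted(bgp_only(BGP, DATA)))
print(sorted(filtered_bgp(BGP, DATA, lambda m: m["_:a"] != m["_:b"])))
```

With the scope limited to the BGP both s1 and s2 come out; with the
constraint inside the scope only s2 survives.  Note that the comparison here
is on terms, so it says nothing about whether o2a and o2b name the same
thing.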

I would remove the possibility of blank nodes (and general expressions) in the
functions isIRI/isLiteral/isBlank, restricting them to named variables only,
because these really work on the terms of the bindings, not the values.
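
A rough sketch of what "work on the terms of the bindings" could look like,
in Python.  The Term class and the is_iri/is_literal/is_blank helpers are
hypothetical names for illustration, not the SPARQL functions themselves;
the only point is that the test inspects the term bound to a named variable
and refuses anything else.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """An RDF term as it appears in a binding (hypothetical representation)."""
    kind: str   # one of "iri", "literal", "blank"
    value: str


def term_kind(solution, var):
    """Return the kind of the term bound to a named variable.

    Blank node identifiers and general expressions are rejected, since
    only named variables carry a term in the solution."""
    if not var.startswith("?"):
        raise ValueError("expected a named variable, got " + repr(var))
    return solution[var].kind


def is_iri(solution, var):
    return term_kind(solution, var) == "iri"


def is_literal(solution, var):
    return term_kind(solution, var) == "literal"


def is_blank(solution, var):
    return term_kind(solution, var) == "blank"


# One solution: ?x bound to an IRI, ?o bound to a blank node from the data.
solution = {"?x": Term("iri", "s2"), "?o": Term("blank", "b0")}
print(is_iri(solution, "?x"), is_blank(solution, "?o"))
```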

I would like to see a proposal for (1) from one or more of the original
contributors of the current text (Enrico, Bijan, Pat).

> 
> 2. We prohibit blank node identifiers in FILTER clauses as
> inherently meaningless or deceptive syntax. 

OK - but less of a preference.  For me, this is a fall-back from (1) that we
can choose if we do not manage to get agreement around (1).

> 3. We allow blank node identifiers in FILTER clauses, but they
> always raise an error, so that such FILTERs always fail.
> But in that case, why did we permit the syntax?
> 
> 4. We allow blank node identifiers in FILTER clauses, and
> they reference distinct blank nodes, distinct from all blank
> nodes in the dataset.  Thus _:a = _:b is false, and _:a != _:b
> is true.

These two seem more confusing than (2).
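
For what it is worth, a small Python sketch of (3) and (4) shows why: in
both cases the outcome of the FILTER does not depend on the solution at all.
The names FilterError, constraint_option3, constraint_option4 and
apply_filter are invented for this illustration, and the solutions are
assumed to be the two ?x bindings from the example after the blank node
identifiers have been projected away.

```python
class FilterError(Exception):
    """Raised when a FILTER expression cannot be evaluated."""


# Solutions of the basic graph pattern, restricted to ?x.
SOLUTIONS = [{"?x": "s1"}, {"?x": "s2"}]


def constraint_option3(solution):
    """Option 3: a blank node identifier in a FILTER always raises an error."""
    raise FilterError("blank node identifier in FILTER")


def constraint_option4(solution):
    """Option 4: _:a and _:b are two fresh, distinct blank nodes, so
    _:a != _:b is true whatever the solution is."""
    fresh_a, fresh_b = object(), object()
    return fresh_a is not fresh_b


def apply_filter(solutions, constraint):
    """Keep solutions whose constraint is true; an error drops the solution."""
    kept = []
    for s in solutions:
        try:
            if constraint(s):
                kept.append(s["?x"])
        except FilterError:
            pass
    return kept


print(apply_filter(SOLUTIONS, constraint_option3))
print(apply_filter(SOLUTIONS, constraint_option4))
```

Under (3) nothing is returned and under (4) both s1 and s2 are; neither
gives the { s2 } the user was after.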

> 
> Fred
> 

	Andy
Received on Wednesday, 5 July 2006 13:55:27 UTC