Re: Blank node identifiers in FILTER clauses from Eric Prud'hommeaux on 2006-07-13 (public-rdf-dawg@w3.org from July to September 2006)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Thu, 13 Jul 2006 18:23:42 +0200
To: "Seaborne, Andy" <andy.seaborne@hp.com>
Cc: Fred Zemke <fred.zemke@oracle.com>, public-rdf-dawg@w3.org
Message-ID: <20060713162342.GA12357@w3.org>
On Wed, Jul 05, 2006 at 02:55:00PM +0100, Seaborne, Andy wrote:
> 
> Fred Zemke wrote:
> >The scope of blank node identifiers is not clearly specified.
> 
> True - generally the text in 2.5 is the right one and text elsewhere (e.g.
> 2.8.3) reflects earlier work.
> 
> It's worth explicitly talking about identifier scope in the syntax section on
> blank nodes.  I think I can do that under the banner of "editorial".
> 
> >However, as I have understood conversations in email and
> >telecon, the definition of basic graph pattern E-matching in
> >2.5.1 "General framework" provides the only definition for the
> >semantics of blank node identifiers, and therefore
> >the scope of a blank node identifier
> >is a basic graph pattern.  My question is whether the scope
> >can also extend into a Constraint in a FilteredBasicGraphPattern.
> >
> >For example, consider the data set with these three triples:
> >
> ><s1> <v> <o1> .
> ><s2> <v> <o2a> .
> ><s2> <v> <o2b> .
> >
> >The user wants to find those subjects which are related via the
> >verb <v> to at least two objects.  The desired solution
> >sequence is { <s2> }.  The user writes his query this way:
> >
> >SELECT ?x
> >WHERE { ?x <v> _:a . ?x <v> _:b . FILTER (_:a != _:b) }
> 
> <o2a> and <o2b> may be names for the same object in the domain of discourse.
> In general, it isn't possible to conclude anything about numbers of things
> in RDF.  It is in OWL.
> 
> We have blank nodes in queries for two reasons:
> 
> 1/ Syntax - because of the [] forms which are blank nodes in N3/Turtle.
> 
> 2/ Handling OWL-disjunction (as the prototypical case).
> 
>  From (1), we need a treatment for blank nodes.  Some members of the working
> group were interested in leaving (2) open and made a proposal.
> 
> (We could have deviated from N3/Turtle and made [] be anonymous named
> variables, if you'll forgive the slight contradiction in terminology:
> variables with a name but hidden from the user.)
> 
> Named variables in solutions are bound to some RDF term if needed : blank
> nodes are handled by entailment so (OWL disjunction) there are cases where
> they are known to have a value, but not what that value is.  (Bijan's
> undistinguished variables.)
> 
> This does not occur for RDFS entailments (or any entailment with a logical
> closure so any rule-based entailment regime).

Indeed, and I read the charter section 1
http://www.w3.org/2003/12/swa/dawg-charter#scope
[[
The RDF data model is a directed, labeled graph with edges labeled
with URIs and nodes that are either unidentified, literals, or URIs
(please see the RDF Primer for further explanation). The principal
task of the RDF Data Access Working Group is to gather requirements
and to define an HTTP and/or SOAP-based protocol for selecting
instances of subgraphs from an RDF graph.
]]

and section 2.1
http://www.w3.org/2003/12/swa/dawg-charter#rdfs-owl-queries 
[[
The protocol will allow access to a notional RDF graph. This may in
practice be the virtual graph which would follow from some form of
inference from a stored graph. This does not affect the data access
protocol, but may affect the description of the data access
service. For example, if OWL DL semantics are supported by a service,
that may be evident in the description of the service or the virtual
graph which is queried, but it will not affect the protocol designed
under this charter.
]]

as saying we query those entailments that are expressed in an RDF
graph.  If we tweak the semantics so that _:foo can correspond to a
value that is implied by either side of an OWL disjunction, and it
cannot be treated as an ordinary (albeit unreturnable) variable in
the graph, we are, I believe, making SPARQL much harder to define and
to understand. My issue is not that the bnode labels may or may not be
usable in the FILTER, but more that they can't be used elsewhere in
the graph pattern.

*OWL* Data:
@prefix wine: <http://www.w3.org/2001/sw/WebOnt/guide-src/wine#> .
@prefix rdf-schema: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix food: <http://www.w3.org/2001/sw/WebOnt/guide-src/food#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

@prefix r: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
food:MyLunch r:type food:PastaWithLightCreamCourse .
food:MyDinner r:type food:RedMeatCourse .

food:MealCourse rdf-schema:subClassOf [
   owl:onProperty food:hasDrink ;
   owl:minCardinality "1"^^xsd:nonNegativeInteger
] .

food:PastaWithLightCreamCourse owl:intersectionOf (
      food:MealCourse
      [
         owl:allValuesFrom food:PastaWithLightCreamSauce ;
         owl:onProperty food:hasFood
      ] ) ;
   rdf-schema:subClassOf [
      owl:onProperty food:hasDrink ;
      owl:allValuesFrom [
         owl:onProperty wine:hasColor ;
         owl:hasValue food:White
      ]
   ] .

food:RedMeatCourse owl:intersectionOf (
      food:MealCourse
      [
         owl:onProperty food:hasFood ;
         owl:allValuesFrom food:RedMeat
      ] ) ;
   rdf-schema:subClassOf [
      owl:onProperty food:hasDrink ;
      owl:allValuesFrom [
         owl:onProperty wine:hasColor ;
         owl:hasValue food:Red
      ]
   ] .

Note that this graph does not directly create any triples with
food:hasDrink or wine:hasColor in the predicate. MyDinner must have a
drink because it is a RedMeatCourse, which is a subClassOf something
which, if it has a Drink, must have a Red drink. Also, it's a
MealCourse, which must have at least one Drink.

So there we go. It must hasDrink of something with the color Red.

Query:
PREFIX food: <http://www.w3.org/2001/sw/WebOnt/guide-src/food#>
PREFIX wine: <http://www.w3.org/2001/sw/WebOnt/guide-src/wine#>
SELECT ?Meal ?WineColor
WHERE {
   ?Meal food:hasDrink _:Wine .
   _:Wine wine:hasColor ?WineColor }

Pellet <http://www.mindswap.org/2003/pellet/demo> gives these
Results:
| Meal          | WineColor |
+---------------+-----------+
| test:MyLunch  | :White    |
| test:MyDinner | :Red      |


When querying Pellet for:
   ?Meal food:hasDrink ?Wine .
   ?Wine wine:hasColor ?WineColor

it gives no results because it has no bindings for the Wine. But why
not? Certainly, it has deduced that there is something there, but it's
opaque to RDF. Why doesn't it infer these triples?
  food:MyLunch hasDrink [ hasColor :White ] .
  food:MyDinner hasDrink [ hasColor :Red ] .

That seems quite in order to me, freeing us to use bNodes as
existentials that affect future counting semantics, or to remove them,
or to treat them exactly as regular variables except that you can't
select them and they don't show up in SELECT *.
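
To make the last option concrete, here is a rough sketch (Python, simple
matching only, no entailment) of treating bnode labels as ordinary
variables that are merely hidden from the result. It runs Fred's
three-triple example quoted below. The names DATA, match and select are
made up for illustration, and terms are plain strings rather than real
RDF terms; this is an assumption-laden toy, not a proposal for spec text.

```python
# Sketch only: simple (non-entailment) matching over the three-triple example.
# Terms starting with "?" are named variables; terms starting with "_:" are
# blank node labels, treated here as variables hidden from the result.
# DATA, match and select are hypothetical names used for illustration.

DATA = {
    ("s1", "v", "o1"),
    ("s2", "v", "o2a"),
    ("s2", "v", "o2b"),
}

def is_var(term):
    return term.startswith("?") or term.startswith("_:")

def match(patterns, data, binding=None):
    """Yield every binding of variables and bnode labels satisfying all patterns."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in data:
        candidate = dict(binding)
        ok = True
        for p, t in zip(first, triple):
            if is_var(p):
                # Bind on first use; later uses must agree with the earlier value.
                if candidate.setdefault(p, t) != t:
                    ok = False
                    break
            elif p != t:
                ok = False
                break
        if ok:
            yield from match(rest, data, candidate)

def select(patterns, data, keep=lambda b: True):
    """Apply the filter while bnode bindings are still visible, then hide them."""
    rows = set()
    for b in match(patterns, data):
        if keep(b):
            rows.add(tuple(sorted((k, v) for k, v in b.items() if k.startswith("?"))))
    return rows

BGP = [("?x", "v", "_:a"), ("?x", "v", "_:b")]
result = select(BGP, DATA, keep=lambda b: b["_:a"] != b["_:b"])
print(result)  # {(('?x', 's2'),)}
```

Without the keep argument, select(BGP, DATA) returns both s1 and s2, so
the filter only does useful work because it sees the bnode bindings
before they are dropped.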

> This is important because solutions (bindings of named variables and RDF
> terms) flow through the graph operators.  Entailment only happens within a
> basic graph pattern.  Only conjunctive triples patterns have any meaning for
> entailment.
> 
> I find the alternative of relying on the presence or absence of a named
> variable in the SELECT clause a very confusing  way of going about it - one
> part of the syntax indirectly affects another part of the query.  It also
> does not extend to queries with more than one BGP in them.
> 
> >
> >Does this do what the user wants?
> >
> >It seems that the definitions in 2.5 "Basic graph patterns"
> >only explain how to solve the basic graph pattern
> >
> >?x <v> _:a . ?x <v> _:b .
> >
> >The solutions of this basic graph pattern are ?x = <s1>
> >and ?x = <s2>.  In the case of ?x = <s1>, this is because
> >the dataset entails the addition of these triples:
> >
> ><s1> <v> _:a .
> ><s1> <v> _:b .
> >
> >or in predicate calculus terms, it is possible to conclude
> >from the dataset that
> >
> >(exists _:a, _:b) [ <s1> <v> _:a . <s1> <v> _:b . ]
> >
> >Or using the mapping technique for simple entailment, map
> >?x -> <s1>, _:a -> <o1>, _:b -> <o1> and then restrict to
> >just the mapping of ?x.
> >
> >Note that the definitions of section 2.5, using either
> >entailment or mapping, do not provide for evaluating a
> >Constraint during the process of finding solutions to a
> >basic graph pattern.
> >
> >So both solutions ?x -> <s1> and ?x -> <s2> come to the
> >FILTER clause, and the FILTER clause is unaware of any bindings
> >to _:a or _:b.  I do not know whether the result of
> >FILTER (_:a != _:b) is true, false or error, but whatever
> >the semantics of the FILTER clause is, it appears that it will
> >treat the two solutions identically.  If true, then both
> ><s1> and <s2> are solutions; if false or error, then neither
> >are.  Thus the solution set appears to be either { <s1>, <s2> }
> >or the empty set.  Not what was desired!
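
Fred's argument can be checked mechanically. The sketch below (Python,
simple matching only) projects the BGP solutions down to ?x first and
only then applies the FILTER. Since the filter can no longer see _:a or
_:b, its verdict is modelled as a single constant applied to every
solution. The names DATA, bgp_solutions and filter_after_projection are
hypothetical, and the three verdict values stand in for whatever the
spec would eventually say.

```python
# Sketch only: filter applied AFTER bnode bindings have been projected away.
# DATA, bgp_solutions and filter_after_projection are hypothetical names.

DATA = [
    ("s1", "v", "o1"),
    ("s2", "v", "o2a"),
    ("s2", "v", "o2b"),
]

def bgp_solutions(data):
    """Solutions of { ?x <v> _:a . ?x <v> _:b }, keeping only the ?x binding."""
    return {
        s1
        for (s1, p1, _a) in data
        for (s2, p2, _b) in data
        if p1 == "v" and p2 == "v" and s1 == s2
    }

def filter_after_projection(solutions, verdict):
    """The filter cannot see _:a or _:b, so one verdict applies to every solution."""
    return {x for x in solutions if verdict is True}

solutions = bgp_solutions(DATA)
for verdict in (True, False, "error"):
    print(verdict, sorted(filter_after_projection(solutions, verdict)))
# True ['s1', 's2']
# False []
# error []
```

Whatever verdict is chosen, s1 and s2 stand or fall together, which is
exactly the "both or neither" outcome described above.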
> >
> >I see four possible resolutions:
> >
> >1. (My preference) the scope of a blank node identifier is
> >an entire FilteredBasicGraphPattern, not just a basic graph
> >pattern.  To do this, we need to extend the definitions in
> >section 2.5 so that they define the solutions of a
> >FilteredBasicGraphPattern rather than just the solutions of a
> >basic graph pattern.  I can see how to do this with the
> >simple entailment mapping definition; I don't see how to do
> >this with the general E-entailment definition.
> 
> My preference as well.
> 
> I would remove the possibility of blank nodes (and general expressions) in
> the functions isIRI/isLiteral/isBlank, restricting them to named variables only,
> because these really work on the terms of the bindings, not the values.
> 
> I would like to see a proposal for (1) from one or more of the original
> contributors of the current text (Enrico, Bijan, Pat).
> 
> >
> >2. We prohibit blank node identifiers in FILTER clauses as
> >inherently meaningless or deceptive syntax. 
> 
> OK - but less of a preference.  For me, this is a fall-back from (1) that we
> can choose if we do not manage to get agreement around (1).
> 
> >3. We allow blank node identifiers in FILTER clauses, but they
> >always raise an error, so that such FILTERs always fail.
> >But in that case, why did we permit the syntax?
> >
> >4. We allow blank node identifiers in FILTER clauses, and
> >they reference distinct blank nodes, distinct from all blank
> >nodes in the dataset.  Thus _:a = _:b is false, and _:a != _:b
> >is true.
> 
> These two seem more confusing than (2).
> 
> >
> >Fred
> >
> 
> 	Andy
> 

-- 
-eric

home-office: +1.617.395.1213 (usually 900-2300 CET)
	    +33.1.45.35.62.14
cell:       +33.6.73.84.87.26

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.
Received on Thursday, 13 July 2006 16:22:44 UTC