Re: Unbound vars as blank nodes from Seaborne, Andy on 2005-03-23 (public-rdf-dawg-comments@w3.org from March 2005)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Wed, 23 Mar 2005 14:29:54 +0000
To: Geoff Chappell <geoff@sover.net>
CC: public-rdf-dawg-comments@w3.org
Message-ID: <42417D62.1090202@hp.com>
Geoff Chappell wrote:
> 
> 
>>-----Original Message-----
>>From: Arjohn Kampman [mailto:arjohn.kampman@aduna.biz]
>>Sent: Wednesday, March 23, 2005 5:52 AM
>>To: Geoff Chappell
>>Cc: public-rdf-dawg-comments@w3.org
>>Subject: Re: Unbound vars as blank nodes
>>
> 
> [...]
> 
>>>SELECT ?x ?y
>>>   WHERE  { ?book dc10:title ?x }
>>>
>>>Logically speaking projection vars are existentially quantified, right? And
>>>that's what a blank node is, so it seems logically correct to return:
>>>
>>>	?x="Moby Dick", ?y=_:b1.
>>>
>>>I.e. the sentence:
>>>	There exists ?x,?y such that ?x is the title of something
>>>essentially becomes:
>>>	There exists ?y such that "Moby Dick" is the title of BK1
>>
>>Yikes! Apart from the fact that the above query should be flagged as
>>illegal (see my previous posting to this list), generating new bnodes
>>for unbound variables will make the QL even more complex than it already
>>is. Developers have learned to live with NULL values in the context of
>>SQL, so why would this be problematic for SPARQL?
> 
> 
> I'm not sure I buy the complexity argument... do you mean complex for the
> implementor or complex for the user? Either way, it doesn't strike me as too
> much of a burden. And I think the SQL/SPARQL analogies only get you so far.
> E.g. wrt to this issue, RDF has a built-in way to represent variables in
> results, SQL doesn't. Plus, NULLs carry their own load of controversy and
> confusion in the SQL world.
> 
> That's not to say NULLs won't work. I think a perfectly workable solution is
> to require that all vars mentioned in a pattern element are bound to
> something by that pattern element -- if not to an actual value, then to NULL
> -- and that NULL != NULL. IMHO that would resolve the current execution
> ordering mess (I've heard statements to the contrary but I've never seen a
> counter example). 

Geoff,

I extracted this from our previous discussion: could you check I've got the 
example right?

Data::
@prefix foaf:       <http://xmlns.com/foaf/0.1/> .

_:a  foaf:name       "Alice" .
_:b  foaf:name       "Bob" .
_:b  foaf:mbox       <mailto:bob@work.example> .
_:c  foaf:mbox       <mailto:noname@work.example.org> .

Query::
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
WHERE
   { ?x foaf:name  ?name .
     OPTIONAL { ?x foaf:mbox ?mbox } .
     ?y foaf:mbox ?mbox .
   }

The first two lines of the pattern on their own give:

---------------------------------------
| name    | mbox                      |
=======================================
| "Alice" |                           | <<---- ????
| "Bob"   | <mailto:bob@work.example> |
---------------------------------------

If ?mbox in the first solution is NULL (whether NULL = NULL or NULL != NULL) 
then the third pattern

     { ?y foaf:mbox ?mbox }

does not match either of the triples

   _:c  foaf:mbox   <mailto:noname@work.example.org>
   _:b  foaf:mbox   <mailto:bob@work.example>

because ?mbox is bound to NULL (unless matching for NULL is special - I'm 
treating it as just an assignable value, which might be my mistake). I find this 
strange because the first and third triple patterns are independent.

The other partial solution matches
   _:b  foaf:mbox   <mailto:bob@work.example>

so there is one solution

---------------------------------------
| name    | mbox                      |
=======================================
| "Bob"   | <mailto:bob@work.example> |
---------------------------------------

If ?mbox is unbound then
   ?name = "Alice"  ?mbox = <mailto:noname@work.example.org>
and
   ?name = "Alice"  ?mbox = <mailto:bob@work.example>

are solutions.

----------------------------------------------
| name    | mbox                             |
==============================================
| "Alice" | <mailto:bob@work.example>        |
| "Alice" | <mailto:noname@work.example.org> |
| "Bob"   | <mailto:bob@work.example>        |
----------------------------------------------
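
A minimal Python sketch of the evaluation above, in the written order, may 
help when checking the example. It is not any implementation's algorithm: the 
helper names (match, basic, optional, run) and the NULL marker are made up for 
illustration, and NULL is treated as an ordinary value that never equals a 
term in the data.

```python
# Data from the message, as (subject, predicate, object) triples.
DATA = [
    ("_:a", "foaf:name", '"Alice"'),
    ("_:b", "foaf:name", '"Bob"'),
    ("_:b", "foaf:mbox", "<mailto:bob@work.example>"),
    ("_:c", "foaf:mbox", "<mailto:noname@work.example.org>"),
]

NULL = object()  # hypothetical marker for the "bind to NULL" model


def match(pattern, solution):
    """Extend one partial solution with every triple that matches pattern."""
    out = []
    for triple in DATA:
        new = dict(solution)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A bound variable must agree with the data; a variable bound
                # to NULL is just a value, so it never agrees with a triple.
                if term in new and new[term] != value:
                    break
                new[term] = value
            elif term != value:
                break
        else:
            out.append(new)
    return out


def basic(pattern, solutions):
    """Fixed (non-optional) triple pattern."""
    return [s2 for s in solutions for s2 in match(pattern, s)]


def optional(pattern, solutions, use_null):
    """OPTIONAL { pattern }: keep a solution even if the pattern fails."""
    out = []
    for s in solutions:
        extended = match(pattern, s)
        if extended:
            out.extend(extended)
        elif use_null:
            # NULL model: every variable in the pattern ends up bound.
            filled = dict(s)
            for term in pattern:
                if term.startswith("?"):
                    filled.setdefault(term, NULL)
            out.append(filled)
        else:
            out.append(s)  # unbound model: the variable stays absent
    return out


def run(use_null):
    """Evaluate the three pattern elements in the order written in the query."""
    sols = basic(("?x", "foaf:name", "?name"), [{}])
    sols = optional(("?x", "foaf:mbox", "?mbox"), sols, use_null)
    sols = basic(("?y", "foaf:mbox", "?mbox"), sols)
    return sorted((s["?name"], s["?mbox"]) for s in sols)
```

Under these assumptions run(True) gives only the "Bob" row and run(False) 
gives the three rows in the table above.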


Reversing the lines:

   ?x foaf:name  ?name .
   ?y foaf:mbox ?mbox .
   OPTIONAL { ?x foaf:mbox ?mbox } .

After the first two (independent, so a cross product):

----------------------------------------------
| name    | mbox                             |
==============================================
| "Bob"   | <mailto:noname@work.example.org> |
| "Bob"   | <mailto:bob@work.example>        |
| "Alice" | <mailto:noname@work.example.org> |
| "Alice" | <mailto:bob@work.example>        |
----------------------------------------------

and the OPTIONAL does nothing (for either NULL or unbound models).
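
The reversed order can be checked the same way. This is again only a sketch 
under assumptions: the helper names and the NULL marker are hypothetical, and 
the evaluator simply processes the pattern elements left to right.

```python
# Data from the message, as (subject, predicate, object) triples.
DATA = [
    ("_:a", "foaf:name", '"Alice"'),
    ("_:b", "foaf:name", '"Bob"'),
    ("_:b", "foaf:mbox", "<mailto:bob@work.example>"),
    ("_:c", "foaf:mbox", "<mailto:noname@work.example.org>"),
]

NULL = object()  # hypothetical marker for the "bind to NULL" model


def match(pattern, solution):
    """Extend one partial solution with every triple that matches pattern."""
    out = []
    for triple in DATA:
        new = dict(solution)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if term in new and new[term] != value:
                    break  # bound variable disagrees with this triple
                new[term] = value
            elif term != value:
                break  # constant term disagrees with this triple
        else:
            out.append(new)
    return out


def basic(pattern, solutions):
    """Fixed (non-optional) triple pattern."""
    return [s2 for s in solutions for s2 in match(pattern, s)]


def optional(pattern, solutions, use_null):
    """OPTIONAL { pattern }: keep a solution even if the pattern fails."""
    out = []
    for s in solutions:
        extended = match(pattern, s)
        if extended:
            out.extend(extended)
        else:
            kept = dict(s)
            if use_null:
                # Only variables that are still absent get NULL.
                for term in pattern:
                    if term.startswith("?"):
                        kept.setdefault(term, NULL)
            out.append(kept)
    return out


def rows(solutions):
    return sorted((s["?name"], s["?mbox"]) for s in solutions)


def run_reversed(use_null):
    """Both fixed patterns first, then the OPTIONAL."""
    sols = basic(("?x", "foaf:name", "?name"), [{}])
    sols = basic(("?y", "foaf:mbox", "?mbox"), sols)
    before = rows(sols)
    after = rows(optional(("?x", "foaf:mbox", "?mbox"), sols, use_null))
    return before, after
```

For both use_null=True and use_null=False the rows before and after the 
OPTIONAL are the same four, because ?x and ?mbox are already bound when the 
OPTIONAL is reached.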

What we have to do is find a way of saying "do inner joins first - don't have 
two variables in optionals without them being in a fixed pattern".  This can be via 
a syntactic restriction (which may remove some OK queries as well, and which needs a 
non-syntactic rule about variable usage across optionals, c.f. outer joins) or a 
general restriction on the query structure.


A nearby issue arises for:

     ?v < 3 .
     ?x :p ?v .

SQL has a clear syntactic distinction but it forces more separation than 
necessary and does not address:

     ?v math:lessThan 3 .
     ?x :p ?v .

(I'm ignoring the subjects-as-literals issue).

It is obvious what the application intended but a system can't naively ignore order.
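
To make the ordering point concrete, here is a small Python sketch. 
Everything in it is an assumption for illustration: the two ":p" triples are 
invented, and less_than_3, triple_p, evaluate and reorder are hypothetical 
helpers, not part of any SPARQL engine. It contrasts evaluating the 
constraint where it is written with deferring it until ?v is bound.

```python
# Hypothetical data for ":p"; not taken from the original message.
DATA = [(":s1", ":p", 1), (":s2", ":p", 5)]


def triple_p(solutions):
    """The triple pattern  ?x :p ?v ."""
    out = []
    for s in solutions:
        for subj, _pred, obj in DATA:
            # If ?v is already bound it must agree with the object.
            if s.get("?v", obj) == obj:
                out.append({**s, "?x": subj, "?v": obj})
    return out


def less_than_3(solutions):
    """The constraint  ?v < 3 .  An unbound ?v cannot pass the test."""
    return [s for s in solutions if "?v" in s and s["?v"] < 3]


def evaluate(elements):
    """Apply the elements strictly left to right, as written."""
    sols = [{}]
    for element in elements:
        sols = element(sols)
    return sols


def reorder(elements):
    """Move the constraint after the patterns (stable sort keeps the rest)."""
    return sorted(elements, key=lambda element: element is less_than_3)


as_written = evaluate([less_than_3, triple_p])
deferred = evaluate(reorder([less_than_3, triple_p]))
```

With this data as_written is empty, while deferred holds the single solution 
with ?x = :s1 and ?v = 1, which is presumably what the application meant.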


My other worry about a syntactic restriction is more about large queries. 
Forcing a gap means that the application writer has to associate one part of the 
query with another - like not putting the condition on a variable next to the 
variable.

	Andy

ref:
http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Feb/0012.html

> The current approach to specifying preferred execution
> orders is just too fragile. It will be a major obstruction to future
> versions of the language - e.g. good luck using sparql (squint and construct
> looks like a rule construction) as any sort of a rule language with all of
> these ordering dependencies built-in.
> 
> 
>>[...]
>>
>>>Now for optionals....
>>>
>>>SELECT ?x ?y
>>>   WHERE  { ?book dc10:title ?x. OPTIONAL  ?book ex:author ?y.}
>>>
>>>The result:
>>>
>>>	?x="Moby Dick"  	?y=_:b1
>>>
>>>seems reasonable - i.e. we know the book has an author (that's what we've
>>>implied by using optional) we just don't know what it is.
>>
>>This is not true: the optional implies that the book can have an author,
>>not that it actually has one. From a developer POV, it's important to
>>make this distinction. Returning bnodes for unbound variables suggests
>>that it actually was bound.
> 
> 
> Well, I guess I'd say that optional implies whatever optional is specified
> to imply. But I'll agree it's a weakness of this approach that a user
> couldn't necessarily distinguish between a "real" and an "artificial" value.
> 
> 
> - Geoff
> 
> 
> 
> 
>
Received on Wednesday, 23 March 2005 14:57:24 UTC