W3C home > Mailing lists > Public > public-rdf-dawg-comments@w3.org > February 2005

Re: Questions about OPTIONAL ... and declarative syntax

From: Bob MacGregor <bmacgregor@siderean.com>
Date: Wed, 23 Feb 2005 09:11:09 -0800
Message-ID: <421CB92D.8000707@siderean.com>
To: andy.seaborne@hp.com
CC: Geoff Chappell <geoff@sover.net>, public-rdf-dawg-comments@w3.org
Andy,

I've been taking it for granted that SPARQL is intended to be
a declarative language.  So I am surprised when I see a phrase such
as

 >      
The current working draft is inadequate in the treatment of order of execution 

 >      implications - this is something that has to be done.

However, I can interpret the meaning of this in more than one way.   I'm 
hoping
it means the following:  "SPARQL is declarative, but the means for eliciting
a declarative semantics wrt OPTIONAL have not yet been fully worked out".

Among other things, being declarative implies that the conjunction operator
is syntactically commutative (unlike Prolog, where order of conjuncts 
matters).
Consider the following (pc-like) expression, where
AND means "logical and", not the SPARQL "AND":

     ?b  > ?c AND
     (?a  p1 ?b) AND
     (?a p2 ?c)
   
If I evaluate the first clause first, neither of the variables are 
bound, so the
whole query fails.  However, that doesn't mean that I the user should 
write it differently --
it means that the interpreter has to be smart about order of 
evaluation.  Basically,
any order that returns the maximum number of bindings is a correct order of
evaluation.  If there are multiple maximal orderings, but no maximum 
ordering,
then your language is not declarative.  So, for example, when Geoff asks 
below
which ordering is correct, the answer must be "the one that yields the 
maximal
number of bindings, and that includes solutions produced by all other 
orderings."

The SPARQL 'AND' operator appears to be a hack that forces computed
predicates to be evaluated later, so as to maximize the number of solutions.
However, placing this kind of ordering restriction on the syntax flies 
in the
face of orthogonality.  As soon as you add disjunction, you find you need
(if your language is worth anything at all) to allow triple patterns and
constraints to be interleaved, and then whole mess starts to unravel.

The SQL language has worked out all of the semantics for allowing
conjunction and disjunction and constraints (and negation-as-failure) to 
work
harmoniously together.  Instead of reinventing the wheel, I would 
strongly recommend that
the committee review the SQL semantics, and maybe bring in a database 
person.

The one thing (that I am aware of) for which SQL didn't achieve a 
declarative solultion
is the outer join.  Early outer joins (e.g., the one implemented by 
ORACLE) were
completely ambiguous, yielding different solutions depending on the 
ordering.
The SQL2 committee failed to find a fully declarative solution, and instead
produced a solution wherein the pattern of nested outer joins defines
the semantics.  Technically, the nested expressions ARE declarative,
but since they restrict the ordering of evaluation severely, from a 
practical
standpoint, the outer joins are procedural.  I think you can take it for
granted that if there WERE a truly declarative solution, the SQL2 committee
would have found it.

So, why should we care about outer joins?  Because OPTIONAL is
the SPARQL equivalent of an outer join.  Outer joins are useful in SQL,
but a large percentage of SQL queries use only inner joins, so can we
just ignore OPTIONAL for the time being?  Unfortunately no.  The reason
is that a column in a SQL table that admits null values is semantically
analoguous to an outer join, without actually being one, and null column
values are very common.  In SPARQL, OPTIONAL is needed to handle
the equivalent of a null column value.  In other words, the need for 
OPTIONAL
in SPARQL is much greater than the need for outer joins in SQL.

Right now, SPARQL appears to be resorting to nested expressions not only
to handle OPTIONAL, but also to interleave contraints and triples.  This
is just buying trouble.  The correct thing to do is to mimic SQL -- for the
non-OPTIONAL case, produce a syntax and semantics where conjunction and 
disjunction
are syntactically commutative, and there is only one kind of 
conjunction, not
two.  Then try to fit OPTIONAL in, using nesting (and scope rules for 
variables)
to achieve a unique semantic interpretation.  And then MAYBE the UNBOUND
operator can be safely introduced (or maybe not).

Cheers, Bob

Seaborne, Andy wrote:

>
> Geoff Chappell wrote:
>
>> A few questions about OPTIONAL...
>>
>> 1) If I had a query like this:
>>
>> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
>> SELECT ?x ?name ?y
>> WHERE  ( ?x foaf:name  ?name )
>>        OPTIONAL ( ?x foaf:mbox ?mbox )
>>        ( ?y foaf:mbox ?mbox )
>>
>> with data:
>>
>> @prefix foaf:       <http://xmlns.com/foaf/0.1/> .
>>
>> _:a  foaf:name       "Alice" .
>> _:b  foaf:name       "Bob" .
>> _:b  foaf:mbox       <mailto:bob@work.example> .
>> _:c  foaf:mbox       <mailto:noname@work.example.org> .
>>
>>
>> should I get:
>>
>> x   name    y
>> === ======= ===
>> _:b "Bob"   _:b
>>
>> or:
>>
>> x   name    y
>> === ======= ===
>> _:a "Alice" _:b   _:a "Alice" _:c   _:b "Bob"   _:b 
>
>
> Geoof - thank you very much for including a concrete example:
>
> The current working draft is inadequate in the treatment of order of 
> execution
> implications - this is something that has to be done.
>
>
> Executed purely in the order given, I tried your example and get:
>
> -------------------------
> | x    | name    | y    |
> =========================
> | _:b0 | "Alice" | _:b1 |
> | _:b0 | "Alice" | _:b2 |
> | _:b2 | "Bob"   | _:b2 |
> -------------------------
>
> but as:
>
> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
> # Add ?mbox to select for clarity
> SELECT ?x ?name ?y ?mbox
> WHERE  ( ?x foaf:name  ?name )
> # Reverse these next two lines.
>         ( ?y foaf:mbox ?mbox )
>         OPTIONAL ( ?x foaf:mbox ?mbox )
>
>
> I get:
>
> ------------------------------------------------------------
> | x    | name    | y    | mbox                             |
> ============================================================
> | _:b0 | "Bob"   | _:b0 | <mailto:bob@work.example>        |
> | _:b0 | "Bob"   | _:b1 | <mailto:noname@work.example.org> |
> | _:b2 | "Alice" | _:b0 | <mailto:bob@work.example>        |
> | _:b2 | "Alice" | _:b1 | <mailto:noname@work.example.org> |
> ------------------------------------------------------------
>
> because the first two triple patterns are unconnected so the number of 
> results
> is 2 * 2 (each matches twice), and the optional adds nothing because 
> ?x and
> ?mbox were already defined by earlier patterns.
>
>
>>
>> In other words, is a variable bound to a value such as NULL by an 
>> optional
>> when there it is not otherwise bound by the block's conditions or 
>> does it
>> truly escape unbound? If it's bound to a NULL, does NULL==NULL?
>>
>> 2) In section 5.3 you say: "If a new variable is mentioned in an 
>> optional
>> block (as mbox and hpage are mentioned in the previous example), that
>> variable can be mentioned in that block and can not be mentioned in a
>> subsequent block."
>>     a. Does a subsequent _block_ refer to a subsequent _OPTIONAL block_
>> or does block refer to any logical factor (e.g. the triple after the
>> optional in my previous example)? If the latter, I think you need to 
>> make
>> that clearer (i.e. define block somewhere).
>>
>>     b. Does this mean subsequent relative to the order the query was
>> written or relative to a possible re-ordering by a query processor? 
>> Similar
>> question for section 5.6 - is the shape of the tree determined by the 
>> order
>> as written or the processing order?
>>
>> 3) I'll note that in my opinion this would all be clearer if you 
>> spelled out
>> that A and OPTIONAL B is just shorthand for A and (B or not B) -- or 
>> for A
>> and (B or (not B and v1..n=NULL)). That would lay a firmer foundation 
>> for
>> considering questions such as: "Is OPTIONAL B and A the same as A and
>> OPTIONAL B?" and generally present the concept in a language people 
>> are more
>> familiar with.
>
>
> Having OPTIONAL always pass the "no match" case even when there are other
> matched results in a stable execution order but it results in extra 
> solutions
> that add nothing except complications for applications.
>
> For example:
>
> @prefix foaf:       <http://xmlns.com/foaf/0.1/> .
>
> _:a  foaf:name       "Alice" .
> _:a  foaf:mbox       <mailto:alice@work.example> .
>
> Query:
>
> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
> SELECT ?name ?mbox
> WHERE
>      ( ?x foaf:mbox ?mbox )
>      OPTIONAL ( ?x foaf:name  ?name )
>
>
> gives:
>
> -----------------------------------------
> | name    | mbox                        |
> =========================================
> | "Alice" | <mailto:alice@work.example> |
> -----------------------------------------
>
>
> or:
>
> -----------------------------------------
> | name    | mbox                        |
> =========================================
> |         | <mailto:alice@work.example> |
> | "Alice" | <mailto:alice@work.example> |
> -----------------------------------------
>
> Given the idea behind optional of "add extra information to a solution if
> available, but don;t fail if not there"
>
> In simple cases, it may be possible to filter the output to remove this
> redundancy but in more complicated queries (for example, ?name is used 
> elsewhere
> as well, multiple optionals, sharing variables), I didn't manage to 
> find a way
> that kept the streaming requirement for results.
>
> If the application is displaying information for people, then getting two
> answers back for what is one person is a less useful paradigm.  It is 
> a tradeoff
> of convenience.
>
> I did find that (same results as the last example):
>
> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
> SELECT ?name ?mbox
> WHERE
>      ( ?x foaf:mbox ?mbox )
>      { {} UNION ( ?x foaf:name  ?name ) }
>
> is illegal in the current grammar (a form of "A and (B or not B)" for
> optionals). Maybe it shouldn't be.  Thoughts?
>
>>
>> -Geoff
>
>
>     Thanks for the example and commentary,
>     Andy
>
>

-- 

Bob MacGregor
Chief Scientist

	
	Siderean Software Inc
390 North Sepulveda Blvd., Suite 2070
<http://maps.yahoo.com/py/maps.py?Pyt=Tmap&addr=5155+Rosecrans+Ave&csz=Hawthorne%2C+Ca+90250&country=us> 
El Segundo, CA 90245
bmacgregor@siderean.com <mailto:bmacgregor@siderean.com> 	
tel: 	+1-310 647-4266
fax: 	+1-310-647-3470

 

 

 

 
Received on Wednesday, 23 February 2005 17:12:10 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 8 January 2008 14:14:47 GMT