Re: RDF Data Access Working Group : first working draft of SPARQL

Andy, various responses here.

I've cc'ed www-archive so you can reference this message publicly.

At 20:19 17/10/04 +0100, Seaborne, Andy wrote:
>Graham,
>
>Regarding:
>
>>7. Section 4
>>I note you've chosen to allow optional elements of graph patterns, but 
>>not alternatives.  In one of my implementations I provided alternative 
>>blocks, where the last alternative could be empty, hence also providing 
>>optional patterns.  Alternatives are permitted to bind the same variable, 
>>thus providing ways to match different (graph-syntactical) expressions of 
>>the same information.  I have sometimes found this to be useful, but it 
>>does somewhat mess up the clean semantics of the approach you have adopted.
>>Despite the semantic messiness, I do feel that having some capability to 
>>select one possible match over another, when dealing with possibly messy 
>>real-world data, could be useful enough to justify the consequent 
>>complication of query optimization when such a feature is used.
>
>Could you provide some examples of your approach?  I think I understand 
>what you are suggesting but some concrete examples would confirm, or 
>modify, my understanding.  It's "ordered disjunction" isn't it, like || ? 
>Execute blocks until one matches.

Yes, it is.

Here's an example from something I did recently, in my own private query 
syntax:

[[
@query hrep:HdrPersonPattern
     ( ?person
       ( foaf:name ?aname
       & ( foaf:mbox     ?ambox
         | foaf:homepage ?ahomepage
         )
       & [ foaf:organization ?orgname ]
       & [ @hrep:HdrPostalPattern ]
       & [ foaf:workplaceTel ?wtel ]
       & [ foaf:workplaceFax ?wfax ]
       & [ foaf:workplaceHomepage ?wurl ]
       )
     )
]]

I hope the intent here is reasonably clear.  (I express queries as a tree, 
which is subsequently processed into triples -- the '&'s here indicate a 
conjunction of patterns that are arcs originating from the ?person 
node.  The '|' indicates a disjunction.  [foo] is treated as (foo|), which, 
BTW, in my "ordered disjunction" scheme means that an available match is 
always favoured over an omission.)

The sub-pattern:
[[
       & ( foaf:mbox     ?ambox
         | foaf:homepage ?ahomepage
         )
]]

Preferentially matches a foaf:mbox, and failing that matches a 
foaf:homepage property.   I *could* choose here to use the same variable 
(e.g. '?networkident') which would be bound preferentially to the value of 
one of the properties.
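
To make the "ordered disjunction" idea concrete, here is a minimal sketch in Python.  It is not my implementation: the function names, the triple-set representation of a graph, and the sample data are all invented for illustration.  It shows alternatives tried in order, both binding the same variable, and [foo] handled as (foo|).

```python
# Hypothetical sketch: a graph is a set of (subject, predicate, object) triples.

def match_arc(graph, subject, predicate):
    """Return the objects of all triples with the given subject and predicate."""
    return sorted(o for (s, p, o) in graph if s == subject and p == predicate)

def ordered_alternatives(graph, subject, alternatives):
    """Try each (predicate, variable) alternative in order.

    Returns the bindings from the first alternative that matches,
    or an empty list if no alternative matches.
    """
    for predicate, variable in alternatives:
        values = match_arc(graph, subject, predicate)
        if values:
            return [{variable: value} for value in values]
    return []

def optional(graph, subject, predicate, variable):
    """[foo] treated as (foo|): prefer a match, else one empty binding."""
    return ordered_alternatives(graph, subject, [(predicate, variable)]) or [{}]

# Invented sample data.
graph = {
    ("_:a", "foaf:name", "Alice"),
    ("_:a", "foaf:mbox", "mailto:alice@example.org"),
    ("_:a", "foaf:homepage", "http://example.org/alice"),
    ("_:b", "foaf:name", "Bob"),
    ("_:b", "foaf:homepage", "http://example.org/bob"),
}

# Both alternatives bind the same variable, as with '?networkident' above.
alts = [("foaf:mbox", "networkident"), ("foaf:homepage", "networkident")]
alice = ordered_alternatives(graph, "_:a", alts)  # mbox is preferred
bob = ordered_alternatives(graph, "_:b", alts)    # falls back to homepage
```

In this sketch, the later alternative is never consulted once an earlier one has matched, which is exactly the property that complicates the otherwise clean semantics.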

Another example, which comes about by virtue of a design transition in my 
data, is:
[[
@query hrep:HdrControllerPattern
     ( ?header
       ( hdr:controller ?controller
       | hdr:author     ?controller
       )
     )
]]
(I actually do this by a slightly different mechanism, for historical 
reasons, but I hope the intent here is clear.)

I made the review comment as an observation rather than an exhortation:  if 
you feel that the cleanliness of the present approach is best, I'm not 
planning to argue.  I think that my suggestion can always be achieved by 
making multiple queries, but at the expense of breaking up a logically 
single query.

...

I just realized, responding to the above,  that something my query engine 
allows is predefined variable bindings.  I can include variable bindings 
from a previous query when making a new query:  in this way, the new query 
becomes an extension, or refinement, of the previous.  This process can be 
cascaded through arbitrarily many queries.  This is a feature I do use very 
extensively in my (few) applications that use graph query.

Using this facility, I can easily achieve the effects described above using 
multiple queries.
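
As a rough illustration of what I mean, here is a sketch in Python of a query function that accepts bindings from a previous query, used to get the controller/author preference with two cascaded queries.  Everything here is an assumption made for the example: the function names, the list-of-triples graph, the sample data, and the use of an rdf:type arc to find the headers in the first place.

```python
# Hypothetical sketch: terms starting with '?' are variables.

def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def unify(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if is_var(term):
            # Bind the variable, or check it against an existing binding.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new

def query(graph, patterns, initial=None):
    """Match a conjunction of triple patterns, starting from prior bindings."""
    solutions = [dict(b) for b in (initial if initial is not None else [{}])]
    for pattern in patterns:
        extended = []
        for binding in solutions:
            for triple in graph:
                new = unify(pattern, triple, binding)
                if new is not None:
                    extended.append(new)
        solutions = extended
    return solutions

# Invented sample data: hdr1 has both properties, hdr2 only the older one.
graph = [
    ("hdr1", "rdf:type", "hdr:Header"),
    ("hdr1", "hdr:controller", "c1"),
    ("hdr1", "hdr:author", "a1"),
    ("hdr2", "rdf:type", "hdr:Header"),
    ("hdr2", "hdr:author", "a2"),
]

headers = query(graph, [("?header", "rdf:type", "hdr:Header")])
results = []
for b in headers:
    # Each follow-up query refines the earlier binding of ?header.
    found = (query(graph, [("?header", "hdr:controller", "?controller")], [b])
             or query(graph, [("?header", "hdr:author", "?controller")], [b]))
    results.extend(found)
```

The preference order lives in the calling code rather than in the query itself, which is the cost I mentioned of breaking up a logically single query.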

I note that the same effect could be achieved by having a local 
query-rewriting engine, so maybe it really doesn't belong in an on-the-wire 
query language.

> > but it does
> > somewhat mess up the clean semantics of the approach you have adopted.
>
>There are some more theoretically minded members of the semantic web 
>community than me :-)

I fully support aiming for a theoretically clean design.  I think that 
yields long-term benefits in terms of predictability and reliability, 
provided that it's sufficient to do the job that needs doing.

...

>- - - - - -
>I did a message on the WG list about your message to start the ball rolling:
>http://lists.w3.org/Archives/Public/public-rdf-dawg/2004OctDec/0129.html

Some comments on that.

>It should be:
>a) binding (a single name/value pair)
>b) set of bindings - pattern solution when the set of bindings gives the
>way a pattern matches
>c) query results, where the bindings are saying how a pattern was matched

I think the phrase "set of bindings" for (b), while technically correct, is 
open to casual misinterpretation.  How about a "binding-tuple" or 
"binding-row" or some suchlike that clearly does not indicate case (c)?

Or maybe:
(b) query result
(c) query result set
... but maybe they're too similar?

...

> > 5. I note that variables are allowed in predicate position.  If this
> > doesn't present any problems, I'm all in favour of this, but I think the
> > design decision could be highlighted more clearly.
>
>It hadn't occurred to me that it might not be possible.  I'm not aware of
>any issues arising.  That feature is available in several existing query
>languages.

Then I'm glad I mentioned it ...  I just think the decision should be made 
explicitly, not accidentally :-)

...

> > Section 12.
> >
> > Testing values.  Is there a way to combine tests with non-struct
> > evaluation, so that something like:
> >
> >     AND isBound ?x AND ?x < 20
> >
> > can be reliably processed?
>
>This is covered in newer drafts in sec 12.  If ?x is unbound, isBound is
>false so the result is false (as in evaluation of ?x < 20).  Evaluation
>involving unbound variables is false unless otherwise noted (e.g. unbound())
>
>That's a point about whether AND and && are *exactly* the same.

I meant to say "non-strict", and I wasn't really concerned with syntactic 
details:  I don't know what you mean by "AND" vs "&&" (though I can guess).

What I meant to suggest was that within a constraint, there should be a 
well-defined evaluation sequence that (a) avoids evaluation of un-needed 
terms, and (b) makes it entirely clear what tests will be performed under 
what circumstances.  Like short-circuit evaluation of Java conditional 
operators.
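
A small Python sketch of the kind of non-strict evaluation I have in mind follows.  It is only an illustration: the helper names are invented, and raising an error on an unbound variable is my assumption for the example, not what the draft specifies (the draft, as you say, makes such an evaluation false).

```python
# Hypothetical sketch: a constraint is a function from a binding (dict) to bool.

class Unbound(Exception):
    """Raised when a constraint reads a variable that has no binding."""

def var(name):
    def evaluate(binding):
        if name not in binding:
            raise Unbound(name)
        return binding[name]
    return evaluate

def is_bound(name):
    return lambda binding: name in binding

def less_than(operand, limit):
    return lambda binding: operand(binding) < limit

def and_also(left, right):
    """Non-strict conjunction: right is evaluated only if left is true."""
    return lambda binding: left(binding) and right(binding)

# AND isBound ?x AND ?x < 20
constraint = and_also(is_bound("x"), less_than(var("x"), 20))

in_range = constraint({"x": 5})       # both tests are performed
out_of_range = constraint({"x": 25})  # both tests are performed
unbound = constraint({})              # ?x < 20 is never evaluated
```

With a fixed left-to-right sequence like this, the guard is known to run first, so the comparison is only ever attempted when ?x has a value.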

#g


------------
Graham Klyne
For email:
http://www.ninebynine.org/#Contact

Received on Monday, 18 October 2004 09:13:11 UTC