Re: allow implicitly unbound variables in SPARQL results?

Dan Connolly wrote:
> Reviewing last call comment status, this one is (a) not connected 
> to an open issue, (b) not just editorial, and (c) hasn't gotten 
> much airtime.
> 
> This request seems pretty reasonable:
> 
> [[ There are at least two ways to trim the results back down with 
> just syntax changes.  The least intrusive change would be to just 
> drop the <unbound> tag, and have it be implicit with <binding 
> name=".."/>.  More drastic is to just drop the entire <binding> tag
>  when the variable is unbound, since the information can be 
> retrieved from the head. ]] -- SPARQL Results Format and Unbound 
> Variables http://www.w3.org/mid/42F4CEEB.5090306@umd.edu aka 
> http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Aug/0043
> 
> So how about we make <binding> elements for unbound variables 
> optional?

Like Steve and Andy said, making <binding> optional (in the sense
that it may or may not be there) sounds like a bad idea, but
specifying that there is never a binding element when the variable is
unbound is possible, I guess.

There are some pretty big drawbacks to consider though, in terms of
cost of processing.

In the current spec, a result processor can simply assume that every
row has the number of bindings specified in the header, in the order
specified in the header. Arguably the link between bindings and
variables (that is, the name attribute) is even redundant in the
current spec. XML results can be processed very fast because no
explicit matching between column names and binding elements needs to
be done at all.
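
To make the positional approach concrete, here is a minimal sketch in
Python. It assumes a simplified, namespace-free document that uses the
element names discussed in this thread; the function name
read_rows_positional and the sample data are hypothetical, and this is
not how any particular parser is actually written.

```python
import xml.etree.ElementTree as ET

# Simplified sample: every row carries one <binding> per header variable,
# in header order, with <unbound/> marking a missing value.
DOC_FULL = """
<sparql>
  <head>
    <variable name="x"/>
    <variable name="y"/>
  </head>
  <results>
    <result>
      <binding name="x"><uri>http://example.org/a</uri></binding>
      <binding name="y"><unbound/></binding>
    </result>
    <result>
      <binding name="x"><uri>http://example.org/b</uri></binding>
      <binding name="y"><literal>hello</literal></binding>
    </result>
  </results>
</sparql>
"""

def read_rows_positional(xml_text):
    """Read rows by position only; the name attribute is never inspected."""
    root = ET.fromstring(xml_text)
    columns = [v.get("name") for v in root.find("head")]
    rows = []
    for result in root.find("results"):
        row = []
        for binding in result:
            value = binding[0]  # the uri, literal, bnode or unbound child
            row.append(None if value.tag == "unbound" else value.text)
        rows.append(row)
    return columns, rows

columns, rows = read_rows_positional(DOC_FULL)
```

The i-th value of each row belongs to the i-th header variable, so no
lookup by name is needed.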

Allowing 'skipping' of binding elements for unbound variables changes
this: suddenly a processor needs to do a string compare between the
column header and the name specified in the binding. And it needs to
do this for _every_ binding.
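
As a rough illustration of the extra work, the sketch below reads a
document in which unbound variables simply have no <binding> element.
It is Python, it assumes a simplified, namespace-free document, and the
name read_rows_by_name and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified sample: bindings for unbound variables are left out entirely.
DOC_SPARSE = """
<sparql>
  <head><variable name="x"/><variable name="y"/></head>
  <results>
    <result>
      <binding name="x"><uri>http://example.org/a</uri></binding>
    </result>
    <result>
      <binding name="y"><literal>hello</literal></binding>
    </result>
  </results>
</sparql>
"""

def read_rows_by_name(xml_text):
    """Place each binding in its column by comparing names."""
    root = ET.fromstring(xml_text)
    columns = [v.get("name") for v in root.find("head")]
    rows = []
    for result in root.find("results"):
        row = [None] * len(columns)  # start with every variable unbound
        for binding in result:
            name = binding.get("name")
            # Explicit string comparison against the header, per binding.
            for i, column in enumerate(columns):
                if column == name:
                    row[i] = binding[0].text
                    break
        rows.append(row)
    return columns, rows

sparse_columns, sparse_rows = read_rows_by_name(DOC_SPARSE)
```

A hash map from name to column index would replace the inner loop, but
the lookup would still have to happen once per binding.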

Also, of course, as Steve already mentioned, it makes writing XSLT
forms for query results quite a bit harder.

The major argument in favor of the change is the size of the
serialized result set in cases like queries with UNION, or with lots
of optionals. However, IMHO minimizing the size of the serialization
has never been a major design goal of the XML result format, nor
should it be. To be blunt: if you want to minimize the number of bytes
on the line, use compression, or better yet, dump XML and use a binary
format.
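
For what it is worth, compression can be applied without touching the
format at all. A small Python sketch using the standard gzip module
follows; the sample rows are made up and identical to each other, so
the ratio it yields is far better than a real result set would give.

```python
import gzip

# One made-up row with two unbound variables, repeated to mimic a
# result set that is verbose mainly because of <unbound/> bindings.
ROW = (
    "<result>"
    '<binding name="x"><uri>http://example.org/a</uri></binding>'
    '<binding name="y"><unbound/></binding>'
    '<binding name="z"><unbound/></binding>'
    "</result>\n"
)
raw = ("<results>\n" + ROW * 1000 + "</results>\n").encode("utf-8")

compressed = gzip.compress(raw)
ratio = len(compressed) / len(raw)  # fraction of the original size
```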

Of course that does not mean that we should never care about the
verbosity of the XML result format, but I think that in this case
there are significant disadvantages to allowing this, weighed against
an advantage that I suspect can be achieved in other, better ways.

In the request, another option was mentioned: not dropping the
<binding> element, but dropping <unbound> (and hence having an empty
<binding> element). Although slightly more regular, this is still more
expensive to process than the current LC format. As an example of
this: the current Sesame SPARQL XML result parser completely skips
binding elements and just jumps directly to the uri, literal, bnode or
unbound element. In the proposed format, this will no longer be
possible and instead it will have to do a check for each binding
element to see if it contains a subelement.
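
The per-binding check might look roughly like the Python sketch below.
It is not the Sesame parser (which is not shown in this message); the
function name read_row_empty_binding and the sample row are
hypothetical, and the XML is simplified and namespace-free.

```python
import xml.etree.ElementTree as ET

# Sample row in the proposed variant: an empty <binding> means unbound.
ROW_EMPTY = (
    "<result>"
    '<binding name="x"><uri>http://example.org/a</uri></binding>'
    '<binding name="y"/>'
    "</result>"
)

def read_row_empty_binding(row_xml):
    """Read one row, treating a childless <binding> as unbound."""
    result = ET.fromstring(row_xml)
    row = []
    for binding in result:
        if len(binding) == 0:
            # No uri/literal/bnode child: the variable is unbound.
            row.append(None)
        else:
            row.append(binding[0].text)
    return row

row_values = read_row_empty_binding(ROW_EMPTY)
```

The reader can no longer jump straight to the value element; it has to
look at each binding first to see whether there is one.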

Not saying that that is fiendishly difficult to do of course, but it
does make processing, or writing XSLT, more complex.

Long story short: I have a preference for keeping the spec the way it
is now.

By the way, if the size of the serialization does become a major
design goal, there are other, more obvious changes to make to the
format: the binding element could be dropped altogether, for example.
I'm not advocating this; I think regularity and ease of processing are
more important features than the number of bytes.

Jeen
-- 
Jeen Broekstra          Aduna BV
Knowledge Engineer      Julianaplein 14b, 3817 CS Amersfoort
http://aduna.biz        The Netherlands
tel. +31 33 46599877

Received on Wednesday, 26 October 2005 13:06:08 UTC