- From: Jeen Broekstra <jeen@aduna.biz>
- Date: Wed, 26 Oct 2005 15:04:30 +0200
- To: Dan Connolly <connolly@w3.org>
- CC: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Dan Connolly wrote:
> Reviewing last call comment status, this one is (a) not connected
> to an open issue, (b) not just editorial, and (c) hasn't gotten
> much airtime.
>
> This request seems pretty reasonable:
>
> [[ There are at least two ways to trim the results back down with
> just syntax changes. The least intrusive change would be to just
> drop the <unbound> tag, and have it be implicit with <binding
> name=".."/>. More drastic is to just drop the entire <binding> tag
> when the variable is unbound, since the information can be
> retrieved from the head. ]] -- SPARQL Results Format and Unbound
> Variables http://www.w3.org/mid/42F4CEEB.5090306@umd.edu aka
> http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Aug/0043
>
> So how about we make <binding> elements for unbound variables
> optional?

Like Steve and Andy said, having <binding> optional (in the sense of it may or may not be there) sounds like a bad idea, but specifying that there is never a binding element if the variable is unbound is possible I guess.

There are some pretty big drawbacks to consider though, in terms of cost of processing. In the current spec, a result processor can simply assume that every row has the number of bindings specified in the header, in the order specified in the header. Arguably the link between bindings and variables (that is, the name attribute) is even redundant in the current spec. XML result can be processed very fast because no explicit matching between column names and binding elements needs to be done at all.

Allowing 'skipping' of binding elements for unbound variables changes this: suddenly a processor needs to do a string compare between the column header and the name specified in the binding. And it needs to do this for _every_ binding.

Also, of course, as Steve already mentioned, it makes writing XSLT forms for query results quite a bit harder.
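To make the processing difference concrete, here is a minimal Python sketch. It is an illustration under assumptions, not Sesame's parser: the two sample documents are hypothetical, the XML namespace is left out for brevity, and the function names (`value_of`, `rows_positional`, `rows_by_name`) are invented for this example. `FULL` follows the current format, where every row carries one binding per header variable; `SPARSE` follows the proposal, where the binding for an unbound variable is dropped.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the style of the current format (namespace omitted).
FULL = """<sparql>
  <head><variable name="x"/><variable name="y"/></head>
  <results>
    <result>
      <binding name="x"><uri>http://example.org/a</uri></binding>
      <binding name="y"><unbound/></binding>
    </result>
  </results>
</sparql>"""

# Same result under the proposal: the binding for the unbound variable is dropped.
SPARSE = """<sparql>
  <head><variable name="x"/><variable name="y"/></head>
  <results>
    <result>
      <binding name="x"><uri>http://example.org/a</uri></binding>
    </result>
  </results>
</sparql>"""


def value_of(binding):
    """Return the text of the binding's single child, or None for <unbound/>."""
    child = binding[0]
    return None if child.tag == "unbound" else child.text


def rows_positional(doc):
    """Current format: trust header order, never look at the name attribute."""
    root = ET.fromstring(doc)
    for result in root.iter("result"):
        yield [value_of(b) for b in result]


def rows_by_name(doc):
    """Proposed format: every binding must be matched to a header name."""
    root = ET.fromstring(doc)
    names = [v.get("name") for v in root.iter("variable")]
    for result in root.iter("result"):
        found = {b.get("name"): value_of(b) for b in result}
        yield [found.get(n) for n in names]
```

On `FULL`, `rows_positional` is sufficient. On `SPARSE` it yields a short row that no longer lines up with the header, so a processor has to fall back to something like `rows_by_name`, which does a name lookup for every binding.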
The major argument in favor of the change is the size of the serialized result set in cases like queries with UNION, or with lots of optionals. However, IMHO minimizing the size of the serialization has never been a major design goal of the XML result format, nor should it be. To be blunt: if you want to minimize the number of bytes on the line, use compression, or better yet, dump XML and use a binary format.

Of course that does not mean that we should never care about the verbosity of the XML result format, but I think that in this case there are significant disadvantages to allowing this, set against an advantage that I am not convinced cannot be achieved in other, better ways.

In the request, another option was mentioned: not dropping the <binding> element, but dropping <unbound> (and hence having an empty <binding> element). Although slightly more regular, this is still more expensive to process than the current LC format.

As an example of this: the current Sesame SPARQL XML result parser completely skips binding elements and just jumps directly to the uri, literal, bnode or unbound element. In the proposed format, this will no longer be possible and instead it will have to do a check for each binding element to see if it contains a subelement. Not saying that that is fiendishly difficult to do of course, but it does make processing, or writing XSLT, more complex.

Long story short: I have a preference for keeping the spec the way it is now.

By the way, if size of the serialization does become a major design goal, there are other, more obvious changes to make to the format: the binding element could be dropped altogether, for example. I'm not advocating this; I think regularity and ease of processing are more important features than number of bytes.

Jeen
-- 
Jeen Broekstra          Aduna BV
Knowledge Engineer      Julianaplein 14b, 3817 CS Amersfoort
http://aduna.biz        The Netherlands
tel. +31 33 46599877
Received on Wednesday, 26 October 2005 13:06:08 UTC