Re: allow implicitly unbound variables in SPARQL results? from Jeen Broekstra on 2005-10-26 (public-rdf-dawg@w3.org from October to December 2005)

From: Jeen Broekstra <jeen@aduna.biz>
Date: Wed, 26 Oct 2005 16:54:47 +0200
To: Dan Connolly <connolly@w3.org>
CC: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <435F98B7.6030208@aduna.biz>

Dan Connolly wrote:
> On Wed, 2005-10-26 at 15:04 +0200, Jeen Broekstra wrote:
> 
>>Dan Connolly wrote: 
>>
>>>This request seems pretty reasonable:
>>>
>>>[[ There are at least two ways to trim the results back down with 
>>>just syntax changes.  The least intrusive change would be to just 
>>>drop the <unbound> tag, and have it be implicit with <binding 
>>>name=".."/>.  More drastic is to just drop the entire <binding> tag
>>> when the variable is unbound, since the information can be 
>>>retrieved from the head. ]] -- SPARQL Results Format and Unbound 
>>>Variables http://www.w3.org/mid/42F4CEEB.5090306@umd.edu aka 
>>>http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Aug/0043
> 
> 
> [...]
> 
> 
>>Also, of course, as Steve already mentioned, it makes writing XSLT
>>forms for query results quite a bit harder.
>>
>>The major argument in favor of the change is the size of the
>>serialized result set in cases like queries with UNION, or with lots
>>of optionals. However, IMHO minimizing the size of the serialization
>>has never been a major design goal of the XML result format,
> 
> 
> Perhaps you missed a bit of WG history...
> 
> [[
> 4.7 Bandwidth-efficient Protocol
> 
> The access protocol design shall address bandwidth utilization issues;
> that is, it shall allow for at least one result format that does not
> make excessive use of network bandwidth for a given collection of
> results.
> ]]
>  -- http://www.w3.org/TR/rdf-dawg-uc/#d4.7
> 
>> nor
>>should it be.
> 
> 
> Taken literally, that's a request to reconsider objective 4.7.
> I'm not going to take you literally, for now.

Fair enough, I was not aware of that.

FWIW I do not think the current design makes *excessive* use of 
bandwidth, except in corner cases. YMMV.

>> To be blunt: if you want to minimize the number of bytes
>>on the line, use compression, or better yet, dump XML and use a binary
>>format.
> 
> 
> If you're interested in fleshing out a design using compression
> or a binary format, perhaps the WG would support that and the
> commentor would be receptive.  Note that the comment comes with
> measurement numbers, like any design in this space should.

Well, since you're asking: Sesame currently supports a binary result 
format that AFAIK is 100% compatible with SPARQL.

Compared to the SPARQL XML result format, serialization size in bytes 
in this binary format is reduced to about 5-25% of the XML size. I've 
not done rigorous testing, but executing queries of various complexity 
seems to yield consistent results in this range. Processing time of 
_writing_ the query result is reduced to 50-80% of the XML figure, and 
processing time of _reading_ the query result to roughly 40-60%.

Again, these are not rigorous figures but rather rough indications, 
based on a single data set and just a few queries with different 
result set sizes that I ran just five minutes ago. I can run some more 
structured tests later if you want.

If you are seriously interested, I can write down the format design 
and submit it for the WG's consideration. It shouldn't take longer 
than a day to produce a first draft (though I'm not sure I can make 
time for it this week).

>>Of course that does not mean that we should never care about the
>>verbosity of the XML result format, but I think that in this case
>>there are significant disadvantages to allowing this, against an
>>advantage of which I am uncertain there are not other, better ways of
>>solving it.
>>
>>In the request, another option was mentioned: not dropping the
>><binding> element, but dropping <unbound> (and hence having an empty
>><binding> element). Although slightly more regular this is still more
>>expensive to process than the current LC format. As an example of
>>this: the current Sesame SPARQL XML result parser completely skips
>>binding elements and just jumps directly to the uri, literal, bnode or
>>unbound element. In the proposed format, this will no longer be
>>possible and instead it will have to do a check for each binding
>>element to see if it contains a subelement.
>>
>>Not saying that that is fiendishly difficult to do of course, but it
>>does make processing, or writing XSLT, more complex.
>>
>>Long story short: I have a preference for keeping the spec the way it
>>is now.
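
To make the parsing point in the quoted text concrete, here is a minimal 
sketch in Python. It is not the Sesame parser; the sample documents are 
simplified (no namespace, one result row), and the names CURRENT, 
PROPOSED, read_current and read_proposed are made up for illustration. 
It only shows where the extra per-binding check appears once <unbound/> 
becomes implicit.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free sample documents (assumptions, not taken
# from the spec text). CURRENT marks unbound variables explicitly.
CURRENT = """<sparql>
  <head><variable name="x"/><variable name="y"/></head>
  <results>
    <result>
      <binding name="x"><uri>http://example.org/a</uri></binding>
      <binding name="y"><unbound/></binding>
    </result>
  </results>
</sparql>"""

# PROPOSED drops <unbound/> and leaves an empty <binding> element.
PROPOSED = CURRENT.replace(
    '<binding name="y"><unbound/></binding>', '<binding name="y"/>'
)


def read_current(xml_text):
    """Explicit <unbound/>: every binding has exactly one value child."""
    rows = []
    for result in ET.fromstring(xml_text).iter("result"):
        row = {}
        for binding in result.iter("binding"):
            value = binding[0]  # always present in this format
            if value.tag == "unbound":
                row[binding.get("name")] = None
            else:
                row[binding.get("name")] = (value.tag, value.text)
        rows.append(row)
    return rows


def read_proposed(xml_text):
    """Implicit unbound: each binding must first be checked for a child."""
    rows = []
    for result in ET.fromstring(xml_text).iter("result"):
        row = {}
        for binding in result.iter("binding"):
            if len(binding) == 0:  # the extra check per binding
                row[binding.get("name")] = None
            else:
                value = binding[0]
                row[binding.get("name")] = (value.tag, value.text)
        rows.append(row)
    return rows


print(read_current(CURRENT))
print(read_proposed(PROPOSED))
```

Both readers produce the same rows; the difference is only that the 
second cannot assume a value element exists inside each binding.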
> 
> 
> Do you have any argument that you think would satisfy the commentor?

If, for purposes of minimizing the result set size in bytes, we offer 
a binary format with the reduction in size and processing time 
mentioned above, I think that would address his concern, although of 
course such a format cannot be processed with XSLT. The other 
option of using GZIP compression is still a viable alternative as 
well, IMHO.
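
As a rough way to try the GZIP option, the following Python sketch 
builds a synthetic result document in which each row binds one variable 
and leaves the others as <unbound/>, roughly what a UNION query might 
produce, and compares the raw and gzipped sizes. The variable names, 
URIs and row count are invented, the XML is simplified (no namespace), 
and the numbers it prints say nothing about real data sets.

```python
import gzip

VARIABLES = ["a", "b", "c", "d"]  # hypothetical variable names


def build_results(num_rows):
    """Build a synthetic result document with mostly unbound bindings."""
    head = "".join(f'<variable name="{v}"/>' for v in VARIABLES)
    rows = []
    for i in range(num_rows):
        bound = VARIABLES[i % len(VARIABLES)]  # one bound variable per row
        bindings = "".join(
            f'<binding name="{v}"><uri>http://example.org/r{i}</uri></binding>'
            if v == bound
            else f'<binding name="{v}"><unbound/></binding>'
            for v in VARIABLES
        )
        rows.append(f"<result>{bindings}</result>")
    doc = f"<sparql><head>{head}</head><results>{''.join(rows)}</results></sparql>"
    return doc.encode("utf-8")


raw = build_results(1000)
packed = gzip.compress(raw)
print(f"raw: {len(raw)} bytes, gzip: {len(packed)} bytes")
```

The decompressed bytes are identical to the original document, so the 
XML stays usable with XSLT on the receiving side.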


Jeen
-- 
Jeen Broekstra          Aduna BV
Knowledge Engineer      Julianaplein 14b, 3817 CS Amersfoort
http://aduna.biz        The Netherlands
tel. +31 33 46599877
Received on Wednesday, 26 October 2005 14:53:36 UTC