- From: Jeen Broekstra <jeen@aduna.biz>
- Date: Wed, 26 Oct 2005 16:54:47 +0200
- To: Dan Connolly <connolly@w3.org>
- CC: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Dan Connolly wrote:
> On Wed, 2005-10-26 at 15:04 +0200, Jeen Broekstra wrote:
>
>>Dan Connolly wrote:
>>
>>>This request seems pretty reasonable:
>>>
>>>[[ There are at least two ways to trim the results back down with
>>>just syntax changes. The least intrusive change would be to just
>>>drop the <unbound> tag, and have it be implicit with <binding
>>>name=".."/>. More drastic is to just drop the entire <binding> tag
>>>when the variable is unbound, since the information can be
>>>retrieved from the head. ]] -- SPARQL Results Format and Unbound
>>>Variables http://www.w3.org/mid/42F4CEEB.5090306@umd.edu aka
>>>http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Aug/0043
>
>
> [...]
>
>
>>Also, of course, as Steve already mentioned, it makes writing XSLT
>>forms for query results quite a bit harder.
>>
>>The major argument in favor of the change is the size of the
>>serialized result set in cases like queries with UNION, or with lots
>>of optionals. However, IMHO minimizing the size of the serialization
>>has never been a major design goal of the XML result format,
>
>
> Perhaps you missed a bit of WG history...
>
> [[
> 4.7 Bandwidth-efficient Protocol
>
> The access protocol design shall address bandwidth utilization issues;
> that is, it shall allow for at least one result format that does not
> make excessive use of network bandwidth for a given collection of
> results.
> ]]
> -- http://www.w3.org/TR/rdf-dawg-uc/#d4.7
>
>> nor
>>should it be.
>
>
> Taken literally, that's a request to reconsider objective 4.7.
> I'm not going to take you literally, for now.

Fair enough, I was not aware of that. FWIW I do not think the current
design makes *excessive* use of bandwidth, except in corner cases.
YMMV.

>> To be blunt: if you want to minimize the number of bytes
>>on the line, use compression, or better yet, dump XML and use a binary
>>format.
>
>
> If you're interested in fleshing out a design using compression
> or a binary format, perhaps the WG would support that and the
> commentor would be receptive. Note that the comment comes with
> measurement numbers, like any design in this space should.

Well, since you're asking: Sesame currently supports a binary result
format that AFAIK is 100% compatible with SPARQL.

Compared to the SPARQL XML result format, serialization size in bytes
in this binary format is reduced to about 5-25%. I've not done
rigorous testing, but executing queries of various complexity seems to
yield consistent results in this range of reduction.

Processing time of _writing_ the query result is reduced to 50-80%.
Processing time of _reading_ the query result is reduced to about
40-60%, roughly.

Again, these are not rigorous figures but rather rough indications,
based on a single data set and just a few queries with different
result set sizes that I ran just five minutes ago. I can run some more
structured tests later if you want.

If you are seriously interested, I can write down the format design
and submit it for the WG's consideration. It shouldn't take longer
than a day to produce a first draft (though I'm not sure I can make
time for it this week).

>>Of course that does not mean that we should never care about the
>>verbosity of the XML result format, but I think that in this case
>>there are significant disadvantages to allowing this, against an
>>advantage of which I am uncertain there are not other, better ways of
>>solving it.
>>
>>In the request, another option was mentioned: not dropping the
>><binding> element, but dropping <unbound> (and hence having an empty
>><binding> element). Although slightly more regular this is still more
>>expensive to process than the current LC format. As an example of
>>this: the current Sesame SPARQL XML result parser completely skips
>>binding elements and just jumps directly to the uri, literal, bnode or
>>unbound element. In the proposed format, this will no longer be
>>possible and instead it will have to do a check for each binding
>>element to see if it contains a subelement.
>>
>>Not saying that that is fiendishly difficult to do of course, but it
>>does make processing, or writing XSLT, more complex.
>>
>>Long story short: I have a preference for keeping the spec the way it
>>is now.
>
>
> Do you have any argument that you think would satisfy the commentor?

If, for purposes of minimizing the result set size in bytes, we offer
a binary format with the reduction in size and processing time
mentioned above, I think that would address his concern, although of
course such a format cannot be processed with XSLT.

The other option of using GZIP compression is still a viable
alternative as well, IMHO.

Jeen
--
Jeen Broekstra          Aduna BV
Knowledge Engineer      Julianaplein 14b, 3817 CS Amersfoort
http://aduna.biz        The Netherlands
tel. +31 33 46599877
Received on Wednesday, 26 October 2005 14:53:36 UTC
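
The trade-off discussed in the message above (explicit <unbound>, an empty <binding>, or no <binding> at all, plus GZIP as an alternative way to cut bytes) can be tried out with a small sketch. This is only an illustration under stated assumptions, not Sesame's parser and not the binary format mentioned in the thread: the element names binding, uri, literal, bnode and unbound come from the thread, while the surrounding sparql/head/variable/results/result layout without a namespace is a simplification, and the function names serialize, parse and sizes are hypothetical.

```python
import gzip
import xml.etree.ElementTree as ET

VALUE_TAGS = {"uri", "literal", "bnode"}
STYLES = ("explicit", "empty", "omit")


def serialize(rows, variables, style):
    """Serialize rows (dict: variable -> (kind, text)) to simplified result XML.

    style "explicit": <binding name="x"><unbound/></binding> for unbound variables
    style "empty":    <binding name="x"/> for unbound variables
    style "omit":     no <binding> element for unbound variables
    """
    if style not in STYLES:
        raise ValueError(f"unknown style: {style}")
    root = ET.Element("sparql")  # assumption: namespace left out for brevity
    head = ET.SubElement(root, "head")
    for var in variables:
        ET.SubElement(head, "variable", name=var)
    results = ET.SubElement(root, "results")
    for row in rows:
        result = ET.SubElement(results, "result")
        for var in variables:
            if var in row:
                kind, text = row[var]
                binding = ET.SubElement(result, "binding", name=var)
                ET.SubElement(binding, kind).text = text
            elif style == "explicit":
                binding = ET.SubElement(result, "binding", name=var)
                ET.SubElement(binding, "unbound")
            elif style == "empty":
                ET.SubElement(result, "binding", name=var)
            # style "omit": nothing is written for an unbound variable
    return ET.tostring(root, encoding="utf-8")


def parse(data):
    """Parse any of the three styles; unbound variables map to None."""
    root = ET.fromstring(data)
    variables = [v.get("name") for v in root.find("head").findall("variable")]
    rows = []
    for result in root.find("results").findall("result"):
        # Start from the head so that omitted bindings still show up as unbound.
        row = dict.fromkeys(variables)
        for binding in result.findall("binding"):
            # The per-binding check: the element may have no child at all.
            child = next(iter(binding), None)
            if child is not None and child.tag in VALUE_TAGS:
                row[binding.get("name")] = (child.tag, child.text)
        rows.append(row)
    return variables, rows


def sizes(rows, variables):
    """Return {style: (raw_bytes, gzip_bytes)} for the same result set."""
    out = {}
    for style in STYLES:
        raw = serialize(rows, variables, style)
        out[style] = (len(raw), len(gzip.compress(raw)))
    return out
```

Calling sizes() on a result set with many unbound variables gives raw and GZIP-compressed byte counts side by side for the three styles, which is the kind of measurement the thread asks for. The parse() function accepts all three styles at the cost of the extra child check and the lookup against the head.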