Re: allow implicitly unbound variables in SPARQL results? from Seaborne, Andy on 2005-10-27 (public-rdf-dawg@w3.org from October to December 2005)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Thu, 27 Oct 2005 10:10:28 +0100
To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <43609984.8010701@hp.com>
Jeen Broekstra wrote:
> Kendall Clark wrote:
> 
>>On 16:54, Wed 26 Oct 05, Jeen Broekstra wrote:
>>
>>
>>
>>>FWIW I do not think the current design to make *excessive* use of 
>>>bandwidth, except in corner cases. YMMV.
>>
>>
>>Well, FWIW, that corner case is the commentor's *common* case; that 
>>is, the primary of SPARQL and the results format involves 
>>(necessarily) lots and lots and lots of unbounds, because the queries
>> involve OPTIONALs and UNIONS (however they're spelled, I can't 
>>recall).
> 
> 
> Yes, you are right, I should not have put it that way. I only meant that
> for many use cases/queries (in which optionals and unions are not, or
> sparingly) used, the bandwidth use is just fine with the current design,
> and for many use cases, the ease/speed of processing is a major issue.
> 
> 
>>That's precisely the problem that led to the unusual step of thinking
>> we could save some bytes and still have an easily human readable 
>>format, keeping it in XML.
>>
>>I don't believe the change to <binding/> makes processing that much 
>>more difficult.
> 
> 
> To clarify: I think my major concern (with both stripping and
> collapsing, though mostly with collapsing) is not so much that it is
> more difficult but that it is more costly in terms of processor
> performance. Note that Ron's tests only give performance figures for the
> case where you actually _have_ a significant size reduction (because it
> contains a lot of unbounds).
> 
> I'd like to see some figures on how comparative processor performance is
> for result sets that contain no unbounds (I'll see if I can come up with
> some figures on this myself tomorrow, or perhaps Ron will do this,
> you/Bijan mentioned he'd do some extra tests). That would give us a more
> complete picture of the consequences of either design.
> 
> 
>>>If, for purposes of minimizing the result set size in bytes, we 
>>>offer a binary format with the reduction in size and processing 
>>>time mentioned above, I think that would address his concern, 
>>>although of course such a format is can not be processed with XSLT.
>>> The other option of using GZIP compression is still a viable 
>>>alternative as well, IMHO.
>>
>>I am only guessing here, and Bijan mentioned that Ron will be doing 
>>some further tests, but I'd be really surprised if our organization 
>>got behind a binary format. But, again, that's just a guess, not a 
>>position.
> 
> 
> I can very well imagine that it is simply not a good idea to introduce
> this extra format into the WG at this stage of the game. I am happy to
> invest time into it though if other people think it's worthwile.
> 
> *shrug* we need to document it for Sesame users anyway ;)
> 
> Jeen
> 

A different perspective:

ARQ/Joseki3 streams at both the client and the server.  With HTTP, the client 
can process results in parallel with the server generating them.  The 
time-to-first-result is now independent of the transmission and XML parsing 
costs of the whole result set.  The time-to-last-result is reduced because the 
client processing costs fall into the dead wait time that would occur 
in a serial pipeline.  In addition, the client end can now use StAX, a pull 
parser, so it is also using a lighter-weight parsing system than XSLT 
processing over a full DOM.  This could then be connected to STX for streaming 
transformation of results.
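
A rough sketch of the client-side idea, for illustration only: it uses Python's 
standard-library pull parser as a stand-in for StAX and is not the ARQ/Joseki 
code.  The namespace and element names follow the SPARQL results XML format as 
later standardized; the sample document, the helper names and the chunk size 
are made up.  Each row is handed to the caller as soon as its </result> tag has 
arrived, and a variable with no binding is simply absent from the row.

```python
import xml.etree.ElementTree as ET

# Namespace of the SPARQL Query Results XML Format (assumed for this sketch).
NS = "{http://www.w3.org/2005/sparql-results#}"


def stream_results(chunks):
    """Yield one dict per <result> as soon as its end tag has been parsed."""
    parser = ET.XMLPullParser(events=("end",))
    for chunk in chunks:
        parser.feed(chunk)  # hand over whatever has arrived so far
        for _event, elem in parser.read_events():
            if elem.tag == NS + "result":
                row = {}
                for binding in elem.findall(NS + "binding"):
                    # Simplification: keep only the text of the value,
                    # dropping datatype and language information.
                    row[binding.get("name")] = "".join(binding.itertext()).strip()
                yield row
                elem.clear()  # release the children of the processed row
    parser.close()


# Made-up sample: the second row leaves "mbox" unbound by omitting it.
DOC = (
    '<sparql xmlns="http://www.w3.org/2005/sparql-results#">'
    '<head><variable name="name"/><variable name="mbox"/></head>'
    '<results>'
    '<result><binding name="name"><literal>Alice</literal></binding>'
    '<binding name="mbox"><uri>mailto:alice@example.org</uri></binding></result>'
    '<result><binding name="name"><literal>Bob</literal></binding></result>'
    '</results></sparql>'
)


def chunked(text, size=40):
    """Simulate network delivery by cutting the document into small pieces."""
    for i in range(0, len(text), size):
        yield text[i:i + size]


rows = list(stream_results(chunked(DOC)))
```

Because stream_results is a generator, the caller can start work on the first 
row while later chunks are still in transit, which is the overlap described 
above.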

For SOAP, I haven't done this yet.  SOAP stacks are beginning to appear with the 
possibility of streaming, so I'm hopeful here as well.

 > *shrug* we need to document it for Sesame users anyway ;)

You document it and I'll (look to) implement it :-)  My requirements are the 
same as in the discussion we had pre-DAWG: I'd like to be able to process the 
results without having to deeply inspect the query, so a declaration of result 
variables helps.  And a MIME type.
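
To illustrate why the declaration helps, here is a small sketch (not any 
implementation's code; the function name and the sample data are invented, and 
the element names assume the XML results format as later standardized).  The 
client takes the column list from the <head> and fills in implicitly unbound 
variables itself, without ever looking at the query.

```python
import xml.etree.ElementTree as ET

# Prefix mapping for the SPARQL results namespace (assumed for this sketch).
NS = {"sr": "http://www.w3.org/2005/sparql-results#"}


def read_table(xml_text):
    """Return (variables, rows); unbound variables appear as None in a row."""
    root = ET.fromstring(xml_text)
    # The declared variables fix the column order for every row.
    variables = [v.get("name") for v in root.findall("sr:head/sr:variable", NS)]
    rows = []
    for result in root.findall("sr:results/sr:result", NS):
        bound = {
            b.get("name"): "".join(b.itertext()).strip()
            for b in result.findall("sr:binding", NS)
        }
        # A variable with no <binding> in this result is treated as unbound.
        rows.append([bound.get(v) for v in variables])
    return variables, rows


# Made-up sample: the second result has no binding for "mbox".
SAMPLE = (
    '<sparql xmlns="http://www.w3.org/2005/sparql-results#">'
    '<head><variable name="name"/><variable name="mbox"/></head>'
    '<results>'
    '<result><binding name="name"><literal>Alice</literal></binding>'
    '<binding name="mbox"><uri>mailto:alice@example.org</uri></binding></result>'
    '<result><binding name="name"><literal>Bob</literal></binding></result>'
    '</results></sparql>'
)

variables, table = read_table(SAMPLE)
```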

	Andy
Received on Thursday, 27 October 2005 09:10:36 UTC