Re: allow implicitly unbound variables in SPARQL results?

On Oct 26, 2005, at 9:04 AM, Jeen Broekstra wrote:

> Dan Connolly wrote:
>> Reviewing last call comment status, this one is (a) not connected to  
>> an open issue, (b) not just editorial, and (c) hasn't gotten much  
>> airtime.
>> This request seems pretty reasonable:
>> [[ There are at least two ways to trim the results back down with  
>> just syntax changes.  The least intrusive change would be to just  
>> drop the <unbound> tag, and have it be implicit with <binding  
>> name=".."/>.  More drastic is to just drop the entire <binding> tag
>>  when the variable is unbound, since the information can be retrieved  
>> from the head. ]] -- SPARQL Results Format and Unbound Variables  
>> http://www.w3.org/mid/42F4CEEB.5090306@umd.edu aka  
>> http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Aug/0043
>> So how about we make <binding> elements for unbound variables  
>> optional?
>
> Like Steve and Andy said, having <binding> optional (in the sense of
> it may or may not be there) sounds like a bad idea, but specifying
> that there is never a binding element if the variable is unbound is
> possible I guess.
>
> There are some pretty big drawbacks to consider though, in terms of
> cost of processing.

From his post:

""""Using some simple xslt[7], I was able to create sample result sets with
the unbounds stripped[8] and the bindings collapsed[9]. The stripped
files were about 68% the size of the originals, while the collapsed files
were 45% of the originals' size.

The parse time followed similarly. I used a dumb script[10] that timed
how long it took for expat's xmlwf to complete. The stripped files took
about 61% as long to parse as the complete files, and the collapsed
files took about 42% of the time it took to parse the originals. The
raw results are at [11].""""
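To make the two proposed variants concrete, here is a rough Python sketch that serializes the same rows three ways: with explicit <unbound/> markers (the current draft), with empty <binding name=".."/> elements ("stripped"), and with the whole <binding> dropped ("collapsed"). The element names follow the draft results format minus its namespace, and reading "stripped"/"collapsed" as the two options in the original comment is an assumption. The data (six variables, two bound per row, mimicking a UNION) is made up, so the ratios it prints say nothing about his measurements.

```python
import xml.etree.ElementTree as ET

# Hypothetical head variables for a UNION-style query.
VARIABLES = ["a", "b", "c", "d", "e", "f"]


def make_results(rows, mode):
    """Serialize rows (a value or None per variable) as 'full', 'stripped' or 'collapsed'."""
    sparql = ET.Element("sparql")
    head = ET.SubElement(sparql, "head")
    for name in VARIABLES:
        ET.SubElement(head, "variable", name=name)
    results = ET.SubElement(sparql, "results")
    for row in rows:
        result = ET.SubElement(results, "result")
        for name, value in zip(VARIABLES, row):
            if value is None and mode == "collapsed":
                continue  # drop the whole <binding> for an unbound variable
            binding = ET.SubElement(result, "binding", name=name)
            if value is not None:
                ET.SubElement(binding, "literal").text = value
            elif mode == "full":
                ET.SubElement(binding, "unbound")  # explicit <unbound/> marker
            # mode == "stripped": unbound stays as an empty <binding name=".."/>
    return ET.tostring(sparql, encoding="unicode")


# Synthetic UNION-like rows: each row binds only two of the six variables.
rows = []
for i in range(1000):
    row = [None] * len(VARIABLES)
    k = (i % 3) * 2
    row[k], row[k + 1] = f"value{i}", f"other{i}"
    rows.append(row)

sizes = {mode: len(make_results(rows, mode)) for mode in ("full", "stripped", "collapsed")}
for mode, size in sizes.items():
    print(f"{mode:9s} {size:7d} chars  ({size / sizes['full']:.0%} of full)")
```

The more unbound variables per row, the larger the gap between the three encodings, which is where UNION and heavy OPTIONAL use bite.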

So, on processing *time* alone, shrinking the format frees up a lot of
slack.

> In the current spec, a result processor can simply assume that every
> row has the number of bindings specified in the header, in the order
> specified in the header. Arguably the link between bindings and
> variables (that is, the name attribute) is even redundant in the
> current spec. XML results can be processed very fast because no
> explicit matching between column names and binding elements needs to
> be done at all.
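A minimal Python sketch of the trade-off, using a simplified, namespace-free version of the result format (element names assumed from the draft; read_positional and read_by_name are hypothetical helpers). The positional reader ignores the name attribute entirely, as the current spec allows, but silently misattributes values once unbound bindings may be omitted; the name-based reader copes, at the cost of a per-row lookup.

```python
import xml.etree.ElementTree as ET

DOC_FULL = """<sparql>
  <head><variable name="x"/><variable name="y"/></head>
  <results>
    <result>
      <binding name="x"><literal>1</literal></binding>
      <binding name="y"><unbound/></binding>
    </result>
  </results>
</sparql>"""

# Same shape, but the unbound "x" binding is simply left out.
DOC_COLLAPSED = """<sparql>
  <head><variable name="x"/><variable name="y"/></head>
  <results>
    <result>
      <binding name="y"><literal>2</literal></binding>
    </result>
  </results>
</sparql>"""


def value_of(binding):
    """Return the literal text of a <binding>, or None if it holds no literal."""
    literal = binding.find("literal")
    return literal.text if literal is not None else None


def read_positional(doc):
    """Current-spec style: the i-th <binding> belongs to the i-th head variable."""
    root = ET.fromstring(doc)
    names = [v.get("name") for v in root.find("head")]
    return [{n: value_of(b) for n, b in zip(names, result)}
            for result in root.find("results")]


def read_by_name(doc):
    """Needed if unbound bindings may be omitted: match on the name attribute."""
    root = ET.fromstring(doc)
    names = [v.get("name") for v in root.find("head")]
    rows = []
    for result in root.find("results"):
        found = {b.get("name"): value_of(b) for b in result}
        rows.append({n: found.get(n) for n in names})
    return rows


print(read_positional(DOC_FULL))       # [{'x': '1', 'y': None}]
print(read_positional(DOC_COLLAPSED))  # [{'x': '2'}]  (y's value lands on x)
print(read_by_name(DOC_COLLAPSED))     # [{'x': None, 'y': '2'}]
```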

I'll ask him to do some processing experiments, but it could be that,
given the parse savings, things come out better overall. Also,
presumably, most results processing, or at least the core kind,
includes retrieving the results over the network.

[snip]
> Also, of course, as Steve already mentioned, it makes writing XSLT
> forms for query results quite a bit harder.

Programmer time is, of course, an issue.

> The major argument in favor of the change is the size of the
> serialized result set in cases like queries with UNION, or with lots
> of optionals. However, IMHO minimizing the size of the serialization
> has never been a major design goal of the XML result format, nor
> should it be. To be blunt: if you want to minimize the number of bytes
> on the line, use compression, or better yet, dump XML and use a binary
> format.

I've asked him to investigate the improvements due to compression. Of  
course, that adds overhead.
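A rough way to check how much compression recovers, sketched in Python with the standard gzip module on the same kind of made-up UNION-style data (six variables, two bound per row). The serialize function is a hypothetical stand-in, not any real implementation, and the sketch only compares byte counts; it does not measure the extra CPU time compression costs at each end.

```python
import gzip

VARIABLES = ["a", "b", "c", "d", "e", "f"]


def serialize(rows, drop_unbound):
    """Hypothetical string serializer; unbound values are None."""
    out = ["<sparql><head>"]
    out += [f'<variable name="{v}"/>' for v in VARIABLES]
    out.append("</head><results>")
    for row in rows:
        out.append("<result>")
        for name, value in zip(VARIABLES, row):
            if value is None:
                if not drop_unbound:
                    out.append(f'<binding name="{name}"><unbound/></binding>')
            else:
                out.append(f'<binding name="{name}"><literal>{value}</literal></binding>')
        out.append("</result>")
    out.append("</results></sparql>")
    return "".join(out).encode("utf-8")


# Synthetic rows: each binds only two of the six variables.
rows = []
for i in range(1000):
    row = [None] * len(VARIABLES)
    k = (i % 3) * 2
    row[k], row[k + 1] = f"value{i}", f"other{i}"
    rows.append(row)

full = serialize(rows, drop_unbound=False)
collapsed = serialize(rows, drop_unbound=True)
for label, data in (("full", full), ("collapsed", collapsed)):
    packed = gzip.compress(data)
    print(f"{label:9s} raw={len(data):7d}  gzip={len(packed):6d}")
```

Repetitive markup like <binding name="b"><unbound/></binding> compresses very well, so the gap after gzip may be much smaller than the raw gap.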
[snip]
> Long story short: I have a preference for keeping the spec the way it
> is now.
>
> By the way, if size of the serialization does become a major design
> goal, there are other, more obvious changes to make to the format: the
> binding element could be dropped altogether, for example. I'm not
> advocating this, I think regularity and ease of processing are more
> important features than number of bytes.
[snip]

I don't think it's an *overarching* goal; it's just that, in combination
with certain other features, one gets surprising blow-ups.

Cheers,
Bijan.

Received on Wednesday, 26 October 2005 13:34:01 UTC