Re: allow implicitly unbound variables in SPARQL results?

Dan Connolly wrote:
> On Fri, 2005-12-09 at 16:29 +0100, Jeen Broekstra wrote:
> 
>> Do we have enough data here to perhaps put it on next telcon's
>> agenda and make a decision one way or the other?
> 
> 
> I'd like to think so.
> 
> But I don't see a crisp proposal.
> 
> Can you state the proposal in a paragraph or so, along with a test
> case/example?

Alright, I'll summarize the current status, and formulate three
proposals to vote on (hopefully that's crisp enough?).

The issue is how the XML Query Result Format encodes unbound variables
in the result. There are three proposals:

  a) the LC Design, in which an explicit <unbound/> element is used
     to encode unbound values for a result:

    <result>
     <binding name="x"><literal>foo</literal></binding>
     <binding name="y"><unbound/></binding>
    </result>

  b) A 'stripped' variation where an unbound value is encoded by an
     empty binding element:

    <result>
     <binding name="x"><literal>foo</literal></binding>
     <binding name="y"/>
    </result>

  c) A 'collapsed' variation where an unbound value is encoded by
     complete omission:

    <result>
     <binding name="x"><literal>foo</literal></binding>
    </result>
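
To make the difference concrete from a consumer's point of view, here is
a minimal Python sketch that reads one <result> in any of the three
encodings and produces the same mapping. It is an illustration only, not
part of any proposal: the function name read_row is made up, namespaces
are left out as in the snippets above, and the list of variable names is
passed in by hand (it is assumed to be known from elsewhere in the
result document).

```python
import xml.etree.ElementTree as ET

def read_row(result_xml, variables):
    """Map each variable name to its value, or None when unbound.

    Hypothetical helper for illustration; `variables` is assumed to be
    known already (it is not derived from the <result> element itself).
    """
    # Start with every variable unbound; this covers option c),
    # where an unbound variable simply has no <binding> element.
    row = dict.fromkeys(variables)
    for binding in ET.fromstring(result_xml).findall("binding"):
        children = list(binding)
        # Option b) gives an empty <binding/>, option a) gives an
        # explicit <unbound/> child; both are treated as "no value".
        if children and children[0].tag != "unbound":
            row[binding.get("name")] = children[0].text
    return row

LC = ('<result><binding name="x"><literal>foo</literal></binding>'
      '<binding name="y"><unbound/></binding></result>')
STRIPPED = ('<result><binding name="x"><literal>foo</literal></binding>'
            '<binding name="y"/></result>')
COLLAPSED = ('<result><binding name="x"><literal>foo</literal></binding>'
             '</result>')

for doc in (LC, STRIPPED, COLLAPSED):
    print(read_row(doc, ["x", "y"]))
```

Note that the reader for option c) only works because the variable names
are known up front; with a) and b) they could also be recovered from the
row itself.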

The main advantage of option a) is its rigid structure, which makes
processing very simple (e.g. in XSLT). Its disadvantages are higher
bandwidth use and longer processing time for large result sets.

Options b) and c) are, respectively, the less and the more radical
proposal to 'compact' the result set format, resulting in less bandwidth
use and faster processing (see [3] and [4] for concrete tests/figures).
The main disadvantage of both proposals is more complex processing in
XSLT, although it has been shown in [1] that it is doable.

Orthogonally, there is the question of whether 'compactness' of the
XML result format should be such a priority (at least, at the cost of
simplicity/clarity/ease of processing). It has been suggested that
(gzip) compression resolves the bandwidth pain to a large degree, and
that if processing time is an issue, a dedicated binary format may be
the way to go (as, for example, outlined in [2]). These could be
considered reasons to stay with the LC design despite its higher
verbosity. If the WG decides to do so, these reasons can be used to
justify the decision to the commenter who raised the original issue.
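
One rough way to probe the compression argument is to gzip a synthetic
result set in encoding a) and in encoding c) and compare sizes. The
sketch below does only that. The data, the row count, and the names
make_results and sizes are invented for illustration; it is not a
benchmark and says nothing about the figures reported in [3] and [4].

```python
import gzip

def make_results(rows, explicit_unbound):
    """Build a synthetic result set; y is unbound in every row."""
    parts = ["<results>"]
    for i in range(rows):
        parts.append("<result>")
        parts.append('<binding name="x"><literal>v%d</literal></binding>' % i)
        if explicit_unbound:
            # Option a): spell out the unbound variable explicitly.
            parts.append('<binding name="y"><unbound/></binding>')
        # Option c): nothing is written for the unbound variable.
        parts.append("</result>")
    parts.append("</results>")
    return "".join(parts).encode("utf-8")

def sizes(rows=1000):
    """Return {option: (raw_bytes, gzipped_bytes)} for options a and c."""
    out = {}
    for label, explicit in (("a", True), ("c", False)):
        raw = make_results(rows, explicit)
        out[label] = (len(raw), len(gzip.compress(raw)))
    return out

for label, (raw, packed) in sizes().items():
    print("option %s: %d bytes raw, %d bytes gzipped" % (label, raw, packed))
```

Synthetic data like this is highly repetitive, so it probably overstates
how well real result sets compress; the point is only to show how the
comparison could be made.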

The working group needs to reach a decision, given the data summarized
above, on whether to stick with the LC design (proposal a), to go for
the 'stripped' version as a compromise (proposal b), or to change the
format more radically to the collapsed version (proposal c).

Jeen

[1] http://lists.w3.org/Archives/Public/public-rdf-dawg/2005OctDec/0325
[2] http://lists.w3.org/Archives/Public/public-rdf-dawg/2005OctDec/0131
[3] http://lists.w3.org/Archives/Public/public-rdf-dawg/2005OctDec/0318
[4] http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Aug/0043
-- 
Jeen Broekstra          Aduna BV
Knowledge Engineer      Julianaplein 14b, 3817 CS Amersfoort
http://aduna.biz        The Netherlands
tel. +31 33 46599877

Received on Tuesday, 13 December 2005 13:30:14 UTC