Re: allow implicitly unbound variables in SPARQL results? from Kendall Clark on 2005-12-14 (public-rdf-dawg@w3.org from October to December 2005)

From: Kendall Clark <kendall@monkeyfist.com>
Date: Wed, 14 Dec 2005 11:30:17 -0500
To: Jeen Broekstra <jeen.broekstra@aduna.biz>
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <10E060C2-B78D-46CD-905D-EDD307941A4B@monkeyfist.com>
On Dec 14, 2005, at 4:55 AM, Jeen Broekstra wrote:
>  [[4.7 Bandwidth-efficient Protocol
>
>    The access protocol design shall address bandwidth utilization
>    issues; that is, it shall allow for at least one result format that
>    does not make excessive use of network bandwidth for a given
>    collection of results.
>
>    Status: Accepted.]]
>
> Whether or not the LC design meets this requirement is subjective I
> guess (what is "excessive", exactly?), however it has been shown that
> more bandwidth-efficient variations are not only possible, but  
> workable.

One way to gloss "excessive" -- though I'm not claiming this is
necessarily what anyone *intended* -- is this: an excessive use of
bandwidth is one where we choose a design that is (a) functionally
equivalent to some other design that (b) makes more efficient use
of bandwidth. That is, we take "excessive" to obligate us to choose
the most efficient from among otherwise functionally equivalent design
alternatives. "Functionally equivalent" is a bit loose, granted, but
for a data format we'd say, at the least, that it means "conveys the
same information".
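
To make that gloss concrete, here is a minimal Python sketch of the decision rule. It assumes, as the subject line suggests, that the variant under discussion lets a result simply omit the binding for an unbound variable. The two documents, the explicit `<unbound/>` marker, and the helpers `decode` and `least_excessive` are illustrative assumptions, not text from the Last Call draft. "Conveys the same information" is modeled here as "decodes to the same rows".

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/2005/sparql-results#}"

# Two hypothetical serializations of one row where ?name is bound and ?mbox
# is unbound. Element names, including <unbound/>, are illustrative only.
EXPLICIT = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
<head><variable name="name"/><variable name="mbox"/></head>
<results><result>
<binding name="name"><literal>Alice</literal></binding>
<binding name="mbox"><unbound/></binding>
</result></results></sparql>"""

IMPLICIT = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
<head><variable name="name"/><variable name="mbox"/></head>
<results><result>
<binding name="name"><literal>Alice</literal></binding>
</result></results></sparql>"""


def decode(doc: str) -> list[dict[str, str | None]]:
    """Return the information a document conveys: one dict per row mapping
    every declared variable to its value, or None if it is unbound."""
    root = ET.fromstring(doc)
    variables = [v.get("name") for v in root.iter(NS + "variable")]
    rows = []
    for result in root.iter(NS + "result"):
        row = dict.fromkeys(variables)  # every variable starts out unbound
        for binding in result.findall(NS + "binding"):
            value = binding[0]  # first child: <literal>, <uri>, <unbound/>, ...
            if value.tag != NS + "unbound":
                row[binding.get("name")] = value.text
        rows.append(row)
    return rows


def least_excessive(candidates: dict[str, str]) -> str:
    """Among candidates that convey the same information, pick the smallest."""
    decoded = [decode(doc) for doc in candidates.values()]
    if any(d != decoded[0] for d in decoded):
        raise ValueError("candidates are not functionally equivalent")
    return min(candidates, key=lambda k: len(candidates[k].encode("utf-8")))


print(least_excessive({"explicit": EXPLICIT, "implicit": IMPLICIT}))  # prints "implicit"
```

Note that the consumer in `decode` treats a missing binding and an explicit marker identically, which is one way to see why omission need not make processing much harder.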

I won't add "is a roughly equivalent processing burden" because,
well, we didn't adopt a requirement or objective that the results
format be easy to process. I don't remember anyone in the WG ever
even discussing that. We talked a lot about having an XML format so
that, for example, we could integrate with XQuery, but I don't recall
any worry about ease of processing. That has all come up, near as I can
tell, after Ron Alford suggested the existing format was
a bit bloated.

As for ease of processing, there is a sense in which we are quibbling  
over trivialities. All of the formats, as people have demonstrated  
repeatedly, are *roughly* equivalent in the processing burden they  
impose on a competent programmer -- which is a cost borne by fewer  
agents than the cost of an inefficient protocol or data format, which  
is borne by *everyone* involved in some sense.

If or when programmers want *actually easy to process results*, they  
tend not to use XML at all. At least, one can make a very strong  
argument in that direction.

So, for example, I've been working on a document with some WG members  
for a JSON(.org) serialization of the query results format, since  
that's *trivially easy* for a programmer to process and integrates  
nicely with things like "Web 2.0", "AJAX", and "lightweight REST web  
services". (Where "integrates nicely with" means "is what the people
who build such things expect".)
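
For a feel of why JSON is so easy to consume, here is a minimal Python sketch. The `head`/`results`/`bindings` layout, the choice to omit unbound variables, and the sample data are all assumptions for illustration; this is not the serialization in the draft document mentioned above.

```python
import json

# Hypothetical result set: declared variables plus rows; None marks "unbound".
VARIABLES = ["name", "mbox"]
ROWS = [
    {"name": "Alice", "mbox": "mailto:alice@example.org"},
    {"name": "Bob", "mbox": None},
]


def to_json(variables, rows) -> str:
    """Serialize rows in an assumed head/results layout, omitting unbound
    variables rather than marking them."""
    bindings = [
        {var: {"value": val} for var, val in row.items() if val is not None}
        for row in rows
    ]
    return json.dumps({"head": {"vars": variables},
                       "results": {"bindings": bindings}})


doc = to_json(VARIABLES, ROWS)

# Consuming it is one library call plus ordinary dictionary access.
parsed = json.loads(doc)
for b in parsed["results"]["bindings"]:
    print(b["name"]["value"], b.get("mbox", {}).get("value"))
```

In a browser the `json.loads` step corresponds to a single built-in parse call, which is much of the appeal for AJAX-style clients.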

Thus, I have to conclude that I shouldn't give much, if any, weight in
my own deliberations to the ease of processing argument. I think it
focuses too much on very tiny perceived gains, for a relatively small  
number of people, at the expense of a cost that is imposed pretty  
much across the board.

> For these reasons, I feel that currently we can not really claim that
> XSLT processing is made so much harder by going for this design, and
> therefore I think option c is the right way to go.

In my deliberations about all of this I simply *granted* that option  
c did make processing more difficult to some degree. Even with that  
increased difficulty, I don't find that consideration very weighty on  
more general grounds. I'm glad to hear that there is some evidence to  
suggest that it's not really any more difficult at all. That  
strengthens the case for option c significantly.

>
> ==============
>
> Quite separately from this, there is the issue of having an *XML*-based
> result format in the first place. It has been shown that for purposes of
> bandwidth efficiency, the choice of XML is a limiting factor, and
> a dedicated (binary) format is much better.

My primary concern is that we don't *assume* that this is an either-or
situation. I think there are good reasons to prefer several query
result formats, even if only one is an actual Recommendation
eventually. I can see utility in having more than one result format
documented in various WG Notes. I intend to submit my JSON work as
such once it's finished. A binary, non-XML textual, and (standards-blessed)
XML results format seems a helpful mix covering a diversity
of use cases.
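
To show what a "dedicated (binary) format" might minimally involve, here is a sketch of a hypothetical length-prefixed layout in Python using the standard `struct` module. It is not the binary format Jeen refers to, and it ignores RDF term types (URI vs. literal vs. blank node) entirely, which any real format would have to carry; the layout, the `UNBOUND` sentinel, and the function names are assumptions.

```python
from __future__ import annotations

import struct

UNBOUND = 0xFFFFFFFF  # hypothetical sentinel length meaning "no value"


def encode_binary(variables: list[str], rows: list[list[str | None]]) -> bytes:
    """Hypothetical layout: variable count, length-prefixed variable names,
    row count, then one length-prefixed UTF-8 value per variable per row."""
    out = bytearray(struct.pack(">I", len(variables)))
    for name in variables:
        raw = name.encode("utf-8")
        out += struct.pack(">I", len(raw)) + raw
    out += struct.pack(">I", len(rows))
    for row in rows:
        for value in row:
            if value is None:
                out += struct.pack(">I", UNBOUND)  # unbound: sentinel, no bytes
            else:
                raw = value.encode("utf-8")
                out += struct.pack(">I", len(raw)) + raw
    return bytes(out)


def decode_binary(data: bytes) -> tuple[list[str], list[list[str | None]]]:
    offset = 0

    def read_u32() -> int:
        nonlocal offset
        (n,) = struct.unpack_from(">I", data, offset)
        offset += 4
        return n

    def read_str() -> str | None:
        nonlocal offset
        n = read_u32()
        if n == UNBOUND:
            return None
        s = data[offset:offset + n].decode("utf-8")
        offset += n
        return s

    variables = [read_str() for _ in range(read_u32())]
    rows = [[read_str() for _ in variables] for _ in range(read_u32())]
    return variables, rows


rows = [["Alice", "mailto:alice@example.org"], ["Bob", None]]
data = encode_binary(["name", "mbox"], rows)
print(decode_binary(data))
```

The point of the design is that nothing is spent on markup: each value costs four bytes of length plus its own bytes, and an unbound value costs only the four-byte sentinel.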

> As one can see, the performance gain on practically all fronts by using
> a binary format completely dwarfs any performance gain by optimizing the
> XML format.

Well, it's faster but I don't know about "completely dwarfs"! :>

The one advantage of compressed XML over binary, vis-a-vis the last
call design, is that it's *utterly trivial* to specify, requiring
only some tweaked language in the existing last call design doc.
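
Assuming "compressed XML" means applying a general-purpose compressor such as gzip to the unchanged XML (for instance as an HTTP `Content-Encoding`), the compression step itself is a single library call. The minimal Python sketch below uses a made-up, deliberately repetitive result document, so the sizes it prints say nothing about real workloads.

```python
import gzip

# A hypothetical, highly repetitive XML result document with 1,000 rows.
ROW = ('<result><binding name="name"><literal>Alice</literal></binding>'
       '<binding name="mbox"><uri>mailto:alice@example.org</uri></binding></result>')
doc = ('<sparql xmlns="http://www.w3.org/2005/sparql-results#"><head>'
       '<variable name="name"/><variable name="mbox"/></head><results>'
       + ROW * 1000 + '</results></sparql>').encode("utf-8")

compressed = gzip.compress(doc)

# Decompression returns the identical bytes, so no information is lost.
assert gzip.decompress(compressed) == doc
print(len(doc), len(compressed))
```

Because the XML itself is untouched, existing XML and XSLT tooling keeps working once the bytes are decompressed.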

But, that having been said, the only way I would oppose a binary  
results format is if it were to replace an XML format -- which I  
don't believe anyone would suggest -- or if I had to do the work to  
specify it! :>

> So a separate question is whether or not the WG wants to
> sanction (informally?) a specification of such a binary format (I know
> that Andy and I are at least interested in submitting such a format to
> W3C).

As I said above, I think this is an interesting niche for us to think  
about post-SPARQL 1.0, after a recharter. But it may make sense to do  
it before that. In either case, I think this is a good use of WG  
resources.

> If we do decide to do this, it takes away part of the reason for
> changing the XML format, but of course both options will still be
> open.

I think they are completely orthogonal, especially given my preferred  
way of glossing "excessive" above. I don't think a binary format and  
an XML format (any XML format) are functionally equivalent. They  
scratch different itches and serve different use cases.

> I do, however, believe that if we decide to stick with the LC design, we
> will _have_ to sanction this additional binary format, because otherwise
> we have not sufficiently provided for requirement 4.7.

Which is an excellent reason, all other things being equal, for  
supporting option c.

Cheers,
Kendall Clark
Received on Wednesday, 14 December 2005 16:30:32 UTC