Result set latency from Steve Harris on 2004-06-02 (public-rdf-dawg@w3.org from April to June 2004)

From: Steve Harris <S.W.Harris@ecs.soton.ac.uk>
Date: Wed, 2 Jun 2004 15:37:20 +0100
To: DAWG public list <public-rdf-dawg@w3.org>
Message-ID: <20040602143720.GC7484@login.ecs.soton.ac.uk>

This is the other side of the "bandwidth" problem. I think it completes
"ACTION: SteveH to write his experiences on bandwidth efficiency for
querying and email to group."

The problems I've given myself with increased latency are basically down
to overly verbose return formats (again) and things that require the
server and/or the client to have the whole result set before it can start
work.

In 3store the result set is implicitly DISTINCT'd (i.e. each unique
matching binding row is only returned once) - this was done at users'
request, but with hindsight it would have been better to add a DISTINCT
keyword that turned it on selectively. In practice the server has to hold
a copy of the result set while it's returning results, to ensure it
doesn't send duplicates. That doesn't add much latency in theory, but the
most efficient way of implementing it that we found builds a list then
sends the whole lot after all the results have been gathered. In any case
it adds some load to the server.
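
To illustrate the two DISTINCT strategies described above, here is a
minimal Python sketch. It is not 3store's code; the function names and
the tuple representation of a binding row are assumptions made for the
example. The streaming variant can send each new row as soon as it is
seen, while the buffered variant sends nothing until the input is
exhausted. Both keep a copy of every unique row in memory.

```python
from typing import Hashable, Iterable, Iterator, List, Tuple

# Hypothetical representation of one binding row.
Row = Tuple[Hashable, ...]


def distinct_streaming(rows: Iterable[Row]) -> Iterator[Row]:
    """Yield each unique row as soon as it first appears."""
    seen = set()  # grows to the size of the unique result set
    for row in rows:
        if row not in seen:
            seen.add(row)
            yield row  # the first result can leave before the query finishes


def distinct_buffered(rows: Iterable[Row]) -> List[Row]:
    """Gather everything, then return the de-duplicated list in one go."""
    # dict.fromkeys keeps first-seen order and drops duplicates.
    return list(dict.fromkeys(rows))
```

With the streaming version the time to the first row is independent of
the result set size; with the buffered version it is not.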

Another example would be result set sorting, which again requires the
server to hold the result set. Ditto for RDF/XML or any other format that
requires prefixes at the top of the file.
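
The prefix problem can be sketched in Python as follows. The line-based
output format, the ns0/ns1 prefix names and the helper functions are all
hypothetical and chosen only to keep the example short (real RDF/XML is
more involved, and every term is assumed to be a URI). The point is only
that the header cannot be written until every row has been inspected.

```python
from typing import Dict, Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]


def namespace_of(uri: str) -> str:
    """Split a URI after its last '#' or '/' (a simplification)."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[: cut + 1]


def shorten(uri: str, prefixes: Dict[str, str]) -> str:
    """Rewrite a full URI as prefix:localname."""
    ns = namespace_of(uri)
    return f"{prefixes[ns]}:{uri[len(ns):]}"


def serialize_with_header(triples: Iterable[Triple]) -> Iterator[str]:
    """Emit prefix declarations first, then the abbreviated triples."""
    buffered = list(triples)  # the whole result set is held here
    namespaces = sorted({namespace_of(term) for t in buffered for term in t})
    prefixes = {ns: f"ns{i}" for i, ns in enumerate(namespaces)}
    for ns, prefix in prefixes.items():
        yield f"@prefix {prefix}: <{ns}> ."
    for triple in buffered:
        yield " ".join(shorten(term, prefixes) for term in triple) + " ."
```

Even though the function is a generator, the first output line is only
produced after the input has been read to the end.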

Some of the clients only have DOM XML parsers, and so they have to wait
for the whole XML result set to be returned before they can start
processing - we can't help that, but adding things like SQL LIMIT means
that they can work on smaller sets.
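
For contrast with the DOM case, here is a minimal sketch of a client
that handles rows incrementally, using Python's standard
xml.etree.ElementTree.iterparse. The element names (results, result,
binding) and the name attribute are assumptions for the example, not
3store's actual result format.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Dict, Iterator


def iter_results(stream: BinaryIO) -> Iterator[Dict[str, str]]:
    """Yield one dict of bindings per <result> as soon as it is complete."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "result":
            yield {b.get("name", ""): b.text or "" for b in elem.findall("binding")}
            elem.clear()  # drop the row's children so memory stays bounded


# Usage: the stream could equally be a socket or HTTP response body.
doc = (
    b"<results>"
    b"<result><binding name='x'>a</binding></result>"
    b"<result><binding name='x'>b</binding></result>"
    b"</results>"
)
for row in iter_results(io.BytesIO(doc)):
    print(row)
```

A client limited to a DOM parser has no equivalent of this loop, which is
where a LIMIT-style cap on the result size helps.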

- Steve

Received on Wednesday, 2 June 2004 10:37:22 UTC