Streamability from Seaborne, Andy on 2004-06-11 (public-rdf-dawg@w3.org from April to June 2004)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Fri, 11 Jun 2004 11:36:02 +0100
To: Jim Hendler <hendler@cs.umd.edu>, "Seaborne, Andy" <andy.seaborne@hp.com>, RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <E864E95CB35C1C46B72FEA0626A2E808036155C5@0-mail-br1.hpl.hp.com>
-------- Original Message --------
> From: Jim Hendler <>
> Date: 9 June 2004 22:56
> 
> At 14:07 +0100 6/9/04, Seaborne, Andy wrote:
> > >  > -- 3.10 Result Limits
> > >  >
> > >  > straw poll: who's convinced this is a requirements? are we close
> > >  > to a decision here? AndyS: RDQL does not have a limit. Joseki
> > >  provides limits. PatH: it could mean 2 things: I don't want more
> > >    than 10 answers and the next ones after that or *just* 10
> > 
> > It occurred to me that one way of differentiating between exactly 10
> > and more than 10 (and an unknown number) when we have "LIMIT 10" is
> > to ask for 
> > 11.  If 11 results return up, it is more than 10.  Otherwise, it is
> > exact and 10 or less.  This is making the (trailing) flag we
> > discussed a matter of whether the 11th slot gets filled at all.
> > 
> > Hence, no requirement for a special flag and the client can decide
> > whether to ask for 11 if it needes to know the difference.  It still
> > might be a better design overall to have a flag.
> > 
> > 	Andy
> [snip]
> 
> Andy - referring to two of your messages at the same time (this one
> and the one on streamable results) I wonder if both of these are
> needed -- a lot of query systems I've played for have protocols which
> work by returning the number of answers and the first one, and then
> have a mechanism for requesting the next N -  if we were to design
> that as the streamable interface, would there still be a need for
> 3.10?

This design has several effects which I regard as undesirable for a web
system:

1/ Requires the server to retain state across client operations.

If a server is serving large number of clients, then this is a real burden.
On the web, with long delays and slow links, serving client requests can be
long and so there are many concurrent requests. If each requires server
state, then the burden is high.  If that state is a copy of the results, it
is a lot of system resources.

[Tight coupled systems would not have a big issue here as the expection
would be much lower numbers of concurrent requests even if the # clients is
big].

2/ Adds extra round trips to fetch the additional results.  On the web RTT
is long and clients can be slow to react.  This would significantly impact
performance.

Again, more tightly coupled systems have lower RTT and higher badnwidth so
do not suffer this.

Note that 1+2 allow a simple DOS attack by starting queries and never
finshing them.

3/ Consistency issues

If the KB can be updated between fetches of the results, then this now needs
locking or a copy of the results.  Both have performance implications.


Example: MySQL by default returns all results.  If you turn on this effect
then a lock is applied to other access.  I think the MySQL default as having
been choosen because it makes the DB server better at serving in DB-backed
web sites.


A streamable protocol does not require the client to call for more results.
No flow control at this level - adding flow control (over what TCP has -
that that is very clever and highly tuned) is a lot of work for DAWG and a
lot of work for implementers.

A streamable protocol requires the encoding to be in a an order so that
results can be extracted without needing to see the end of results, or, more
usually, all of one result possibility is encodes before starting the
second.

This does not require server state across client requests, nor does it
require additional client calls.

And the design you propose is in the spirit of "cursors" (see charter 2.3).

	Andy

>    -JH
>   (note: in practice there is often something that keeps the system
> from having to actually count the results when there is a large
> number - usually just some flag that says "*many"" when the number is
> greater than some parameter -- that could either be in the design or
> could just be how people really implement for efficiency)
Received on Friday, 11 June 2004 06:36:28 UTC