- From: Seaborne, Andy <andy.seaborne@hp.com>
- Date: Fri, 11 Jun 2004 11:36:02 +0100
- To: Jim Hendler <hendler@cs.umd.edu>, "Seaborne, Andy" <andy.seaborne@hp.com>, RDF Data Access Working Group <public-rdf-dawg@w3.org>
-------- Original Message -------- > From: Jim Hendler <> > Date: 9 June 2004 22:56 > > At 14:07 +0100 6/9/04, Seaborne, Andy wrote: > > > > -- 3.10 Result Limits > > > > > > > > straw poll: who's convinced this is a requirements? are we close > > > > to a decision here? AndyS: RDQL does not have a limit. Joseki > > > provides limits. PatH: it could mean 2 things: I don't want more > > > than 10 answers and the next ones after that or *just* 10 > > > > It occurred to me that one way of differentiating between exactly 10 > > and more than 10 (and an unknown number) when we have "LIMIT 10" is > > to ask for > > 11. If 11 results return up, it is more than 10. Otherwise, it is > > exact and 10 or less. This is making the (trailing) flag we > > discussed a matter of whether the 11th slot gets filled at all. > > > > Hence, no requirement for a special flag and the client can decide > > whether to ask for 11 if it needes to know the difference. It still > > might be a better design overall to have a flag. > > > > Andy > [snip] > > Andy - referring to two of your messages at the same time (this one > and the one on streamable results) I wonder if both of these are > needed -- a lot of query systems I've played for have protocols which > work by returning the number of answers and the first one, and then > have a mechanism for requesting the next N - if we were to design > that as the streamable interface, would there still be a need for > 3.10? This design has several effects which I regard as undesirable for a web system: 1/ Requires the server to retain state across client operations. If a server is serving large number of clients, then this is a real burden. On the web, with long delays and slow links, serving client requests can be long and so there are many concurrent requests. If each requires server state, then the burden is high. If that state is a copy of the results, it is a lot of system resources. [Tight coupled systems would not have a big issue here as the expection would be much lower numbers of concurrent requests even if the # clients is big]. 2/ Adds extra round trips to fetch the additional results. On the web RTT is long and clients can be slow to react. This would significantly impact performance. Again, more tightly coupled systems have lower RTT and higher badnwidth so do not suffer this. Note that 1+2 allow a simple DOS attack by starting queries and never finshing them. 3/ Consistency issues If the KB can be updated between fetches of the results, then this now needs locking or a copy of the results. Both have performance implications. Example: MySQL by default returns all results. If you turn on this effect then a lock is applied to other access. I think the MySQL default as having been choosen because it makes the DB server better at serving in DB-backed web sites. A streamable protocol does not require the client to call for more results. No flow control at this level - adding flow control (over what TCP has - that that is very clever and highly tuned) is a lot of work for DAWG and a lot of work for implementers. A streamable protocol requires the encoding to be in a an order so that results can be extracted without needing to see the end of results, or, more usually, all of one result possibility is encodes before starting the second. This does not require server state across client requests, nor does it require additional client calls. And the design you propose is in the spirit of "cursors" (see charter 2.3). Andy > -JH > (note: in practice there is often something that keeps the system > from having to actually count the results when there is a large > number - usually just some flag that says "*many"" when the number is > greater than some parameter -- that could either be in the design or > could just be how people really implement for efficiency)
Received on Friday, 11 June 2004 06:36:28 UTC