Re: OFFSET/LIMIT, cursors, and DAWG scope boundaries from Bijan Parsia on 2005-04-06 (public-rdf-dawg@w3.org from April to June 2005)

From: Bijan Parsia <bparsia@isr.umd.edu>
Date: Wed, 6 Apr 2005 08:50:23 -0400
To: "Thompson, Bryan B." <BRYAN.B.THOMPSON@saic.com>
Cc: 'RDF Data Access Working Group ' <public-rdf-dawg@w3.org>
Message-Id: <71435772-A69A-11D9-9EE4-0003936A0B26@isr.umd.edu>
On Apr 6, 2005, at 8:33 AM, Thompson, Bryan B. wrote:

> Gosh.  I find the permission for the provision in the abstract syntax
> with the exclusion for the specification of their semantics to be ...
> completely baffling.

Especially in conjunction with the extensibility provision :)

[snip]
> On another take, different database platforms and different database
> drivers for those platforms typically make very different decisions
> concerning how many rows of a result set to "pre-fetch" while holding
> open a connection used to make a query.  In general, driver parameters
> may be used to cause more or fewer rows to be pre-fetched.  Some
> choices that I have seen in the past are to pre-fetch all rows, to
> pre-fetch some # of rows, etc.  It is also worth noting that each
> database specifies the protocol used to talk with that database -
> this is not standardized.

I'm not sure if this is orthogonal or not. Presumably, pre-fetching 
while holding a connection open allows the application to decide to 
abort the retrieval before all rows have been transmitted to the client 
(and thus, perhaps, even materialized in the database). That's 
sufficient for chunking.
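
As a rough illustration of that point, here is a minimal Python sketch 
of chunked retrieval with early abort. The names fetch_in_chunks and 
row_source are hypothetical, and the lazy generator only stands in for 
a database that materializes rows on demand; no real driver API is 
implied.

```python
from itertools import islice

def fetch_in_chunks(rows, prefetch):
    """Yield lists of up to `prefetch` rows from a lazy row source."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, prefetch))
        if not chunk:
            return
        yield chunk

# Hypothetical row source that records which rows were actually produced.
produced = []

def row_source(n):
    for i in range(n):
        produced.append(i)
        yield i

chunks = fetch_in_chunks(row_source(1000), prefetch=10)
first = next(chunks)    # rows 0..9
second = next(chunks)   # rows 10..19
chunks.close()          # the client aborts; no further rows are produced
```

After the abort, only the rows in the two fetched chunks were ever 
generated, which is the sense in which pre-fetching over an open 
connection is enough for chunking.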

> However, what I see as problematic from a protocol perspective is the
> notion of the stateful connection within which the database client
> makes requests and the database driver follows one policy or another
> to bring back results.

No question that some sort of state/session token would be required in 
the HTTP case. OQL describes this. It seems workable.
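
To make the session-token idea concrete, here is a minimal Python 
sketch. The class ResultSessions and its methods are hypothetical, the 
HTTP layer is left out entirely, and this is not taken from OQL or from 
any DAWG draft; it only shows the server holding a result set keyed by 
a token that the client sends back with each request.

```python
import uuid
from itertools import islice

class ResultSessions:
    """Hypothetical server-side store of open result sets, keyed by token."""

    def __init__(self):
        self._open = {}

    def start(self, rows):
        # Hand the client an opaque token naming the held result set.
        token = uuid.uuid4().hex
        self._open[token] = iter(rows)
        return token

    def next_chunk(self, token, size):
        # Return the next `size` rows; drop the session once it runs dry.
        chunk = list(islice(self._open[token], size))
        if len(chunk) < size:
            del self._open[token]
        return chunk

sessions = ResultSessions()
token = sessions.start(range(25))
page1 = sessions.next_chunk(token, 10)   # rows 0..9
page2 = sessions.next_chunk(token, 10)   # rows 10..19
page3 = sessions.next_chunk(token, 10)   # rows 20..24, session released
```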

>   It is within the context of stored procedures and stateful
> connections that it makes sense to talk about cursors and "slicing" of
> result sets.

Yep.

> LIMIT is by far the simplest variant on "give me all the results".  No
> state is required in the connection.  The server is done as soon as it
> writes the data on the wire.

Yep.

> OFFSET requires far more design.  Simply re-generating the results
> would break transactional isolation for updates.

???

> (While we may not
> be specifying how updates occur, there are definitely updates being
> made to SPARQL graphs.)  This means that servers need to generate and
> hold result sets, e.g., in the SPARQL equivalent of a temporary table
> whose life cycle is linked to the query and the connection within which
> the query was made.

Limit is the "relative" end point. Offset is the actual start point. 
(Hmm. I didn't see any statement forbidding the offset > limit. That 
just returns 0 results?) Both are one shot deals (i.e., they are done 
once in the context of a query).
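
A minimal Python sketch of that reading, assuming the limit is counted 
from the offset (the "relative" end point). The function 
apply_offset_limit is a hypothetical name, not anything from the draft.

```python
def apply_offset_limit(solutions, offset=0, limit=None):
    """One-shot slice: skip `offset` solutions, then keep at most `limit`."""
    end = None if limit is None else offset + limit
    return solutions[offset:end]

data = list(range(100))

window = apply_offset_limit(data, offset=20, limit=10)      # rows 20..29
past_end = apply_offset_limit(data, offset=200, limit=10)   # empty list
tail = apply_offset_limit(data, offset=95)                   # rows 95..99
```

Under this assumption an offset larger than the limit is not special: 
it simply selects a later window. Only an offset at or past the end of 
the result sequence yields 0 results.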

>   Given the current design for OFFSET which places it
> inside of the query syntax rather than the protocol layer, I am not
> sure that it is feasible to design in a manner that guarantees
> transactional isolation across "slices" (it seems to imply that the
> server needs to compare the queries textually (a nightmare) and
> identify the recent result set for the "same" query but with a
> different (or without any) offset).

I think it just punts on that. If you don't do order by, the results 
won't be the same. If there are updates, the results could be 
different. The same as with limit. Without order by, repeated limited 
queries could return (I'm feeling lucky) 10 different results each 
time.
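
A minimal Python sketch of the difference, assuming a store that is 
free to return unordered solutions in any order. The shuffle merely 
models that freedom, and run_query is a hypothetical name.

```python
import random

def run_query(solutions, limit, order_by=None):
    """Return the first `limit` solutions; order is arbitrary unless order_by is given."""
    rows = list(solutions)
    if order_by is None:
        random.shuffle(rows)        # assumption: no ordering guarantee
    else:
        rows.sort(key=order_by)     # a total order makes the slice repeatable
    return rows[:limit]

data = range(1000)

# Two unordered runs may well return different sets of 10 solutions.
unordered_a = run_query(data, 10)
unordered_b = run_query(data, 10)

# Two ordered runs over unchanged data return the same 10 solutions.
ordered_a = run_query(data, 10, order_by=lambda row: row)
ordered_b = run_query(data, 10, order_by=lambda row: row)
```

Updates between the two runs could still change the ordered results; 
the sketch only covers the order by half of the point.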

> OFFSET is also a problem since the client is not marking in the initial
> query that it is interested in future offsets.

There are no future offsets.

> This makes it difficult to
> write firewall rules that would deny queries that require the server to
> hold result sets.
>
> So, I think that OFFSET is a *bad* idea since it is pretty much going
> to break transactional isolation unless we have stateful connections,
> which we don't.

Well, I think offset is weird, but I don't get your argument against it 
(in the light that it *isn't* a cursor and doesn't imply session 
persistence). I also think chunking, which *does* require session 
persistence, is v. useful. We had explicit request for it. Any 
"Browsing the kb" type application (e.g., for a search engine) will 
want this kind of behavior, esp. when the result sets are v. large.

Cheers,
Bijan.
Received on Wednesday, 6 April 2005 12:50:33 UTC