Re: OFFSET/LIMIT, cursors, and DAWG scope boundaries from Bijan Parsia on 2005-04-06 (public-rdf-dawg@w3.org from April to June 2005)

From: Bijan Parsia <bparsia@isr.umd.edu>
Date: Wed, 6 Apr 2005 01:22:49 -0400
To: Dan Connolly <connolly@w3.org>
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <EAD0C9BB-A65B-11D9-9EE4-0003936A0B26@isr.umd.edu>

On Apr 6, 2005, at 12:07 AM, Dan Connolly wrote:

> In our discussion of the sort issue today, I was kinda
> interested to try to move the "give me the answer in slices"
> functionality to the protocol, but this evening I remembered...
>
> "2.3 Cursors and proofs
>
> Some languages, for instance, OQL, define a layer of protocol that
> handles requester/server interactions for result set cursors or proofs.
> The abstract syntax may be extensible to express the relevant
> parameters, but their definition and effects are beyond the scope of
> this working group."
>
> http://www.w3.org/2003/12/swa/dawg-charter#protocol
>
> If I sorta disregard "proof" in that section, it seems to say
> that cursors are out of scope, and the sort of protocol
> cookie/tokens that came to my mind when we discussed moving
> OFFSET/LIMIT to the protocol seem to be an awful lot like
> cursors.

Well,

	http://burks.brighton.ac.uk/burks/foldoc/41/27.htm

""" 2. <database> In SQL, a named control structure used by an  
application program to point to a row of data. The position  of the row  
is within a table or view, and the cursor is  used interactively so  
select rows from columns."""

I fail to see that a chunking mechanism is so like cursors as to be ruled
out of scope by ruling out cursors.

> And if we're not supposed to do cursors in the protocol, it
> seems sneaky to stick that functionality in the QL.

Of course, the question is whether chunking the answers (with possible
abortion of the chunking) requires handling "requester/server interactions
for result set cursors or proofs" or something sufficiently like it. I
think not, myself.

Hmm. I see:
	http://lists.trolltech.com/qt-interest/2004-11/thread01135-0.html
	http://selectservices.bentley.com/technotes/technotes/7125.htm
	http://www.techweb.com/encyclopedia/defineterm.jhtml?term=database+cursor
""" A record pointer in a database. When a database file is selected  
and the cursor is opened, the cursor points to the first record in the  
file. Using various commands, the cursor can be moved forward,  
backward, to top of file, bottom of file and so forth."""

	http://www.sleepycat.com/docs/java/com/sleepycat/db/Cursor.html

"""A database cursor. Cursors are used for operating on collections of  
records, for iterating over a database, and for saving handles to  
individual records, so that they can be modified after they have been  
read."""

Aside from motility, it seems like cursors support server side actions  
(modification).

	http://pybsddb.sourceforge.net/ref/am/cursor.html

"""A database cursor refers to a single key/data pair in the database.  
It supports traversal of the database and is the only way to access  
individual duplicate data items. Cursors are used for operating on  
collections of records, for iterating over a database, and for saving  
handles to individual records, so that they can be modified after they  
have been read
...
Once a database cursor has been opened, records may be retrieved  
(DBcursor->c_get), stored (DBcursor->c_put), and deleted  
(DBcursor->c_del)."""

	http://www.jaws.umn.edu/javascript_1.1/lwa1.htm
	http://www.trifox.com/wpapers/vtxmuxwp.html
	http://www.crystal-soft.com/globalapps/Dataset/dataset.htm

"""Dataset contains data from database, organized into records. Each  
row in dataset represents one record. Every record contain one or more  
fields. As it seen from the picture above, all records have the same  
structure. It is also possible for dataset to contain no records at  
all.  If dataset has at least one record, it has always an active  
record, also known as the database cursor. Any user interaction with  
database data is always performed through active record."""

	http://www.frick-cpa.com/ss7/Theory_Models.asp
"""	• 	There are two approaches to querying data: relational or  
procedural
	• 	Relational queries are constructed in SQL  and are set-based
[snip]	
	• 	Procedural queries involve row-at-a-time processing; they  
'navigate'  through a record set to locate the desired rows
	• 	Cursors are an example of a procedural approach to querying data
	• 	The code below uses a cursor to achieve the same result as the  
previous  SQL query"""
(Chunking could do some of this, since you could always chunk at 1 binding per chunk.)

	http://exchange.manifold.net/manifold/manuals/manifold/appendices/troubleshooting/problems_with_tables.htm
"""Do not use the native Oracle ODBC driver. Use the Microsoft OLE DB  
Provider for ODBC and use Microsoft ODBC for Oracle. The native Oracle  
drivers are limited to forward-only cursors (which is by far the  
simplest and the lowest performance type of database cursor allowed in  
ODBC drivers) and thus are unusable in Manifold or many other programs,  
such as Microsoft Access. Future editions of Manifold will add special  
routines to allow use of the Oracle driver."""

(We're forward only.)
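
To make the contrast concrete, here is a minimal sketch of forward-only chunking, assuming the result set is just an iterable of bindings. The name `forward_chunks` is made up for illustration and is not from any DAWG draft. Unlike the cursors described above, there is no moving backward and no handle for modifying records; the only operation is "give me the next chunk".

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def forward_chunks(results: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive chunks of at most `size` results, forward only."""
    if size < 1:
        raise ValueError("chunk size must be at least 1")
    it = iter(results)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return  # results exhausted; there is nothing to rewind to
        yield chunk


# Hypothetical bindings for a single variable ?x.
bindings = [{"x": n} for n in range(5)]
chunks = list(forward_chunks(bindings, 2))
```

Chunking at a size of 1 degenerates into row-at-a-time processing, but still only in one direction.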

	http://ironbark.bendigo.latrobe.edu.au/subjects/WE/lectures/w09.d/Lect17.html
"""DECLARE
Commonly used (in some, but not all SQL implementations) to  DECLARE a  
CURSOR. A  CURSOR is specified in the same way as a  SELECT statement,  
but doesn't immediately return  any data. Before use, a CURSOR must be   
OPENed. Subsequently, each use of a  FETCH statement will retrieve the  
next matching  row (as specified in the CURSOR declaration) from  the  
table. Useful for iterating through an number of returned  rows."""
	http://www.agiledata.org/essays/relationalDatabases.html
	(Search for "cursor"....some complaints; talks about forward scrolling  
cursors.)

On the contrary side:
	http://ksl.stanford.edu/KSL_Abstracts/KSL-03-14.html
OQL doesn't actually support cursors, only chunking.

"""Answers are delivered by the server in bundles, and the client can  
specify the maximum number of answers in each bundle. Each request from  
a client to a server for answers to a query can include an answer  
bundle size bound, and the server is required to respond by delivering  
an answer bundle containing at most the number of query answers given  
by the answer bundle size bound. The collection of all answers sent to  
the client by the server in a query-answering dialogue is called the  
response collection of that dialogue.

An answer bundle must also contain either a process handle or one or  
more character strings called termination tokens. The presence of a  
termination token in an answer bundle indicates that the server will  
not deliver any more answers to the query, and the presence of a server  
continuation in an answer bundle represents a commitment by the server  
to deliver another answer bundle if more answers to the query are  
requested by a client.

A client requests additional answers to a query by sending the server a  
server continuation containing the process handle provided by the  
server in the previously produced answer bundle and an answer bundle  
size bound for the next answer bundle to be produced by the server.  
Upon receiving a server continuation from a client, the server is  
expected to respond similarly by sending to that client another answer  
bundle. A client terminates a query-answering dialogue by sending the  
server a server termination containing the process handle provided by  
the server in the previously produced answer bundle."""
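
The dialogue quoted above can be sketched roughly as follows. This is an in-memory toy, not the report's actual wire format: the class and method names (`AnswerBundle`, `BundleServer`, `continue_`, `terminate`) and the token string "end" are assumptions made for illustration. It only shows the shape of the exchange: each bundle carries either a process handle or a termination token, and the client either continues or terminates using that handle.

```python
import uuid
from dataclasses import dataclass, field
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Optional


@dataclass
class AnswerBundle:
    answers: List[dict]
    process_handle: Optional[str] = None  # present: more may be requested
    termination_tokens: List[str] = field(default_factory=list)


class BundleServer:
    def __init__(self) -> None:
        self._open: Dict[str, Iterator[dict]] = {}

    def query(self, answers: Iterable[dict], bound: int) -> AnswerBundle:
        """Start a dialogue and return the first bundle."""
        handle = uuid.uuid4().hex
        self._open[handle] = iter(answers)
        return self.continue_(handle, bound)

    def continue_(self, handle: str, bound: int) -> AnswerBundle:
        """Server continuation: at most `bound` further answers."""
        batch = list(islice(self._open[handle], bound))
        if len(batch) < bound:
            # Ran out of answers: drop state and send a termination token.
            del self._open[handle]
            return AnswerBundle(batch, termination_tokens=["end"])
        return AnswerBundle(batch, process_handle=handle)

    def terminate(self, handle: str) -> None:
        """Server termination: the client wants no more answers."""
        self._open.pop(handle, None)


def fetch_all(server: BundleServer, answers: Iterable[dict], bound: int) -> List[dict]:
    """Client loop: keep continuing until a termination token arrives."""
    bundle = server.query(answers, bound)
    collected = list(bundle.answers)
    while bundle.process_handle is not None:
        bundle = server.continue_(bundle.process_handle, bound)
        collected.extend(bundle.answers)
    return collected
```

Note that the server holds state only between a bundle and the next continuation or termination, which is what keeps this closer to chunking than to a general cursor.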

> I'm not sure how to interpret our charter here. I think I'll
> ask around. Advice is welcome.

The OQL tech report is helpful. I'll definitely need to check it out  
more closely. (I looked at DQL way back when. Just goes to show that  
rereading is good.)

I think this is a very unclear charter provision. I think there's  
enough wiggle room to interpret it as the group sees fit, though it  
would be hard to defend.

(BTW, I seem to have no hope of even parsing 2.1, esp. in light of 1.8,
esp. as an out-of-scope provision.)

(And wow, 2.2. Hard to see how not to get slaughtered there.)

I think chunking is useful, desirable, and easy to specify, and
not terrible to implement, esp. if we allow server-specified timeouts. I
believe the MIND lab might well object to its removal. The presence of
LIMIT and OFFSET (whoa....I find offset scary) indicates that the group
thought that these were in scope. I think chunking is similar.
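
For comparison, here is a rough sketch of slicing with LIMIT and OFFSET alone, assuming the server re-evaluates the query on every request and that the results come back in the same order each time. The function names are hypothetical. Without a stable ordering the slices can overlap or skip answers, which is one reason the sort issue and the slicing issue travel together.

```python
from typing import List, Optional, Sequence


def slice_results(ordered: Sequence[dict], offset: int = 0,
                  limit: Optional[int] = None) -> List[dict]:
    """Apply OFFSET then LIMIT to an already ordered result sequence."""
    if offset < 0 or (limit is not None and limit < 0):
        raise ValueError("offset and limit must be non-negative")
    end = None if limit is None else offset + limit
    return list(ordered[offset:end])


def fetch_in_slices(ordered: Sequence[dict], limit: int) -> List[dict]:
    """Stateless client loop: bump OFFSET until a short slice comes back."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    collected: List[dict] = []
    offset = 0
    while True:
        page = slice_results(ordered, offset, limit)
        collected.extend(page)
        if len(page) < limit:
            return collected
        offset += limit
```

The server keeps no state between requests here; the cost is that each request stands for a fresh evaluation of the query.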

Whether in the language or the protocol, we can do several things to  
make chunking easier on the server, including signalling whether more  
results are wanted (or not), timeouts, faults for resource limitations  
etc. The design space doesn't seem large.
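
A minimal sketch of those server-side options, under stated assumptions: the class name `ChunkingServer`, the `want_more` flag, the two exception types, and the default limits are all invented for illustration, and the clock is injectable only so the timeout can be exercised without waiting.

```python
import time
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


class ResourceFault(Exception):
    """Server refuses: too many result sets are already open."""


class SessionExpired(Exception):
    """Handle is unknown, finished, or has timed out."""


class ChunkingServer:
    def __init__(self, max_sessions: int = 2, timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_sessions = max_sessions
        self.timeout_s = timeout_s
        self._clock = clock
        self._sessions: Dict[str, Tuple[Iterator, float]] = {}
        self._next_id = 0

    def _expire(self) -> None:
        # Drop every session whose server-specified deadline has passed.
        now = self._clock()
        for handle in [h for h, (_, d) in self._sessions.items() if d <= now]:
            del self._sessions[handle]

    def open(self, results: Iterable) -> str:
        self._expire()
        if len(self._sessions) >= self.max_sessions:
            raise ResourceFault("too many open result sets")
        self._next_id += 1
        handle = f"rs{self._next_id}"
        self._sessions[handle] = (iter(results), self._clock() + self.timeout_s)
        return handle

    def next_chunk(self, handle: str, size: int, want_more: bool = True) -> List:
        self._expire()
        if handle not in self._sessions:
            raise SessionExpired(handle)
        it, _ = self._sessions[handle]
        chunk = list(islice(it, size))
        if want_more and len(chunk) == size:
            # Client signalled interest: keep state and reset the deadline.
            self._sessions[handle] = (it, self._clock() + self.timeout_s)
        else:
            # Client is done, or results ran out: free resources at once.
            del self._sessions[handle]
        return chunk
```

The point of `want_more=False` is that the server can release state immediately instead of waiting for the timeout.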

One advantage of doing it in the protocol is that we can define different
interfaces or bindings that support chunking. The design space is a bit
larger here, but still manageable IMHO.

Cheers,
Bijan "The Google Session/Brain Dumper" Parsia.
Received on Wednesday, 6 April 2005 05:23:01 UTC