Re: Proposed reply to Chris Wilper: Real-world use case for 3.10 from Seaborne, Andy on 2004-07-27 (public-rdf-dawg@w3.org from July to September 2004)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Tue, 27 Jul 2004 10:17:18 +0100
To: Tom Adams <tom@tucanatech.com>
Cc: public-rdf-dawg@w3.org
Message-ID: <41061D9E.3030709@hp.com>

Tom,

Chris asks for LIMIT and OFFSET in order to do client-side control of 
the flow of results.

"3.10 Result Limits" is approved.
"3.12 Streaming Results" is approved.

We also noted that the relationship to sorting matters.  But this isn't 
LIMIT and OFFSET, where the client asks for just a slice of the results 
and then comes back for another slice later.  The slices asked for need 
not be in order, so result set stability across calls might be expected 
(transactions?).
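
To make the distinction concrete, here is a rough sketch in Python.  It 
is not taken from any DAWG draft: `run_query`, `page` and the sample rows 
are made up for illustration, and it assumes the server re-evaluates the 
query on every call and returns results in a fixed sorted order, so that 
the same LIMIT/OFFSET pair always names the same slice.

```python
# Hypothetical in-memory stand-in for a query result set; a real store
# would evaluate the query against its triples on each call.
ROWS = [f"item{n:02d}" for n in range(25)]


def run_query():
    """Return all results in a fixed, sorted order (assumed stable)."""
    return sorted(ROWS)


def page(limit, offset):
    """Return one slice of the results, as LIMIT/OFFSET would."""
    results = run_query()
    return results[offset:offset + limit]


# The client is free to ask for slices out of order.
third = page(limit=10, offset=20)
first = page(limit=10, offset=0)
print(third)
print(first)
```

If the store changed between the two calls, or the order were not fixed, 
the slices could overlap or skip rows, which is the stability concern 
above.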

It may be that in Chris's use case the client will ask for chunks in 
order, in which case streaming using a suitable XML encoding (that is, 
one where the whole document does not need to be stored before further 
processing) and LIMIT may be sufficient, because the client has enough 
influence over the results, but it isn't what he is asking for.
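
A minimal sketch of that in-order case, again in Python and again with 
invented names (`stream_results`, `chunks`) and made-up rows: the results 
are produced lazily, capped by a limit, and the client groups them into 
chunks as they arrive, without ever holding the whole result set.

```python
from itertools import islice


def stream_results(limit=None):
    """Yield results one at a time; stop after `limit` rows if given."""
    rows = (f"item{n:02d}" for n in range(25))  # hypothetical result source
    return islice(rows, limit)


def chunks(stream, size):
    """Group a stream into lists of at most `size` rows, in order."""
    while True:
        chunk = list(islice(stream, size))
        if not chunk:
            return
        yield chunk


# The client consumes chunks strictly in order; it cannot jump around.
for chunk in chunks(stream_results(limit=12), 5):
    print(chunk)
```

Note that there is no way here to ask for the third chunk without first 
consuming the first two, which is where this falls short of OFFSET.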

Illustration: Google lists the first 10 results, then you can jump 
around the "result set" using the page links at the bottom.

Example: One style of faceted browser shows the first N results when 
the user has a lot of items in a category.  The client UI never 
retrieves the whole result set, so just LIMIT is a win.

The limitations on JDBC drivers noted in the F2F minutes apply in the 
default configuration.  Streaming results has consequences - for MySQL 
it means locking for as long as the results are active, with possibly 
adverse effects on overall system performance.

I would like to understand Chris's use case better.  The use case has 
the client and server quite tightly designed together and possibly 
deployed together.  It does not sound like a general browser-ish UI 
applied to some unknown RDF store.  It may be that LIMIT+Streaming is 
sufficient (not ideal, but tolerable)?  Alternatively, it may be that we 
need different levels in the protocol, with a simple, general web-wide 
one-query, one-response mode and then a more complex one for closer 
associations of client and server.

We should also note charter item "2.3 Cursors and proofs" (I don't 
understand why cursors and proofs are lumped together).

	Andy


Tom Adams wrote:

> Below is an outline of my proposed reply to Chris Wilper on his use 
> case for requirement 3.10, posted to public-rdf-dawg-comments@w3.org.
> 
> 
> ----
> 
> 
> Hi Chris,
> 
> Thanks for your posting to the DAWG comments list. The DAWG is always 
> happy to receive comments and use cases on its proposed requirements.
> 
> The requirement you noted was moved from PENDING to APPROVED at the 
> DAWG face to face on the 15th July. You can view the details at:
> 
> http://www.w3.org/2001/sw/DataAccess/ftf2#req
> 
> Keep the comments coming!
> 
> Cheers,
> Tom
> 
> 
> 
> On 06/07/2004, at 1:25 PM, Chris Wilper wrote:
> 
> 
>>Hi,
>>
>>Looking at the Requirements/Use Cases document, I noticed that 3.10 and
>>3.10a had "Pending" status.  We[1] plan on using an rdf triplestore to
>>back a large metadata repository, exposed to other systems via the
>>OAI-PMH[2].  While not being too domain and protocol-specific here, I'll
>>describe our case:
>>
>>We have a large collection of metadata in a triplestore that we want to
>>make available to people through a set of queries.  Someone typically
>>asks, "Give me the metadata that has changed since last week and is in
>>XYZ collection", or simply, "Give me all the metadata".
>>
>>It is a requirement for us that the responses can come in chunks: XML is
>>sent over the wire, and rather than require all of our clients (and our
>>server) to be able to handle arbitrarily large chunks of xml in one
>>stream, our server can be configured to give only, say 1,000 responses,
>>along with a "resumption token" that can be used for subsequent requests.
>>
>>Without the ability to specify LIMITS/OFFSETS with the triplestore query,
>>we would need to stream everything to disk and manage much more state
>>within our application.
>>
>>[1] http://www.fedora.info/ and http://www.nsdl.org/
>>[2] OAI-PMH is a protocol for exposing xml metadata in a repository.
>>    See http://www.openarchives.org/OAI/openarchivesprotocol.html
>>
>>___________________________________________
>>Chris Wilper
>>Cornell Digital Library Research Group
>>http://www.cs.cornell.edu/~cwilper/
>>
>>
Received on Tuesday, 27 July 2004 05:18:30 UTC