- From: Tom Adams <tom@tucanatech.com>
- Date: Tue, 27 Jul 2004 10:21:33 -0400
- To: public-rdf-dawg@w3.org
- Cc: Andy Seaborne <Andy.Seaborne@hp.com>
Hi Andy,

> Chris asks for LIMIT and OFFSET in order to do client-side control of
> the flow of results.
>
> "3.10 Result Limits" is approved.
> "3.12 Streaming Results" is approved.
>
> We also noted the relationship to sorting matters. But this isn't
> LIMIT and OFFSET, where the client asks for just a slice of the
> results, and then comes back for another slice later. The slices asked
> for need not be in order, so result set stability across calls might be
> expected (transactions?).

I don't think transactions are needed, but some kind of session-based state keeping would be required.

> It may be that in Chris's use case the client will ask for chunks in
> order, in which case streaming using a suitable XML encoding (that is,
> the whole document does not need to be stored before further
> processing) and LIMIT may be sufficient, because the client can
> influence the results sufficiently, but it isn't what he is asking
> for.
>
> Illustration: Google lists the first 10 results, then you can jump
> around the "result set" using the page links at the bottom.

I think that this may be what he's looking for.

> Example: One style of facetted browser shows the first N results when
> the user has a lot of items in a category. The client UI never
> retrieves the whole result set, so just LIMIT is a win.
>
> The limitations on JDBC drivers noted in the F2F minutes apply in the
> default configuration. Streaming results has consequences: for
> MySQL it means holding locks for the length of time that the results
> are active, with possibly adverse effects on overall system
> performance.

I'll defer to Simon on how Kowari handles this internally; perhaps that can shed some light on the discussion, though perhaps he's already covered it anecdotally.

> I would like to understand Chris's use case better. The use case has
> the client and server quite tightly designed together and possibly
> deployed. It does not sound like a general browser-ish UI applied to
> some unknown RDF store. It may be that LIMIT+Streaming is sufficient
> (not ideal, but tolerable)? Alternatively, it may be that we need
> different levels in the protocol, with a simple, general web-wide
> one-query, one-response mode and then a more complex one for closer
> associations of client and server.

I think Chris is after a combination of LIMIT and OFFSET. I know that he's discussed this issue in the past on the Kowari list, and has just posted a contribution (KModel), so I imagine this is what he's using. But yes, we need to find out more about what he is doing. I'll add a request for more information to my email.

> We should also note charter item "2.3 Cursors and proofs" (I don't
> understand why cursors and proofs are lumped together).

You're on the ball as ever :)

Cheers,
Tom

> Tom Adams wrote:
>
>> Below is an outline of my proposed reply to Chris Wilper on his use
>> case for requirement 3.10, posted to public-rdf-dawg-comments@w3.org.
>>
>> ----
>>
>> Hi Chris,
>>
>> Thanks for your posting to the DAWG comments list. The DAWG is always
>> happy to receive comments and use cases on its proposed requirements.
>> The requirement you noted was moved from PENDING to APPROVED at the
>> DAWG face-to-face on the 15th of July. You can view the details at:
>>
>> http://www.w3.org/2001/sw/DataAccess/ftf2#req
>>
>> Keep the comments coming!
>>
>> Cheers,
>> Tom
>>
>> On 06/07/2004, at 1:25 PM, Chris Wilper wrote:
>>
>>> Hi,
>>>
>>> Looking at the Requirements/Use Cases document, I noticed that 3.10
>>> and 3.10a had "Pending" status. We[1] plan on using an RDF
>>> triplestore to back a large metadata repository, exposed to other
>>> systems via the OAI-PMH[2]. Without being too domain- and
>>> protocol-specific here, I'll describe our case:
>>>
>>> We have a large collection of metadata in a triplestore that we want
>>> to make available to people through a set of queries. Someone
>>> typically asks, "Give me the metadata that has changed since last
>>> week and is in XYZ collection", or simply, "Give me all the
>>> metadata".
>>>
>>> It is a requirement for us that the responses can come in chunks:
>>> XML is sent over the wire, and rather than require all of our
>>> clients (and our server) to be able to handle arbitrarily large
>>> chunks of XML in one stream, our server can be configured to give
>>> only, say, 1,000 responses, along with a "resumption token" that
>>> can be used for subsequent requests.
>>>
>>> Without the ability to specify LIMITs/OFFSETs in the triplestore
>>> query, we would need to stream everything to disk and manage much
>>> more state within our application.
>>>
>>> [1] http://www.fedora.info/ and http://www.nsdl.org/
>>> [2] OAI-PMH is a protocol for exposing XML metadata in a repository.
>>> See http://www.openarchives.org/OAI/openarchivesprotocol.html
>>>
>>> ___________________________________________
>>> Chris Wilper
>>> Cornell Digital Library Research Group
>>> http://www.cs.cornell.edu/~cwilper/

--
Tom Adams                 | Tucana Technologies, Inc.
Support Engineer          | Office: +1 703 871 5312
tom@tucanatech.com        | Cell:   +1 571 594 0847
http://www.tucanatech.com | Fax:    +1 877 290 6687
------------------------------------------------------
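[Editorial note: the thread turns on what LIMIT/OFFSET-style slicing buys a client, and on Andy's caveat about result-set stability across calls. The Python sketch below illustrates that paging pattern only; the query text, function names, and the in-memory "store" are stand-ins invented for illustration, not the DAWG protocol or any real triplestore API.]

```python
# Minimal sketch of client-side paging via LIMIT/OFFSET, as discussed in
# the thread. All names here are hypothetical stand-ins.

def paged_query(base_query, page_size, offset):
    """Build a query string asking for one slice of the result set."""
    return f"{base_query} LIMIT {page_size} OFFSET {offset}"

def fetch_all(execute, base_query, page_size=1000):
    """Fetch the whole result set one page at a time.

    `execute` stands in for whatever sends the query to the store and
    returns a list of rows; a short page signals the end of the results.
    Note Andy's caveat: unless the store guarantees result-set stability
    across calls (session state or a transaction), rows can shift
    between pages if the underlying data changes mid-iteration.
    """
    offset = 0
    while True:
        page = execute(paged_query(base_query, page_size, offset))
        yield from page
        if len(page) < page_size:
            break
        offset += page_size

# Toy in-memory "store" so the slicing behaviour can be demonstrated.
data = [f"row-{i}" for i in range(25)]

def toy_execute(query):
    # Parse back the LIMIT/OFFSET appended above; a real store would
    # apply these server-side.
    parts = query.split()
    limit = int(parts[parts.index("LIMIT") + 1])
    offset = int(parts[parts.index("OFFSET") + 1])
    return data[offset:offset + limit]

rows = list(fetch_all(toy_execute, "SELECT ?s WHERE { ?s ?p ?o }",
                      page_size=10))
```

Chris's resumption-token scheme is the same loop from the client's point of view, except the server hands back an opaque token instead of the client computing the next offset, which is exactly the extra server-side state he is trying to avoid managing himself.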
Received on Tuesday, 27 July 2004 10:21:55 UTC