- From: Seaborne, Andy <andy.seaborne@hp.com>
- Date: Fri, 28 May 2004 13:06:23 +0100
- To: Farrukh Najmi <Farrukh.Najmi@Sun.COM>, public-rdf-dawg@w3.org
-------- Original Message --------
> From: Farrukh Najmi <mailto:Farrukh.Najmi@Sun.COM>
> Date: 25 May 2004 15:23
>
> Seaborne, Andy wrote:
>
> > Isn't this design objective a cursor mechanism as it says "fetching in
> > chunks" and cursors are out of scope by 2.3 of the charter?
>
> I think that "fetching in chunks" does not imply a server-side cursor
> requirement that maintains state on the server.
>
> In my experience with OASIS ebXML Registry (ISO 15000 parts 3 and 4),
> this can be supported without any requirement for maintaining state on
> the server via a cursor mechanism. All that is needed is for the query
> syntax to allow specifying startIndex and maxResults parameters. The
> maxResults parameter is covered by the original result limits (3.10).
> The startIndex parameter allows specifying the beginning of the next
> chunk of data. The response returned includes the startIndex for the
> chunk of data returned and a totalResultCount to indicate how many
> results actually matched.
>
> See section 8.1.4 in:
>
> http://www.oasis-open.org/committees/regrep/documents/2.5/specs/ebrs-2.5.pdf

[[I've extracted the text and appended it to this message]]

OK - so this avoids being a true (as in transactionally consistent)
cursor mechanism by relaxing the consistency requirements across chunk
fetching. In ebXML it is "a normative optional feature".

For the ebXML registry this is acceptable, but for DAWG I think it is a
very high price to pay, especially as there is a presumption that the
results have a natural ordering.

RDF has no concept of order, so re-evaluation may lead to different
orderings within the result set. Many hash-based stores could radically
reorder results, sometimes even in the absence of updates, because of
internal cache changes. Certainly Jena could - we use hash-based
caching.

The likely failure is then not that one or two results are missed or
duplicated, but that large sections of the result set are missed or
duplicated silently. I don't see a way the client can be informed
without server state, or without the client returning enough state each
time to rebuild the result set in each subsequent chunk request.
Without constraining the server implementation, that is going to be a
lot of client-returned state. (A sketch of this fetch pattern and its
failure mode follows the quoted spec text below.)

Therefore, I do not support the *requirement* "3.10a Iterative Query
(variant)".

	Andy

> > The result limits (3.10) is a more useful requirement in that it
> > allows a client to limit the result size in case of asking for too
> > much. This is different from fetching in chunks.
>
> I see "fetching in chunks" as extending the capabilities of result
> limits (3.10) by providing an additional control parameter to the
> client.
>
> > I think we should rely on the mechanisms that the underlying protocol
> > can supply, even though these can be difficult to control, e.g. TCP
> > flow control. Writing server code that tracks the state of partial
> > client requests can be very limiting; misbehaving clients can
> > intentionally or unintentionally attack the server.
>
> I totally agree. However, I think that server-side tracking of partial
> client requests is not needed to support the suggested extension to
> 3.10 to support "fetching in chunks".

= = = = = = = = = = = = = =

Text from section 8.1.4 (around line 1762 of the PDF):

"""
The iterative query feature is a normative optional feature of the
registry.
The AdhocQueryRequest and AdhocQueryResponse support the ability to
iterate over a large result set matching a logical query by allowing
multiple AdhocQueryRequest requests to be submitted such that each query
requests a different sliding window within the result set. This feature
enables the registry to handle queries that match a very large result
set, in a scalable manner.

The iterative queries feature is not a true Cursor capability as found
in databases. The registry is not required to maintain transactional
consistency or state between iterations of a query. Thus it is possible
for new objects to be added or existing objects to be removed from the
complete result set in between iterations. As a consequence it is
possible to have a result set element be skipped or duplicated between
iterations.

Note that while it is not required, it may be possible for
implementations to be smart and implement a transactionally consistent
iterative query feature. It is likely that a future version of this
specification will require a transactionally consistent iterative query
capability.
"""
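= = = = = = = = = = = = = =

To make the mechanism concrete, below is a minimal, self-contained Java
sketch of the startIndex/maxResults fetch pattern discussed above. The
QueryService interface and ReorderingQueryService class are hypothetical
- they are not part of the ebXML Registry API or of any DAWG proposal.
The fake server deliberately re-evaluates (and so reorders) its results
between requests, as a hash-based store such as Jena might, to show how
chunks can be silently skipped or duplicated.

import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;

/** Hypothetical chunked-query interface: no server-side cursor state. */
interface QueryService {
    /** Returns at most maxResults items starting at startIndex. */
    List<String> query(int startIndex, int maxResults);

    /** Total number of matches (the ebXML totalResultCount). */
    int totalResultCount();
}

/**
 * Fake server holding 10 results which, like a hash-based store, may
 * return them in a different order on each evaluation.
 */
class ReorderingQueryService implements QueryService {
    private final List<String> results = new ArrayList<>();
    private final Random random = new Random(42);

    ReorderingQueryService() {
        for (int i = 0; i < 10; i++) results.add("result-" + i);
    }

    @Override
    public List<String> query(int startIndex, int maxResults) {
        // Re-evaluation: internal caching reorders the result set,
        // even though nothing was added or removed.
        Collections.shuffle(results, random);
        int end = Math.min(startIndex + maxResults, results.size());
        if (startIndex >= end) return List.of();
        return new ArrayList<>(results.subList(startIndex, end));
    }

    @Override
    public int totalResultCount() {
        return results.size();
    }
}

public class ChunkedFetchDemo {
    public static void main(String[] args) {
        QueryService service = new ReorderingQueryService();
        int chunkSize = 4;
        LinkedHashSet<String> seen = new LinkedHashSet<>();
        int duplicates = 0;

        // The client pages through the result set using startIndex;
        // each request is an independent, stateless evaluation.
        for (int start = 0; start < service.totalResultCount(); start += chunkSize) {
            for (String item : service.query(start, chunkSize)) {
                if (!seen.add(item)) duplicates++;
            }
        }

        // With a stable ordering: 10 distinct, 0 duplicated, 0 missed.
        // With reordering between chunks, whole sections go missing or
        // arrive twice - and the client cannot detect which.
        int missed = service.totalResultCount() - seen.size();
        System.out.println("fetched distinct: " + seen.size()
                + ", duplicated: " + duplicates
                + ", missed: " + missed);
    }
}

The protocol itself stays stateless, but correctness silently depends on
the server producing a stable ordering across independent evaluations -
exactly the assumption that RDF does not guarantee.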
Received on Friday, 28 May 2004 08:06:54 UTC