Re: v1.89 3.10a - Iterative Query from Farrukh Najmi on 2004-05-28 (public-rdf-dawg@w3.org from April to June 2004)

From: Farrukh Najmi <Farrukh.Najmi@Sun.COM>
Date: Fri, 28 May 2004 08:17:57 -0400
To: "Seaborne, Andy" <andy.seaborne@hp.com>
Cc: "''public-rdf-dawg@w3.org' '" <public-rdf-dawg@w3.org>
Message-id: <40B72DF5.7040604@sun.com>
Seaborne, Andy wrote:

>-------- Original Message --------
>  
>
>>From: Farrukh Najmi <mailto:Farrukh.Najmi@Sun.COM>
>>Date: 25 May 2004 15:23
>>
>>Seaborne, Andy wrote:
>>
>>    
>>
>>>Isn't this design objective a cursor mechanism as it says "fetching in
>>>chunks" and cursors are out of scope by 2.3 of the charter?
>>>
>>>
>>>      
>>>
>>I think that "fetching in chunks" does not imply a server side cursor
>>requirement that maintains state on the server.
>>
>>In my experience with OASIS ebXML Registry (ISO 15000 part 3 and 4) this
>>can be supported without any requirement for maintaining state on the
>>server via a cursor mechanism. 
>>    
>>
>
>  
>
>>All that is needed is for the query
>>syntax to allow specifying startIndex and maxResults parameters. The
>>maxResults parameter is covered by the original result limits (3.10).
>>The startIndex parameter specifies the beginning of the next chunk
>>of data. The response returned includes startIndex for the chunk of data
>>returned and totalResultCount to indicate how many results actually
>>matched. 
>>
>>See section 8.1.4 in:
>>
>>
>>    
>>
>http://www.oasis-open.org/committees/regrep/documents/2.5/specs/ebrs-2.5.pdf
>  
>
>
>[[I've extracted the text and appended to this message]]
>
>OK - so this avoids being a true (as in transactionally consistent) cursor
>mechanism by relaxing the consistency requirements across chunk fetching.
>In ebXML it is "a normative optional feature".
>
>For the ebXML registry this is OK, but for DAWG I think this is a very high
>price to pay especially as there is a presumption that the results have a
>natural ordering.  RDF has no concept of order, so re-evaluation may lead to
>different orderings within the result set.
>  
>
First, let me admit that I am still RDF-challenged, so feel free to 
correct any misconceptions I may have.

The lack of order in RDF should be compared with the lack of order in 
the relational model. Order is really introduced by SQL-92, the query 
model and syntax. By analogy, the lack of order in RDF is not a problem 
as long as the DAWG model and syntax support the equivalent of 
ORDER BY in SQL.

This brings up the question....

Is there a use case or requirement for the results of a DAWG query to be 
ordered by some specified attribute(s) of the RDF graph?
If DAWG supports ordering then IMO the above argument against Iterative 
Queries no longer holds.

If order cannot be imposed by a DAWG query on its result set then indeed 
the above argument against Iterative Queries would hold and I will 
retract my suggestion.
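
To illustrate the idea, here is a minimal sketch in Python of stateless 
chunked fetching. It is not taken from the ebXML specification or from 
any DAWG design: the names run_query, fetch_all, ChunkResponse and 
ReorderingStore are hypothetical, and the parameters only mirror 
startIndex, maxResults and totalResultCount. The server re-evaluates the 
query on every request and keeps no state. The toy store deliberately 
changes its iteration order between evaluations, so the sketch shows both 
the problem Andy describes and the effect of an ORDER BY equivalent.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ChunkResponse:
    start_index: int          # echoes the requested startIndex
    total_result_count: int   # how many results matched in total
    results: List[str]        # at most max_results entries


def run_query(evaluate: Callable[[], List[str]],
              start_index: int,
              max_results: int,
              order_key: Optional[Callable[[str], object]] = None) -> ChunkResponse:
    """Re-evaluate the query from scratch; no cursor or session is kept."""
    matches = evaluate()
    if order_key is not None:
        # Hypothetical ORDER BY equivalent: impose a deterministic order.
        matches = sorted(matches, key=order_key)
    chunk = matches[start_index:start_index + max_results]
    return ChunkResponse(start_index, len(matches), chunk)


def fetch_all(evaluate: Callable[[], List[str]],
              max_results: int,
              order_key: Optional[Callable[[str], object]] = None) -> List[str]:
    """Client loop: request successive chunks until the total is reached."""
    collected: List[str] = []
    start = 0
    while True:
        resp = run_query(evaluate, start, max_results, order_key)
        collected.extend(resp.results)
        start += len(resp.results)
        if not resp.results or start >= resp.total_result_count:
            return collected


class ReorderingStore:
    """Toy store (an assumption for illustration) whose iteration order
    flips on every other evaluation, mimicking cache-driven reordering."""

    def __init__(self, items: List[str]) -> None:
        self._items = list(items)
        self._calls = 0

    def evaluate(self) -> List[str]:
        self._calls += 1
        if self._calls % 2:
            return list(self._items)
        return list(reversed(self._items))


# Without ordering, the second chunk is cut from a differently ordered result.
unordered = fetch_all(ReorderingStore(["a", "b", "c", "d"]).evaluate, max_results=2)

# With ordering, every re-evaluation yields the same sequence.
ordered = fetch_all(ReorderingStore(["a", "b", "c", "d"]).evaluate,
                    max_results=2, order_key=lambda r: r)
```

With the ordering applied the chunks line up. Without it, "a" and "b" 
come back twice and "c" and "d" are never seen. Ordering does not address 
updates between requests, which can still skip or duplicate a result, as 
the ebXML text itself notes.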

-- 
Regards,
Farrukh



>Many hash-based stores could radically reorder results, sometimes even in the
>absence of updates, because of internal cache changes.  Certainly Jena could -
>we use hash-based caching.  The likely case is then no longer that one or two
>results are missed or duplicated but that large sections of the result set
>are missed or duplicated silently.  I don't see a way the client can be
>informed without server state or the client returning enough state each time
>to rebuild the result set in each subsequent chunk request.  Without
>constraining the server implementation, that is going to be a lot of
>client-returned state.
>
>Therefore, I do not support the *requirement* "3.10a Iterative Query
>(variant)".
>  
>

>	Andy
>
>  
>
>>>The result limits (3.10) is a more useful requirement in that it
>>>allows a client to limit the results size in case of asking for too
>>>much.  This is different from fetching in chunks.
>>>
>>>
>>>      
>>>
>>I see "fetching in chunks" as extending the capabilities of result
>>limits (3.10) by providing an additional control parameter to the
>>client. 
>>
>>
>>    
>>
>>>I think we should rely on the mechanisms that the underlying protocol
>>>can supply even though these can be difficult to control.  e.g. TCP
>>>flow control.  Writing server code that tracks the state of partial
>>>client requests can be very limiting; misbehaving clients can
>>>intentionally or unintentionally attack the server.
>>>
>>>
>>>      
>>>
>>I totally agree. However, I think that server-side tracking of partial
>>client requests is not needed to support the suggested extension to 3.10
>>to support "fetching in chunks".
>>    
>>
>
>
>   = = = = = = = = = = = = = =
>    
>Text from section 8.1.4 (around line 1762):
>
>"""
>The iterative query feature is a normative optional feature
>of the registry. The AdhocQueryRequest and AdhocQueryResponse
>support the ability to iterate over a large result set matching
>a logical query by allowing multiple AdhocQueryRequest requests
>to be submitted such that each query requests a different sliding
>window within the result set. This feature enables the registry to
>handle queries that match a very large result set, in a scalable manner.
>The iterative queries feature is not a true Cursor capability as
>found in databases. The registry is not required to maintain
>transactional consistency or state between iterations of a query.
>Thus it is possible for new objects to be added or existing objects
>to be removed from the complete result set in between iterations.
>As a consequence it is possible to have a result set element be
>skipped or duplicated between iterations. 
>
>Note that while it is not required, it may be possible for
>implementations to be smart and implement a transactionally
>consistent iterative query feature. It is likely that a future
>version of this specification will require a transactionally
>consistent iterative query capability.
>"""
>
>  
>
Received on Friday, 28 May 2004 08:18:00 UTC