Re: Streamability II from Tom Adams on 2004-06-16 (public-rdf-dawg@w3.org from April to June 2004)

From: Tom Adams <tom@tucanatech.com>
Date: Wed, 16 Jun 2004 10:04:01 -0400
To: public-rdf-dawg@w3.org
Message-Id: <04BDDA94-BF9E-11D8-8174-000A95C9112A@tucanatech.com>

>> With a streamable protocol the loop above can be executed the first
>> time as soon as the first result binding (row in the result table)
>> arrives. This is Jim's example of doing something with the first 100
>> results while the server is still dealing with the next 100, done in
>> the style of iterators. The above code does not require all the
>> results to be in memory at the same time.
>
> Agreed, this would be familiar to developers who've been dealing with
> the SAX programming model.
>
>> In Jena the classes are QueryResults [1] for the iterator and
>> ResultBinding [2] for each row of the conceptual result table.
>> [1] http://jena.sourceforge.net/javadoc/com/hp/hpl/jena/rdql/QueryResults.html
>> [2] http://jena.sourceforge.net/javadoc/com/hp/hpl/jena/rdql/ResultBinding.html
>> See also java.sql.ResultSet. The API style is to deliver one row of the
>> result set to the application at a time with java.sql.ResultSet#next()
>> and java.sql.ResultSet#isLast(). The result set can be TYPE_FORWARD_ONLY.
>
> This may be too concrete an example, but pls bear with me as I need
> to understand how this would work in practice:
>
> I can imagine applications blocking on a call to next() even
> if isLast() says 'false'. This is not what I want query clients
> to experience (e.g. a network problem would hang a program until
> the TCP layer reports a timeout).
>
> If the call to next() is asynchronous, I guess we would then need
> a good old select()-style call, familiar to C programmers
> who've dealt with file descriptors.
>
> Bottom-line question: will the streaming protocol effectively
> require more work from developers? I cannot see how to map
> it to the current practice of using ResultSets and the like without
> additional low-level I/O management.
>
> I am sure that if someone with ODBC experience could explain how this
> gray area (for me) was solved, I would feel more comfortable.
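
To make the blocking concern concrete, here is a minimal sketch, assuming rows are pushed onto an in-memory queue by a background reader thread. The class TimedRowStream and its methods are hypothetical, not part of Jena or JDBC. The client still consumes one row at a time, forward-only, but each wait is bounded by a timeout, so a stalled connection surfaces as a TimeoutException instead of hanging the program until the TCP layer gives up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/**
 * Hypothetical forward-only stream of result rows. A producer thread
 * (standing in for the network reader) offers rows; the client pulls them
 * one at a time, waiting at most timeoutMillis per row.
 */
class TimedRowStream {
    // Sentinel object marking the end of the result set (compared by identity).
    private static final List<String> END = new ArrayList<>();

    private final BlockingQueue<List<String>> queue = new LinkedBlockingQueue<>();
    private final long timeoutMillis;

    TimedRowStream(long timeoutMillis) { this.timeoutMillis = timeoutMillis; }

    // Called by the producer for each row as it arrives.
    void offerRow(List<String> row) { queue.add(row); }

    // Called by the producer once the last row has been received.
    void finish() { queue.add(END); }

    /** Returns the next row, empty at end of results; throws if no row arrives in time. */
    Optional<List<String>> next() throws InterruptedException, TimeoutException {
        List<String> row = queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        if (row == null) throw new TimeoutException("no row within " + timeoutMillis + " ms");
        if (row == END) {
            queue.add(END); // re-queue the sentinel so later calls also report end
            return Optional.empty();
        }
        return Optional.of(row);
    }

    public static void main(String[] args) throws Exception {
        TimedRowStream stream = new TimedRowStream(500);
        Thread producer = new Thread(() -> {
            for (int i = 1; i <= 3; i++) stream.offerRow(List.of("?x", "row" + i));
            stream.finish();
        });
        producer.start();
        int count = 0;
        for (Optional<List<String>> r = stream.next(); r.isPresent(); r = stream.next()) {
            System.out.println(r.get()); // process each row as soon as it arrives
            count++;
        }
        producer.join();
        System.out.println("rows: " + count);
    }
}
```

The loop reads like an ordinary ResultSet-style loop; only the timeout handling is new. A real client would also need to close the underlying connection when a timeout occurs, which this sketch leaves out.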


I don't think this is too hard a problem to crack. As Andy mentions, a
ResultSet implementation is backed by a model that makes calls to the
server when results are needed. For example, results could be paged:
once a page of results has been iterated over, the next page is
fetched. In fact, we've implemented just such a scheme in TKS/Kowari,
and it adds no overhead for the client developer.
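
The paging idea can be sketched as an ordinary Iterator that holds only the current page in memory and asks the server for the next page when it runs out. This is a minimal illustration, not the TKS/Kowari code: PagedResults and the PageSource interface (standing in for the server call) are hypothetical names, and the server is faked with an in-memory list.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/**
 * Hypothetical client-side result iterator that fetches results one page
 * at a time. Only the current page is held in memory.
 */
class PagedResults<T> implements Iterator<T> {

    /** Stand-in for the server call: at most `limit` rows starting at `offset`. */
    interface PageSource<T> {
        List<T> fetchPage(int offset, int limit);
    }

    private final PageSource<T> source;
    private final int pageSize;
    private List<T> page = List.of();
    private int indexInPage = 0;
    private int nextOffset = 0;
    private boolean exhausted = false;
    int pagesFetched = 0; // exposed for illustration only

    PagedResults(PageSource<T> source, int pageSize) {
        this.source = source;
        this.pageSize = pageSize;
    }

    @Override
    public boolean hasNext() {
        if (indexInPage < page.size()) return true;
        if (exhausted) return false;
        // Current page used up: fetch the next one from the "server".
        page = source.fetchPage(nextOffset, pageSize);
        pagesFetched++;
        indexInPage = 0;
        nextOffset += page.size();
        // A short (or empty) page is taken to mean there are no more rows.
        if (page.size() < pageSize) exhausted = true;
        return !page.isEmpty();
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        return page.get(indexInPage++);
    }

    public static void main(String[] args) {
        // Fake "server" holding 7 rows.
        List<String> serverRows = new ArrayList<>();
        for (int i = 0; i < 7; i++) serverRows.add("row" + i);
        PageSource<String> server = (offset, limit) ->
            serverRows.subList(Math.min(offset, serverRows.size()),
                               Math.min(offset + limit, serverRows.size()));

        PagedResults<String> results = new PagedResults<>(server, 3);
        while (results.hasNext()) {
            System.out.println(results.next());
        }
        System.out.println("pages fetched: " + results.pagesFetched);
    }
}
```

From the caller's side this is just a loop over an Iterator, which matches the point that paging need not add work for the client developer. Treating a short page as the end of results is a simplification made here; a real protocol might instead send an explicit end-of-results marker.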

I see the concept of streaming as vitally important. When you deal
with large result sets, caching them in memory simply doesn't scale.

I'll see if I can rustle up some more implementation details.

Cheers,
Tom
-- 
Tom Adams           | Tucana Technologies, Inc.
Support Engineer    |   Office: +1 703 871 5313
tom@tucanatech.com  |     Cell: +1 571 594 0847
-----------------------------------------------

Received on Wednesday, 16 June 2004 10:04:04 UTC