- From: Seaborne, Andy <andy.seaborne@hp.com>
- Date: Wed, 16 Jun 2004 22:42:14 +0100
- To: "'Janne Saarela'" <janne.saarela@profium.com>
- Cc: "'RDF Data Access Working Group'" <public-rdf-dawg@w3.org>
-------- Original Message --------
> From: public-rdf-dawg-request@w3.org <>
> Date: 16 June 2004 13:45
>
> > With a streamable protocol the loop above can be executed the first
> > time as soon as the first result binding (row in the result table)
> > arrives. This is Jim's example of doing something with the first 100
> > results while the server is still dealing with the next 100, done in
> > the style of iterators. The above code does not require all the
> > results to be in memory at the same time.
>
> Agreed, this would be familiar to developers who've been
> dealing with the SAX programming model.
>
> > In Jena the classes are: QueryResults [1] for the iterator and
> > ResultBinding [2] for each row of the conceptual result table.
> >
> > [1]
> > http://jena.sourceforge.net/javadoc/com/hp/hpl/jena/rdql/QueryResults.html
> > [2]
> > http://jena.sourceforge.net/javadoc/com/hp/hpl/jena/rdql/ResultBinding.html
>
> See also java.sql.ResultSet. The API style is to deliver one row of
> the result set to the application at a time with
> java.sql.ResultSet#next() and
> java.sql.ResultSet#isLast(). A result set can be TYPE_FORWARD_ONLY.
>
> This may be too concrete an example but please bear with me as I need
> to understand how this would work in practice:
>
> I can imagine applications blocking on a call to next() even if
> isLast() says 'false'. This is not what I want query clients to
> experience (e.g. a network problem would hang a program until the TCP
> level says 'timeout').

A blocking .next() isn't a necessary design, but it does make the application easier to write for the common design pattern of looping over results. The select (system call) style makes for complicated programming but allows a single-threaded application to multitask. We could have had an .isMoreReady() call to guard a blocking .next(); alternatively, .next() could be non-blocking, returning a "not ready" indicator. This is select(2) style.
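To make the two select-style alternatives concrete, here is a minimal single-threaded sketch. It assumes rows are simple variable-to-value maps and simulates the network by calling deliver() directly. PolledResults, deliver(), finish(), isMoreReady(), isExhausted() and poll() are hypothetical names for illustration, not Jena or JDBC API. The loop guards each take with isMoreReady(), and poll() shows the other option: a non-blocking next() whose "not ready" indicator is Optional.empty().

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Demo: one thread alternates between consuming rows and doing other work.
class PollDemo {
    public static void main(String[] args) {
        PolledResults results = new PolledResults();
        List<String> seen = new ArrayList<>();
        int idleTurns = 0;
        int tick = 0;
        while (!results.isExhausted()) {
            // Simulated network: a row arrives on ticks 0, 2 and 4; end of results on tick 6.
            if (tick % 2 == 0 && tick < 6) results.deliver(Map.of("x", "row" + tick / 2));
            if (tick == 6) results.finish();
            tick++;
            if (results.isMoreReady()) {
                // Guarded: a row is already buffered, so taking it cannot block.
                seen.add(results.poll().get().get("x"));
            } else {
                idleTurns++; // "nothing there": the application is free to do other work
            }
        }
        System.out.println(seen + " idle=" + idleTurns); // prints [row0, row1, row2] idle=4
    }
}

// Hypothetical stream of result rows arriving incrementally; not a Jena or JDBC class.
final class PolledResults {
    private final ArrayDeque<Map<String, String>> arrived = new ArrayDeque<>();
    private boolean finished = false;

    // Called by whatever reads the network; the demo calls these directly.
    void deliver(Map<String, String> row) { arrived.addLast(row); }
    void finish() { finished = true; }

    // Guard for next(): true if a row can be taken without waiting.
    boolean isMoreReady() { return !arrived.isEmpty(); }

    // True once the end-of-results marker has arrived and every row has been taken.
    boolean isExhausted() { return finished && arrived.isEmpty(); }

    // Non-blocking next(): a row, or Optional.empty() as the "not ready" indicator.
    Optional<Map<String, String>> poll() { return Optional.ofNullable(arrived.pollFirst()); }
}
```

The cost of this style is visible in the else branch: the application itself has to decide what to do on every "nothing there" turn.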

Jena's query system is multithreaded because it is easier that way: the application thread loops, pulling result rows out of a bounded buffer. Blocking occurs if the buffer is exhausted; there is an explicit end-of-results token. Because this is Java, threading comes at little cost. The query engine puts results into the bounded buffer as it generates them; the application pulls them out. It could provide a peek() on results (to test whether next() would block) but doesn't.

[Aside: I chose not to implement a JDBC interface because I couldn't see how to provide the full interface, so an RDF interface would not be plug-compatible.]

> This is not what I want query clients to
> experience (e.g. a network problem would hang a program until the TCP
> level says 'timeout').

At some level, if the data isn't available, there are choices to be made: either wait until it has all arrived, or provide some kind of per-row interface.

>
> If the call to next() is asynchronous, I guess we would then need a
> good old select() type of call familiar to C programmers who've dealt
> with file descriptors.
>
> Bottom line question: will the streaming protocol effectively require
> more work from developers?

No, it's not more work. An API design has choices, but there are several well-known design patterns here. My experience with a multithreaded implementation was that it was easier: no complex, application-level select() calls and no deciding what to do if select() says "nothing there". However, my background includes multithreaded systems and languages.

> I cannot see how to map it to current practice
> of using ResultSets and the like without additional low-level IO management?

The TCP stack already has receive-side byte buffering, so if that buffer is about the size of a result row or larger, the client is already buffering the next result, or several, anyway.
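The multithreaded arrangement described above (an engine thread filling a bounded buffer, the application pulling rows with a blocking next(), an explicit end-of-results token) can be sketched with java.util.concurrent. This is an illustrative sketch, not Jena's code: BufferedResults and start() are hypothetical, rows are plain strings, and the producer thread merely stands in for a query engine or a results parser reading a socket.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Demo: the application thread simply loops, as it would over any iterator.
class BufferDemo {
    public static void main(String[] args) {
        BufferedResults results = BufferedResults.start(20, 100);
        List<String> rows = new ArrayList<>();
        while (results.hasNext()) {
            rows.add(results.next());
        }
        System.out.println(rows.size() + " rows, last " + rows.get(rows.size() - 1)); // prints 100 rows, last row99
    }
}

// Hypothetical iterator over a bounded buffer filled by a separate producer thread.
final class BufferedResults implements Iterator<String> {
    private static final Object END = new Object(); // explicit end-of-results token
    private final BlockingQueue<Object> buffer;
    private Object lookahead; // item taken from the buffer but not yet returned, or null

    private BufferedResults(int capacity) { buffer = new ArrayBlockingQueue<>(capacity); }

    // Starts a producer thread standing in for the query engine.
    static BufferedResults start(int capacity, int rowCount) {
        BufferedResults r = new BufferedResults(capacity);
        Thread engine = new Thread(() -> {
            try {
                for (int i = 0; i < rowCount; i++) {
                    r.buffer.put("row" + i); // blocks while the buffer is full
                }
                r.buffer.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        engine.setDaemon(true);
        engine.start();
        return r;
    }

    @Override
    public boolean hasNext() {
        if (lookahead == null) {
            try {
                lookahead = buffer.take(); // blocks while the buffer is empty
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for results", e);
            }
        }
        return lookahead != END; // END stays in lookahead, so later calls keep returning false
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        String row = (String) lookahead;
        lookahead = null;
        return row;
    }
}
```

The buffer capacity bounds memory use: with a capacity of 20 the engine can run only about 20 rows ahead of the application before put() blocks, and the application code stays a plain loop with no select-style bookkeeping.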

The bytes buffered by the TCP stack still need parsing, but either approach works: a parse-on-demand style (the select syscall, or a blocking .next() that synchronously reads from the socket and deserializes a row), or a results parser that blocks on the TCP socket, running asynchronously with the application and placing results in (e.g.) a bounded buffer.

In Jena, results are streamed with iterators. If it were not for the JDBC limitations we have encountered (in the general case we need per-JDBC-driver code to compensate), we would stream from the DB server through a network JDBC connection to the application, with buffering to smooth bursty behaviour. As JDBC is primarily for a closer-than-web coupling of client and server, this is good. When querying an in-memory RDF graph, the memory overhead is around 20 statements (buffering), and it goes faster on multi-CPU systems.

One other design style is callbacks: instead of a result loop, the application hands in a callback function which is called on each result row. This is more natural on a limited, single-threaded system but makes the application more complex than a loop does (unless your programming language has continuations).

I'm sure there are other patterns, and significant variations on the three (select/blocking .next(), multithreaded, callback) I have touched on here.

Andy

>
> I am sure if someone with ODBC experience could tell how this gray area
> (for me) was solved I would feel more comfortable.
>
> Janne
Received on Wednesday, 16 June 2004 17:42:47 UTC