Re: SAX for SPARQL..?

Danny Ayers wrote:
> I just read a blog post [1] from Shelley Powers in which she talks about 
> JSON vs XML and goes into RDF/XML vs Turtle territory a bit.  Seems like 
> a lot of the potential XML tool interop that a 'nice' RDF/XML* might 
> have provided is now available through SPARQL results.
> 
> While commenting over there it occurred to me that one aspect that 
> doesn't seem to be available is a streaming style of access a la SAX. 
> Ok, this is completely off the top of my head, may well be a non-starter 
> - any obvious reasons it couldn't work? If not, has anyone looked 
> into/implemented this? Would, say, hooking up result iterators to 
> just-in-time XML generation make sense?

It did to me, and I did it for both the XML and JSON results formats.

The writing side is quite natural, although ARQ outputs XML/JSON directly 
rather than going through a writer library.  The formats are so simple that 
using a library turned out to be more effort than just writing the output by hand.
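
As an illustration of the "just do it" approach, here is a minimal sketch 
(not ARQ's code) that writes the SPARQL XML results format by hand, emitting 
each row as soon as it is pulled from an iterator.  The class name 
ResultXmlWriter is hypothetical, rows are assumed to be variable-to-value maps, 
and every value is assumed to be an IRI (literals and blank nodes are left out 
for brevity).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: just-in-time SPARQL XML results from a row iterator.
// Assumption: every bound value is an IRI, so it is written as <uri>.
class ResultXmlWriter {
    static final String NS = "http://www.w3.org/2005/sparql-results#";

    static void write(List<String> vars, Iterator<Map<String, String>> rows, Writer out) {
        try {
            out.write("<?xml version=\"1.0\"?>\n");
            out.write("<sparql xmlns=\"" + NS + "\">\n  <head>\n");
            for (String v : vars) {
                out.write("    <variable name=\"" + escape(v) + "\"/>\n");
            }
            out.write("  </head>\n  <results>\n");
            // Pull one row, write it, move on: only the current row is in memory.
            while (rows.hasNext()) {
                Map<String, String> row = rows.next();
                out.write("    <result>\n");
                for (Map.Entry<String, String> b : row.entrySet()) {
                    out.write("      <binding name=\"" + escape(b.getKey()) + "\"><uri>"
                            + escape(b.getValue()) + "</uri></binding>\n");
                }
                out.write("    </result>\n");
                out.flush(); // push the finished row to the underlying stream
            }
            out.write("  </results>\n</sparql>\n");
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Minimal XML escaping for element content and attribute values.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Flushing per row is what makes the output usable by a streaming consumer on 
the other end; a buffered writer without the flush would still be correct, 
just less incremental.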

For reading, I used StAX for XML.  StAX allows the application to control the 
rate of processing so you get end-to-end streaming.
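
A StAX reader can be wrapped as an ordinary Iterator, so the parser only 
advances when the application asks for the next row.  A minimal sketch using 
the JDK's javax.xml.stream API; the class name StaxResultIterator and the 
map-per-row representation are assumptions for illustration, and the term type 
(uri, literal, bnode) is discarded.

```java
import java.io.Reader;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch: pull-style SPARQL XML results reader.
// The parser reads ahead by at most one <result> element.
class StaxResultIterator implements Iterator<Map<String, String>> {
    private final XMLStreamReader xml;
    private Map<String, String> next; // row read ahead, or null at end

    StaxResultIterator(Reader in) {
        this.xml = open(in);
        this.next = readRow();
    }

    private static XMLStreamReader open(Reader in) {
        try {
            return XMLInputFactory.newInstance().createXMLStreamReader(in);
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    @Override public boolean hasNext() { return next != null; }

    @Override public Map<String, String> next() {
        if (next == null) throw new NoSuchElementException();
        Map<String, String> row = next;
        next = readRow(); // parse only as far as the following </result>
        return row;
    }

    // Advance to the end of the next <result>; returns null at end of document.
    private Map<String, String> readRow() {
        try {
            Map<String, String> row = null;
            String var = null;
            while (xml.hasNext()) {
                int ev = xml.next();
                if (ev == XMLStreamConstants.START_ELEMENT) {
                    String name = xml.getLocalName();
                    if (name.equals("result")) {
                        row = new LinkedHashMap<>();
                    } else if (name.equals("binding")) {
                        var = xml.getAttributeValue(null, "name");
                    } else if (row != null && var != null && (name.equals("uri")
                            || name.equals("literal") || name.equals("bnode"))) {
                        row.put(var, xml.getElementText());
                    }
                } else if (ev == XMLStreamConstants.END_ELEMENT
                        && xml.getLocalName().equals("result")) {
                    return row;
                }
            }
            return null;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```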

SAX is less good: it is event-driven on the incoming side, but the events 
arrive at parser speed, with no control from the application software reading 
the SPARQL results.  To have SAX truly stream would mean putting the 
application's results processing inside the SAX event handling, and that in 
turn forces the rest of the API to be very unnatural.
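
To illustrate the SAX problem: with javax.xml.parsers and a DefaultHandler, 
the only place a row can be handled before the whole document has been parsed 
is inside the callback.  A hypothetical sketch; SaxResultHandler and its 
Consumer hook are illustrative names, not part of any library.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: SAX pushes events at parser speed, so per-row
// application logic has to run inside the handler.
class SaxResultHandler extends DefaultHandler {
    private final Consumer<Map<String, String>> onRow; // application code runs here
    private Map<String, String> row;
    private String var;
    private StringBuilder text;

    SaxResultHandler(Consumer<Map<String, String>> onRow) { this.onRow = onRow; }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        switch (local) {
            case "result": row = new LinkedHashMap<>(); break;
            case "binding": var = atts.getValue("name"); break;
            case "uri": case "literal": case "bnode": text = new StringBuilder(); break;
            default: break;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        switch (local) {
            case "uri": case "literal": case "bnode":
                row.put(var, text.toString());
                text = null;
                break;
            case "result":
                onRow.accept(row); // parsing does not continue until this returns
                row = null;
                break;
            default: break;
        }
    }

    static void parse(String xml, Consumer<Map<String, String>> onRow) {
        try {
            SAXParserFactory f = SAXParserFactory.newInstance();
            f.setNamespaceAware(true); // so 'local' is the element's local name
            f.newSAXParser().parse(new InputSource(new StringReader(xml)),
                    new SaxResultHandler(onRow));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Everything downstream of onRow has to be written as callback code, which is 
the unnatural API shape described above; the alternative is buffering rows, 
which gives up streaming.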

JSON (using the org.json library) is pull-on-stream, so the SPARQL results 
reader works in a streaming style, pulling one result row at a time from a 
JSON input stream.

If the query is "SELECT * { ?s ?p ?o }" then it is streaming triples.
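
For a single triple pattern, each matching triple becomes one result row, so 
the result iterator can simply wrap the store's triple iterator.  A minimal 
sketch assuming a hypothetical Triple class and store iterator; it is not how 
ARQ or any particular store is implemented.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical minimal triple type; a real store would supply its own.
final class Triple {
    final String s, p, o;
    Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
}

// Sketch of SELECT * { ?s ?p ?o } evaluated lazily: each next() pulls
// exactly one triple from the source, so memory does not grow with result size.
final class TriplePatternRows implements Iterator<Map<String, String>> {
    private final Iterator<Triple> source;

    TriplePatternRows(Iterator<Triple> source) { this.source = source; }

    @Override public boolean hasNext() { return source.hasNext(); }

    @Override public Map<String, String> next() {
        Triple t = source.next(); // NoSuchElementException when exhausted
        Map<String, String> row = new LinkedHashMap<>();
        row.put("s", t.s);
        row.put("p", t.p);
        row.put("o", t.o);
        return row;
    }
}
```

Because nothing is materialized, this works even over an unbounded source, 
such as triples arriving over a network connection.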

> I'm guessing it should be 
> feasible but only useful with a subset of possible query patterns. Could 
> be sweet for performance/scale though, not to mention queries over 
> Jabber...

Yes - it will depend a bit on the query processing implementation, but many 
(most real-life) SPARQL queries have no structure that prevents them from 
being streamed completely (see the Semantics of SPARQL paper for a case that can't be).

Streaming query execution is good because it keeps the memory footprint down.

But JDBC (with the typical default setup of the DB server) returns all results 
before letting the client application start looping over them.  No streaming :-(

 Andy

> 
> Cheers,
> Danny.
> 
> [1] 
> http://burningbird.net/technology/learning-javascript/to-json-or-not-to-json
> 
> (* I suspect a genuinely nice RDF/XML is an unfindable Holy Grail - 
> either you'd have to ditch the XML tree-friendly striping style or the 
> graph-friendly statement style)
> 
> -- 
> 
> http://dannyayers.com

Received on Thursday, 11 January 2007 16:35:19 UTC