Re: Turtle Tuples: Turtle-based query result format

On Wed, 2005-03-23 at 16:57 +0100, Arjohn Kampman wrote:
> Dave Beckett wrote:
> > My comments on the points in the thread.
> > 
> > Less overhead - yeah.  Although Arjohn's later reply says it isn't
> > always lower in speed.
> 
> The performance improvements were a bit disappointing, at least in
> combination with Sesame. In a client-server setting, a Sesame server
> automatically applies gzip-compression to any query results; apparently
> Sesame XML format compresses better than the Turtle Tuples format, as
> the compressed files were comparable in size.
> 
> One factor that influenced the results was the fact that we performed
> tests on meta-data for URLs. Most URLs tend to end in a file name that
> includes a file extension (e.g. "index.html"). Because of the dot for
> the file extension, these URLs had to be written as full URIs in the
> result format as Turtle doesn't allow dots in qnames.

That's actually something I've been actioned to check up on for the
DAWG. We may change it in SPARQL, which uses N3/Turtle-style triples, and
I'd want to change Turtle to match.  I'm currently favouring it, once
I've checked how it matches XML's name terms and how it clashes with '.'
as the sentence terminator in Turtle, e.g.:

(if . is allowed in names)
  ex:a ex:b ex:c. .

Turtle actually allows omitted spaces after the object, so this is
allowed:
  ex:a ex:b ex:c.

but then that gets you into trouble if you wanted the name to end with '.':
  ex:a ex:b ex:c..
or
  ex:a ex:b ex:c. .

which could be worked around with some words about when whitespace is
significant and when it can be ignored.
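
To make the ambiguity concrete, here is a minimal tokenizer sketch in
Python. It assumes one possible rule, not anything decided for Turtle or
SPARQL in this thread: a name may contain dots but may not end with one,
so a trailing '.' is always read as the terminator. The NAME pattern is
deliberately simplified (ASCII only) and the function name tokenize is
hypothetical.

```python
import re

# Assumed rule (hypothetical): dots are allowed inside a local name,
# but a name may never end with a dot.
NAME = r"[A-Za-z_][A-Za-z0-9_]*:[A-Za-z0-9_]+(?:\.+[A-Za-z0-9_]+)*"
TOKEN = re.compile(rf"\s*(?:(?P<name>{NAME})|(?P<dot>\.))")


def tokenize(text):
    """Split a line of simplified Turtle into ('name', ...) and ('dot', '.') tokens."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}: {text[pos:]!r}")
        kind = m.lastgroup  # 'name' or 'dot', whichever alternative matched
        tokens.append((kind, m.group(kind)))
        pos = m.end()
    return tokens


print(tokenize("ex:a ex:b ex:c."))
print(tokenize("ex:a ex:b ex:index.html ."))
```

Under this rule "ex:c." is the name ex:c followed by the terminator, and
"ex:index.html ." keeps the dot inside the name. The cost is that a name
ending in '.' cannot be written as a qname at all.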

> As a result of the disappointing performance results, I decided to
> implement a binary format. This binary format gave much better results,
> giving roughly a factor 2 in increased performance for our application
> (a combination of Aduna Spectacle and Aduna Metadata Server). This
> format is documented at [1] if you're interested.

Insert the binary XML discussion here :)

> [...]
> > Easy to write - although this may be true for those familiar with
> > N3/Turtle style languages, this is a query result format and that's
> > either being written by query processors (so easy to write isn't
> > critical) or by query engine developers and people working on the SPARQL
> > language and tests - a small group!
> 
> The point that was being made is that the format would be easier to
> write by these query processors. An XML format requires one to specify
> any namespace prefixes at the start of the document, which makes it
> harder to write in a streaming fashion.

I don't understand that: your Turtle Tuples format has @prefix lines
declaring namespaces before the result body, so it has the same streaming
issue as the XML VBR.  Or I guess you could add them later, but you
didn't choose to show that in the example.
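
As an illustration of "adding them later", here is a rough Python sketch
of a writer that emits each @prefix line only when its namespace is first
used, relying on Turtle allowing directives between statements. It writes
plain Turtle triples rather than the Turtle Tuples format itself, and the
class name LazyPrefixWriter, the method write_triple and the example.org
namespace are all made up for the example.

```python
import io
import re

# Simplified local-name check (ASCII only, no dots), an assumption for this sketch.
LOCAL = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


class LazyPrefixWriter:
    """Hypothetical streaming writer that declares prefixes on first use."""

    def __init__(self, out, namespaces):
        self.out = out
        self.namespaces = namespaces  # prefix -> namespace URI
        self.declared = set()

    def _term(self, uri):
        for prefix, ns in self.namespaces.items():
            if uri.startswith(ns) and LOCAL.fullmatch(uri[len(ns):]):
                if prefix not in self.declared:
                    # Declare the prefix just before the statement that needs it.
                    self.out.write(f"@prefix {prefix}: <{ns}> .\n")
                    self.declared.add(prefix)
                return f"{prefix}:{uri[len(ns):]}"
        # No usable qname (e.g. the local part contains a dot): write the full URI.
        return f"<{uri}>"

    def write_triple(self, s, p, o):
        terms = [self._term(t) for t in (s, p, o)]
        self.out.write(" ".join(terms) + " .\n")


out = io.StringIO()
writer = LazyPrefixWriter(out, {"ex": "http://example.org/"})
writer.write_triple("http://example.org/a", "http://example.org/b",
                    "http://example.org/index.html")
print(out.getvalue(), end="")
```

The writer never needs to know the full set of namespaces up front, which
is the property the streaming argument is about. Note that the URI ending
in "index.html" still falls back to the full form, as described earlier
in the thread.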

> [...]
> > I'm not sure this is something I'd prioritise now over, say, getting the
> > XML format more polished after feedback.
> 
> I agree, but please 'fix' this XML format (I don't like the "variables
> as tags" thing very much, as I pointed out in an earlier mail to this
> list:-) ).

That's under discussion in the DAWG already, as we are considering
motivations to switch to a form where variable names are attribute
values/element content; and also possibly (allow) adding xsi:type.

> Cheers,
> 
> Arjohn
> 
> [1] 
> http://www.openrdf.org/doc/api/sesame/org/openrdf/sesame/query/BinaryTableResultConstants.html
> 

Dave

Received on Wednesday, 23 March 2005 16:20:37 UTC