Re: [Turtle] Two formats (was: Re: Turtle, Qurtle, Super-Turtle, N-Triple, N-Quads, Trig - BC and Scope)

On 2011-03-04, at 18:03, Richard Cyganiak wrote:

> On 3 Mar 2011, at 21:56, Steve Harris wrote:
>>> I don't think you can argue that users have one firm expectation for the handling of N-Triples and a different firm expectation for N-Quads.
>> 
>> I really can. The usecases for those file formats are significantly different.
> 
> I dispute that. I believe that both are mostly used for exchanging large RDF dumps (ignoring the use of N-Triples for test cases).

We use N-Triples quite a lot internally, for lots of different things.

The simple fact that one encodes triples and the other quads means the use cases are different. We only use N-Quads (or TriG) for bulk dumps.

> At any rate, the use cases for both formats are far from disjoint.

There's some overlap, sure.

> If we had made N-Quads syntactically disjoint from N-Triples, then we'd get the situation where a system that only supports N-Triples rejects an N-Quads file that has “DEFAULT” at the end of every line.

Good! That means something else than a file full of triples.
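
To make the rejection case concrete, here is a rough sketch of a strict triples-only line check. It assumes the hypothetical "DEFAULT" marker from this thread (not part of any published N-Quads syntax), and the regular expressions are a simplification of the N-Triples grammar, not a conforming parser. The name is_ntriples_line is made up for illustration.

```python
import re

# Simplified term patterns; the real N-Triples grammar is stricter.
IRI = r"<[^<>\s]*>"
BNODE = r"_:[A-Za-z0-9]+"
LITERAL = r'"(?:[^"\\]|\\.)*"(?:@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\^\^' + IRI + r")?"

# Exactly three terms followed by a dot; anything extra fails the match.
TRIPLE_LINE = re.compile(
    rf"^\s*(?:{IRI}|{BNODE})\s+{IRI}\s+(?:{IRI}|{BNODE}|{LITERAL})\s*\.\s*$"
)


def is_ntriples_line(line):
    """Return True only if the line looks like a single triple."""
    return TRIPLE_LINE.match(line) is not None


triple = '<http://foo.example/s> <http://foo.example/p> "v" .'
# Hypothetical quad line using the DEFAULT marker discussed above.
quad = '<http://foo.example/s> <http://foo.example/p> "v" DEFAULT .'

print(is_ntriples_line(triple))
print(is_ntriples_line(quad))
```

A triples-only consumer built like this refuses the second line outright, which is the behaviour I'd want: it doesn't silently treat graph-bearing data as plain triples.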

If I send LOAD <http://foo.example/file> to a SPARQL Update store, that doesn't load into the default graph, but N-Quads lines with DEFAULT on the end would, I hope.

>>> This is a concern I share, and a reason why I'm opposed to multigraph/quad support in “small-scale” formats like TriG, Turtle, RDF/XML or RDF/JSON.
>> 
>> I also regard N-Triples as a "small-scale" format.
> 
> Why? Its advantages over Turtle (easy to grep/sed, easy to parse with O(1) memory, easy to merge) seem to be relevant for large files but not for small ones.

It's very cheap to generate, compared to Turtle. Useful if you're doing a lot of small imports, e.g. metadata, and data from web crawls.

If you're generating one Turtle file with 20 triples in it and importing it, the cost isn't significant. But if you're generating 200 files per second, each with 20 triples, scattered across a big cluster, it all adds up.
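
As a rough illustration of why generation is cheap: each triple becomes one independent line, with no prefix table or subject grouping to maintain. This is a minimal sketch, not our production code. The function names and the (kind, value) object encoding are assumptions made up for the example, and it only handles IRIs and plain literals (no language tags, datatypes, blank nodes, or escaping of non-ASCII characters).

```python
def escape_literal(value):
    """Escape backslash, double quote and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def ntriples_lines(triples):
    """Yield one N-Triples line per (subject_iri, predicate_iri, obj) tuple.

    obj is ("iri", value) or ("literal", value); this encoding is an
    assumption for the sketch, not an established API.
    """
    for s, p, (kind, value) in triples:
        obj = f"<{value}>" if kind == "iri" else f'"{escape_literal(value)}"'
        yield f"<{s}> <{p}> {obj} .\n"


crawl_metadata = [
    ("http://foo.example/page", "http://foo.example/title", ("literal", 'A "quoted" title')),
    ("http://foo.example/page", "http://foo.example/linksTo", ("iri", "http://foo.example/other")),
]

print("".join(ntriples_lines(crawl_metadata)), end="")
```

No state is carried between lines, so output from many workers can simply be concatenated.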

We do also use SPARQL Update (which is pretty TriG-like), but an HTTP PUT-able syntax is easier for us to work with when there's only a single graph being updated in one operation.
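
For the single-graph case, the request can be sketched like this. Everything specific here is an assumption for illustration: the endpoint URL, the ?graph= query parameter (in the style of the SPARQL Graph Store HTTP Protocol), the Content-Type value, and the helper name build_graph_put. The sketch only builds the request; it does not send it.

```python
import urllib.parse
import urllib.request


def build_graph_put(endpoint, graph_uri, ntriples_body):
    """Build (but do not send) a PUT that replaces one named graph.

    endpoint and the ?graph= convention are assumptions; a given store
    may address graphs differently.
    """
    query = urllib.parse.urlencode({"graph": graph_uri})
    return urllib.request.Request(
        f"{endpoint}?{query}",
        data=ntriples_body.encode("utf-8"),
        method="PUT",
        # Assumed media type; some stores expect text/plain for N-Triples.
        headers={"Content-Type": "application/n-triples"},
    )


body = '<http://foo.example/s> <http://foo.example/p> "v" .\n'
req = build_graph_put("http://store.example/data", "http://foo.example/file", body)

print(req.get_method(), req.full_url)
# Sending would be: urllib.request.urlopen(req)
```

The body is just the triples for that one graph, so a triples-only syntax is all that's needed.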

- Steve

-- 
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD

Received on Saturday, 5 March 2011 23:00:26 UTC