- From: Steve Harris <steve.harris@garlik.com>
- Date: Thu, 3 Mar 2011 07:51:17 +0000
- To: Richard Cyganiak <richard@cyganiak.de>
- Cc: Sandro Hawke <sandro@w3.org>, nathan@webr3.org, RDF-WG <public-rdf-wg@w3.org>
On 2011-03-02, at 22:13, Richard Cyganiak wrote:

> On 2 Mar 2011, at 19:32, Sandro Hawke wrote:
>> 2. The first, our standard version of Turtle, should be very
>> conservative, inside the space of nearly all existing turtle documents
>> and software.
>
> +1
>
>> 3. We should have a different syntax, with a different mime-type, for
>> handling [GRAPHS] in a Turtle-like language.
>>
>> If that's true, the next big issue is whether this second syntax is (as
>> Ivan proposed) just Turtle plus the minimum needed to handle extra
>> graphs (TriG?), or whether (since we don't have nearly as much BC to
>> worry about) we should take the opportunity to add some extra stuff
>> here.
>
> Adding extra stuff? I'd actually propose the opposite: Let's throw some stuff out from the [GRAPHS] format.
>
> At the moment, I see multi-graph formats used mainly to exchange dumps between SPARQL stores. Hence I see this as the main use case to address.
>
> We've learned from N-Triples that line-based formats are great for exchanging dumps.

Agreed.

> So, let's take N-Triples and add an optional 4th element to deal with [GRAPHS]. A la N-Quads [1], but being explicit about what the 4th element is. Also add some other good bits along the lines Andy suggested elsewhere (UTF-8, base URI, proper media type). And declare victory.

Yes, but let's make that two formats, not one. I would prefer the N-Quads format and media type to mandate 4 columns, to minimise the potential surprise once you start parsing an "N-Triples" file.

For one thing, some triplestores have different default behaviours when parsing triple formats than when parsing quad formats. In 4store, for example, if you import <file:triples.nt>, by default it will remove any existing triples in the <file:triples.nt> graph before inserting the "new" ones; this appears to match user expectations.
However, if you import <file:quads.nq> there is no real point in clearing out the <file:quads.nq> graph, as an N-Quads file rarely contains any data in the graph named by the file's base URI. Users also don't seem to want you to go round clearing out every graph mentioned in a quads dump before inserting; it causes weird import-time behaviour and unexpected consequences. For example, if I have an N-Quads file like:

<http://example.com/a> <http://example.com/p> <http://example.com/b> <http://example.com/G1> .
...
<http://example.com/G1> <http://example.com/contains> <http://example.com/a> <http://example.com/metadata> .
...

it may well be surprising that importing it will wipe the <http://example.com/metadata> graph, which might be shared by multiple N-Quads dump files. The user has no practical way to know which graphs, if any, will be affected without pre-parsing the N-Quads dump, which can be impractical for very large files.

There's also the question of what to do if you find an N-Triples file in the wild, say as part of a web crawl. Currently it's safe to import any N-Triples file: it will only affect triples within the graph of the file itself. If an N-Triples file could carry a 4th column, however, someone could deliberately create malicious N-Quads data designed to add triples to well-known graph URIs, or to corrupt provenance data in related graphs:

<http://example.com/a> <http://example.com/p> <http://example.com/b> <http://example.com/G1> .
<http://example.com/G1> <http://mystore.example/trustLevel> "1.0"^^<http://www.w3.org/2001/XMLSchema#decimal> <http://example.com/G1#provenance> .
<http://example.com/G1> <http://purl.org/dc/terms/date> "1970-02-23T00:00:00Z" <http://example.com/G1#provenance> .

and so on.

Consequently there are several cases where the user would like different behaviour depending on whether the file being parsed has 3 or 4 columns, so let's make it easy to find that out without pre-parsing the whole file.
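A minimal Python sketch of the difference described above. It assumes a simplified line tokenizer rather than a full N-Triples/N-Quads parser; the helpers `statement_terms`, `graphs_touched` and `graph_to_clear` are hypothetical, and the `.nt`/`.nq` extension check stands in for whatever media type an importer actually receives. `graph_to_clear` only models the import defaults described in the message; it is not 4store's code. With a mandatory 3-column format the only graph touched is the file's own, known before reading a line; once a 4th column is allowed, the importer learns which graphs a dump writes to only after scanning every statement.

```python
import re

# Simplified term pattern (an assumption, not a validating parser): matches
# IRIs, blank node labels, and literals with an optional @lang or ^^<datatype>.
TERM = re.compile(
    r'<[^>]*>'                                              # IRI
    r'|_:[A-Za-z0-9_\-]+'                                   # blank node label
    r'|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9\-]+|\^\^<[^>]*>)?'   # literal
)


def statement_terms(line):
    """Split one line into its terms; return [] for blank or comment lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith('#'):
        return []
    if not stripped.endswith('.'):
        raise ValueError(f"statement does not end with '.': {line!r}")
    terms = TERM.findall(stripped[:-1])
    if len(terms) not in (3, 4):
        raise ValueError(f"expected 3 or 4 terms, got {len(terms)}: {line!r}")
    return terms


def graphs_touched(lines, file_graph):
    """Every graph a dump writes to: 3-term lines go to file_graph,
    4-term lines name their own graph. This has to read every line."""
    graphs = set()
    for line in lines:
        terms = statement_terms(line)
        if len(terms) == 3:
            graphs.add(file_graph)
        elif len(terms) == 4:
            graphs.add(terms[3])
    return graphs


def graph_to_clear(filename, file_graph):
    """Hypothetical import default modelled on the behaviour described above:
    a triples dump replaces its own graph, a quads dump clears nothing."""
    if filename.endswith('.nt'):
        return file_graph
    if filename.endswith('.nq'):
        return None
    raise ValueError(f"unknown dump format: {filename}")


dump = [
    '<http://example.com/a> <http://example.com/p> <http://example.com/b> <http://example.com/G1> .',
    '<http://example.com/G1> <http://example.com/contains> <http://example.com/a> <http://example.com/metadata> .',
]
print(graph_to_clear('quads.nq', '<file:quads.nq>'))    # None
print(sorted(graphs_touched(dump, '<file:quads.nq>')))
# ['<http://example.com/G1>', '<http://example.com/metadata>']
```

The same scan flags the provenance example above: each of those lines has four terms, so `graphs_touched` would report `<http://example.com/G1#provenance>` before anything is written, but only after the whole file has been read.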
- Steve

--
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203 http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
Received on Thursday, 3 March 2011 07:51:52 UTC