- From: Andy Seaborne <andy.seaborne@epimorphics.com>
- Date: Thu, 03 Mar 2011 11:18:33 +0000
- To: Steve Harris <steve.harris@garlik.com>
- CC: Richard Cyganiak <richard@cyganiak.de>, Sandro Hawke <sandro@w3.org>, nathan@webr3.org, RDF-WG <public-rdf-wg@w3.org>
On 03/03/11 07:51, Steve Harris wrote:
> On 2011-03-02, at 22:13, Richard Cyganiak wrote:
>
>> On 2 Mar 2011, at 19:32, Sandro Hawke wrote:
>>> 2. The first, our standard version of Turtle, should be very
>>> conservative, inside the space of nearly all existing turtle
>>> documents and software.
>>
>> +1
>>
>>> 3. We should have a different syntax, with a different
>>> mime-type, for handling [GRAPHS] in a Turtle-like language.
>>>
>>> If that's true, the next big issue is whether this second syntax
>>> is (as Ivan proposed) just Turtle plus the minimum needed to
>>> handle extra graphs (TriG?), or whether (since we don't have
>>> nearly as much BC to worry about) we should take the opportunity
>>> to add some extra stuff here.
>>
>> Adding extra stuff? I'd actually propose the opposite: Let's throw
>> some stuff out from the [GRAPHS] format.
>>
>> At the moment, I see multi-graph formats used mainly to exchange
>> dumps between SPARQL stores. Hence I see this as the main use case
>> to address.
>>
>> We've learned from N-Triples that line-based formats are great for
>> exchanging dumps.
>
> Agreed.
>
>> So, let's take N-Triples and add an optional 4th element to deal
>> with [GRAPHS]. A la N-Quads [1], but being explicit about what the
>> 4th element is. Also add some other good bits along the lines Andy
>> suggested elsewhere (UTF-8, base URI, proper media type). And
>> declare victory.
>
> Yes, but let's make that two formats, not one. I would prefer the
> N-Quads format and media type to mandate 4 columns, to minimise the
> potential surprise once you start parsing a "N-Triples" file.
>
> For one thing, some triplestores have different default behaviours
> when parsing triples formats than quads formats.
>
> In 4store, for example, if you import <file:triples.nt>, by default
> it will remove any existing triples in <file:triples.nt> before
> inserting the "new" ones - this appears to match user expectations.
> However, if you import <file:quads.nq> there is no real point in
> clearing out the <file:quads.nt> graph, as there's not often any data
> in the base URI of the file in an N-Quads file, and users don't seem
> to want you to go round clearing out any graph you find mentioned in
> the quads dump format before inserting - it causes weird import-time
> behaviour, and unexpected consequences.
>
> For example if I have a N-Quads file like:
>
> <http://example.com/a> <http://example.com/p> <http://example.com/b> <http://example.com/G1> .
> ...
> <http://example.com/G1> <http://example.com/contains> <http://example.com/a> <http://example.com/metadata> .
> ...
>
> It may well be surprising that importing this will wipe the
> <http://example.com/metadata> graph, which might be used by
> multiple N-Quads dump files.
>
> The user has no practical way to know if any / what graphs will be
> affected without pre-parsing the N-Quads dump file, which can be
> impractical for very large files.
>
> There's also the question of what to do if you find a N-Triples file
> in the wild, say as part of a web crawl. Currently it's safe to
> import any N-Triples file, and it will only affect triples within the
> graph of the file itself, but someone could deliberately create
> malicious N-Quads files designed to add data to well known graph
> URIs, or to deliberately corrupt provenance data in related graphs:
>
> <http://example.com/a> <http://example.com/p> <http://example.com/b> <http://example.com/G1> .
> <http://example.com/G1> <http://mystore.example/trustLevel> "1.0"^^<http://www.w3.org/2001/XMLSchema#decimal> <http://example.com/G1#provenance> .
> <http://example.com/G1> <http://purl.org/dc/terms/date> "1970-02-23T00:00:00Z" <http://example.com/G1#provenance> .
>
> and so on
>
> Consequently there are several cases where the user would like to
> have different behaviours depending on whether the file you're
> parsing has 3 or 4 columns, so let's make it easy to find out without
> pre-parsing the whole file.

Does

   N-Quads serializes a dataset (default graph and named graphs)
   N-Triples serializes a graph

work for you?

It means that in N-Quads, the absence of the 4th column means the default graph.

I know 4store does not have an independent default graph but some other systems do. N-Quads should capture the generality of an RDF dataset or graph store.

	Andy
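
To make the "absent 4th column means default graph" reading concrete, here is a minimal sketch of a line-based reader. It is an illustration under stated assumptions, not part of any proposal in this thread: the names TERM, DEFAULT_GRAPH, parse_line and load_dataset are hypothetical, the term pattern is a simplification of the real N-Triples grammar (no escape validation, no trailing comments), and representing the default graph as None is just one possible choice.

```python
import re
from collections import defaultdict

# Simplified term pattern (an assumption, not the full grammar):
# an IRI, a blank node label, or a literal with optional datatype or language tag.
TERM = re.compile(
    r'<[^>]*>'
    r'|_:[^\s<>"]+'
    r'|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z]+(?:-[A-Za-z0-9]+)*)?'
)

# Hypothetical marker for the default graph of the dataset.
DEFAULT_GRAPH = None


def parse_line(line):
    """Return (s, p, o, g) for one statement, or None for blank/comment lines.

    A statement with 3 terms is placed in the default graph; a statement
    with 4 terms uses the 4th term as the graph name.
    """
    line = line.strip()
    if not line or line.startswith('#'):
        return None
    terms = []
    pos = 0
    while pos < len(line):
        match = TERM.match(line, pos)
        if match is None:
            break
        terms.append(match.group(0))
        pos = match.end()
        # Skip whitespace between terms.
        while pos < len(line) and line[pos] in ' \t':
            pos += 1
    if line[pos:].strip() != '.' or len(terms) not in (3, 4):
        raise ValueError('not a valid statement: %r' % line)
    if len(terms) == 3:
        return (terms[0], terms[1], terms[2], DEFAULT_GRAPH)
    if terms[3].startswith('"'):
        raise ValueError('graph name must not be a literal: %r' % line)
    return tuple(terms)


def load_dataset(lines):
    """Group statements by graph, mapping graph name -> list of triples."""
    dataset = defaultdict(list)
    for line in lines:
        statement = parse_line(line)
        if statement is not None:
            s, p, o, g = statement
            dataset[g].append((s, p, o))
    return dataset
```

With this reading, the keys of the mapping returned by load_dataset are exactly the graphs a dump would touch, so an importer could inspect them before deciding whether to clear anything. Doing so still needs a full pass over the file, which is the cost Steve is pointing at for very large dumps.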
Received on Thursday, 3 March 2011 11:19:15 UTC