- From: Sandro Hawke <sandro@w3.org>
- Date: Fri, 18 May 2012 09:45:17 -0400
- To: Andy Seaborne <andy.seaborne@epimorphics.com>
- Cc: public-rdf-wg@w3.org
On Fri, 2012-05-18 at 14:08 +0100, Andy Seaborne wrote:
>
> On 18/05/12 13:12, Eric Prud'hommeaux wrote:
> > * Richard Cyganiak <richard@cyganiak.de> [2012-05-18 12:35+0100]
> >>
> >> On 18 May 2012, at 11:34, Eric Prud'hommeaux wrote:
> >>> Does the existing body of N-Triples permit a grammar with no default whitespace rules?
> >>>
> >>> triples: triple (LF triple)* LF?
> >>> triple: subject HWS predicate HWS object '.'
> >>>
> >>> I.e., do all the N-Triples out there look like "<s> <p> <o>."?
> >>
> >> This is what N-Triples as currently defined requires. Isn't that sufficient?
> >>
> >> ntripleDoc ::= line*
> >> line ::= ws* ( comment | triple )? eoln
> >> triple ::= subject ws+ predicate ws+ object ws* '.' ws*
> >> ws ::= space | tab
> >> eoln ::= cr | lf | cr lf
> >
> > I was just interested to see how much your SHOULD:
> > [[
> > * Richard Cyganiak <richard@cyganiak.de> [2012-05-18 11:06+0100]
> >> I would even go one step further and add some SHOULD-level guidance on where to put what whitespace. Perhaps something like: exactly one space between s and p; exactly one space between p and o; no WS before or after the period; no WS at the start of a line; CR+LF as EOL.
> > ]]
> > could be turned into a MUST.
>
> I don't think a MUST is a good idea, partially because it's too late,
> but also despite being a dump format, it's not pure binary. Blank lines
> and comments do have a role here and the CR+LF is a mild inconvenience
> in some text tools.
>
> There is variance in IRIs so from that point alone, NT has variations
> enough to stop blindly processing with line-based tools. I've seen the
> :80 thing in messy data.
>
> Processing based on appearance needs an extra step to be safe at scale
> (i.e. not need checking afterwards).
>
> What a canonical form is good for is as a target for simple tools to
> process and output. Hopefully, then tool makers will provide it by user
> demand.
So, no one would be writing a parser for n-triples that ONLY did
canonical n-triples. (At the point where you're writing something that
can keep a table of b-node labels, scanning over multiple spaces between
the subject and the predicate is pretty easy.)

But we'd say people SHOULD output "canonical" n-triples so that
plain-text, RDF-unaware tools like sort and grep would work.

Is that the proposal?

    -- Sandro

> Andy
>
> >> Richard
> >>
> >>> I note that Oracle has been vigilant about preserving backwards-compatibility. Souri, do you have a sense of what Oracle has been using?
> >>>
> >>>> I also note that RDF 2004 N-Triples allows comments (only at the start of a line). This makes sense for the use as a test case format, but not much sense for the use as a dump format.
> >>>>
> >>>> Best,
> >>>> Richard
> >>>>
> >>>> [1] http://www.w3.org/TR/rdf-testcases/#ntriples
> >>>>
> >>>> On 18 May 2012, at 10:04, Andy Seaborne wrote:
> >>>>
> >>>>> Gavin, Eric,
> >>>>>
> >>>>> rdf-turtle says:
> >>>>>
> >>>>> [1] ntriplesDoc ::= (triple)? (EOL triple)* (EOL)?
> >>>>> [2] triple ::= subject predicate object '.'
> >>>>> [8] EOL ::= ([#xD#xA])+
> >>>>>
> >>>>> What are the white space rules?
> >>>>>
> >>>>> Does it inherit white space processing from the rest of Turtle? Comments seem to come from Turtle.
> >>>>>
> >>>>> If it does not inherit white space rules,
> >>>>> what about horizontal white space inside triples?
> >>>>>
> >>>>> If it does inherit white space rules,
> >>>>> that includes newlines within triples between S/P or P/O.
> >>>>>
> >>>>> The simplest solution is to add text in section 12.3 to say that horizontal white space outside tokens is discarded (which is different to Turtle).
> >>>>>
> >>>>> Andy
> >>>
> >>> --
> >>> -ericP
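
For concreteness, here is a rough sketch of the "accept liberally, emit canonical" split discussed in this thread. It is only an illustration under stated assumptions, not a proposed implementation: the input whitespace rules follow the 2004 grammar quoted above (ws is space or tab), the output follows Richard's suggested SHOULDs (one space between terms, no WS around the period, CR+LF as EOL by default), and the term patterns are deliberately simplified rather than the full N-Triples grammar. The names canonicalize_line and canonicalize are made up for this example, and dropping comments and blank lines on output is an assumption of the sketch, not something the thread has agreed on.

```python
import re

# Simplified term patterns (assumption: NOT the full N-Triples grammar).
IRI = r'<[^<>\s]*>'
BNODE = r'_:[A-Za-z0-9]+'
LITERAL = (r'"(?:[^"\\]|\\.)*"'
           r'(?:@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\^\^' + IRI + r')?')

# Lenient input, per the 2004 grammar quoted in this thread:
#   triple ::= subject ws+ predicate ws+ object ws* '.' ws*
TRIPLE = re.compile(
    r'[ \t]*'
    r'(?P<s>' + IRI + r'|' + BNODE + r')[ \t]+'
    r'(?P<p>' + IRI + r')[ \t]+'
    r'(?P<o>' + IRI + r'|' + BNODE + r'|' + LITERAL + r')[ \t]*'
    r'\.[ \t]*'
)


def canonicalize_line(line):
    """Return one triple in canonical spacing, or None for blank/comment lines."""
    text = line.rstrip('\r\n')
    stripped = text.strip(' \t')
    if not stripped or stripped.startswith('#'):
        return None  # assumption: comments and blank lines are dropped
    m = TRIPLE.fullmatch(text)
    if m is None:
        raise ValueError('not a recognisable triple: %r' % line)
    # One space between s/p and p/o, no whitespace before the period.
    return '%s %s %s.' % (m.group('s'), m.group('p'), m.group('o'))


def canonicalize(lines, eol='\r\n'):
    """Canonicalize an iterable of lines; eol defaults to CR+LF."""
    out = []
    for line in lines:
        canon = canonicalize_line(line)
        if canon is not None:
            out.append(canon + eol)
    return ''.join(out)
```

Passing eol='\n' instead would sidestep the CR+LF inconvenience Andy mentions for some text tools. Note that this only normalizes whitespace; it does nothing about the IRI variance (the :80 case) or b-node labels, so sort and grep over the output are still comparing strings, not RDF terms.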
Received on Friday, 18 May 2012 13:45:35 UTC