Re: N-triples white space question from Sandro Hawke on 2012-05-18 (public-rdf-wg@w3.org from May 2012)

From: Sandro Hawke <sandro@w3.org>
Date: Fri, 18 May 2012 09:45:17 -0400
To: Andy Seaborne <andy.seaborne@epimorphics.com>
Cc: public-rdf-wg@w3.org
Message-ID: <1337348717.17747.35.camel@waldron>
On Fri, 2012-05-18 at 14:08 +0100, Andy Seaborne wrote:
> 
> On 18/05/12 13:12, Eric Prud'hommeaux wrote:
> > * Richard Cyganiak<richard@cyganiak.de>  [2012-05-18 12:35+0100]
> >>
> >> On 18 May 2012, at 11:34, Eric Prud'hommeaux wrote:
> >>> Does the existing body of N-Triples permit a grammar with no default whitespace rules?
> >>>
> >>>   triples: triple (LF triple)* LF?
> >>>   triple: subject HWS predicate HWS object '.'
> >>>
> >>> I.e., do all the N-Triples out there look like "<s>  <p>  <o>."?
> >>
> >> This is what N-Triples as currently defined requires. Isn't that sufficient?
> >>
> >> ntripleDoc ::= line*
> >> line  ::= ws* ( comment | triple )? eoln 
> >> triple  ::= subject ws+ predicate ws+ object ws* '.' ws*
> >> ws  ::= space | tab 
> >> eoln  ::= cr | lf | cr lf 
> >
> > I was just interested to see how much your SHOULD:
> > [[
> > * Richard Cyganiak<richard@cyganiak.de>  [2012-05-18 11:06+0100]
> >> I would even go one step further and add some SHOULD-level guidance on where to put what whitespace. Perhaps something like: exactly one space between s and p; exactly one space between p and o; no WS before or after the period; no WS at
> > the start of a line; CR+LF as EOL.
> > ]]
> > could be turned into a MUST.
> 
> I don't think a MUST is a good idea, partly because it's too late,
> but also because, despite being a dump format, it's not pure binary.
> Blank lines and comments do have a role here, and the CR+LF is a mild
> inconvenience in some text tools.
> 
> There is variance in IRIs so from that point alone, NT has variations 
> enough to stop blindly processing with line-based tools.  I've seen the 
> :80 thing in messy data.
> 
> Processing based on appearance needs an extra step to be safe at scale
> (i.e. so that it does not need checking afterwards).
> 
> What a canonical form is good for is as a target for simple tools to
> process and output.  Hopefully, tool makers will then provide it by user
> demand.

So, no one would be writing a parser for n-triples that ONLY did
canonical n-triples.  (At the point where you're writing something that
can keep a table of b-node labels, scanning over multiple spaces
between the subject and the predicate is pretty easy.)  But we'd say
people SHOULD output "canonical" n-triples so that plain-text,
RDF-unaware tools like sort and grep would work.  Is that the
proposal?
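
To make the "accept liberally, emit canonically" idea concrete, here is a
rough sketch in Python.  It reads a line using the whitespace rules of the
2004 grammar Richard quoted (ws = space | tab, ws+ between terms, ws* around
the period) and writes it back in the form his SHOULD describes: one space
between s and p, one between p and o, no whitespace before or after the
period or at the start of the line.  The name canonicalize_line and the term
regexes are my own assumptions for illustration; the regexes are simplified
and do not validate IRIs or escape sequences, and the choice of end-of-line
marker is left to the caller.

```python
import re

# Simplified term patterns (assumptions for this sketch, not the full grammar).
IRI = r'<[^<>\s]*>'
BNODE = r'_:[A-Za-z][A-Za-z0-9]*'
LITERAL = r'"(?:[^"\\]|\\.)*"(?:@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\^\^' + IRI + r')?'
WS = r'[ \t]'  # ws ::= space | tab

# triple ::= subject ws+ predicate ws+ object ws* '.' ws*, with leading ws*
TRIPLE = re.compile(
    rf'{WS}*({IRI}|{BNODE}){WS}+({IRI}){WS}+({IRI}|{BNODE}|{LITERAL}){WS}*\.{WS}*'
)


def canonicalize_line(line):
    """Return the canonical form of one N-Triples line, without an EOL.

    Returns None for blank lines and comment lines, and raises ValueError
    for anything that does not look like a triple.
    """
    body = line.rstrip('\r\n')           # drop cr, lf, or cr lf
    stripped = body.strip(' \t')
    if not stripped or stripped.startswith('#'):
        return None                      # blank line or comment: nothing to emit
    match = TRIPLE.fullmatch(body)
    if match is None:
        raise ValueError(f'not an N-Triples line: {line!r}')
    subject, predicate, obj = match.groups()
    # Whitespace inside a literal is untouched; only inter-term ws is normalized.
    return f'{subject} {predicate} {obj}.'
```

With output like that, two dumps of the same triples differ only in line
order, so sort followed by a plain diff is enough to compare them.  It does
nothing about the other sources of variation Andy mentions, such as
equivalent IRIs spelled differently.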

    -- Sandro


>  Andy
> 
> >
> >
> >> Richard
> >>
> >>
> >>
> >>> I note that Oracle has been vigilant about preserving backwards-compatibility. Souri, do you have a sense of what Oracle has been using?
> >>>
> >>>> I also note that RDF 2004 N-Triples allows comments (only at the start of a line). This makes sense for the use as a test case format, but not much sense for the use as a dump format.
> >>>>
> >>>> Best,
> >>>> Richard
> >>>>
> >>>>
> >>>> [1] http://www.w3.org/TR/rdf-testcases/#ntriples
> >>>>
> >>>>
> >>>>
> >>>> On 18 May 2012, at 10:04, Andy Seaborne wrote:
> >>>>
> >>>>> Gavin, Eric,
> >>>>>
> >>>>> rdf-turtle says:
> >>>>>
> >>>>> [1] ntriplesDoc ::= (triple)? (EOL triple)* (EOL)?
> >>>>> [2] triple ::= subject predicate object '.'
> >>>>> [8] EOL  ::= ([#xD#xA])+
> >>>>>
> >>>>> What are the white space rules?
> >>>>>
> >>>>> Does it inherit white space processing from the rest of Turtle? Comments seem to come from Turtle.
> >>>>>
> >>>>> If it does not inherit white space rules,
> >>>>>    what about horizontal white space inside triples?
> >>>>>
> >>>>> If it does inherit white space rules,
> >>>>>   that includes newlines within triples between S/P or P/O.
> >>>>>
> >>>>> The simplest solution is to add text in section 12.3 to say that horizontal white space outside tokens is discarded (which is different to Turtle).
> >>>>>
> >>>>>  Andy
> >>>>>
> >>>>
> >>>>
> >>>
> >>> --
> >>> -ericP
> >>>
> >>
> >
> 
>
Received on Friday, 18 May 2012 13:45:35 UTC