- From: Sandro Hawke <sandro@w3.org>
- Date: Fri, 18 May 2012 09:45:17 -0400
- To: Andy Seaborne <andy.seaborne@epimorphics.com>
- Cc: public-rdf-wg@w3.org
On Fri, 2012-05-18 at 14:08 +0100, Andy Seaborne wrote:
>
> On 18/05/12 13:12, Eric Prud'hommeaux wrote:
> > * Richard Cyganiak<richard@cyganiak.de> [2012-05-18 12:35+0100]
> >>
> >> On 18 May 2012, at 11:34, Eric Prud'hommeaux wrote:
> >>> Does the existing body of N-Triples permit a grammar with no default whitespace rules?
> >>>
> >>> triples: triple (LF triple)* LF?
> >>> triple: subject HWS predicate HWS object '.'
> >>>
> >>> I.e., do all the N-Triples out there look like "<s> <p> <o>."?
> >>
> >> This is what N-Triples as currently defined requires. Isn't that sufficient?
> >>
> >> ntripleDoc ::= line*
> >> line ::= ws* ( comment | triple )? eoln
> >> triple ::= subject ws+ predicate ws+ object ws* '.' ws*
> >> ws ::= space | tab
> >> eoln ::= cr | lf | cr lf
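A minimal Python sketch of a recognizer for one line of the 2004 grammar quoted above might look like this. The term patterns are simplified assumptions (IRI characters and escape sequences are not validated), and parse_line is a hypothetical name. The point it illustrates is that the existing grammar already accepts runs of spaces and tabs between terms and around the period, so any canonical layout would be a separate, stricter convention on top of it.

```python
import re

# Simplified term patterns (an assumption: they approximate the 2004
# uriref, nodeID and literal productions but do not validate IRI
# characters or escape sequences).
URIREF = r'<[^>]*>'
NODEID = r'_:[A-Za-z][A-Za-z0-9]*'
LITERAL = r'"(?:[^"\\]|\\.)*"(?:@[a-z]+(?:-[a-z0-9]+)*|\^\^<[^>]*>)?'

WS = r'[ \t]'                                # ws ::= space | tab
SUBJECT = f'(?:{URIREF}|{NODEID})'
OBJECT = f'(?:{URIREF}|{NODEID}|{LITERAL})'

# triple ::= subject ws+ predicate ws+ object ws* '.' ws*
TRIPLE = (f'(?P<s>{SUBJECT}){WS}+(?P<p>{URIREF}){WS}+'
          f'(?P<o>{OBJECT}){WS}*\\.{WS}*')
# line ::= ws* ( comment | triple )? eoln  (eoln is stripped before matching)
LINE = re.compile(f'{WS}*(?:#[^\\r\\n]*|{TRIPLE})?')


def parse_line(line):
    """Return (subject, predicate, object) for a triple line, or None for a
    blank or comment line; raise ValueError if the line does not match."""
    m = LINE.fullmatch(line.rstrip('\r\n'))
    if m is None:
        raise ValueError(f'not an N-Triples line: {line!r}')
    if m.group('s') is None:
        return None
    return m.group('s', 'p', 'o')


print(parse_line('\t<http://ex/a>  <http://ex/p>\t"x" . \r\n'))
```

Leading tabs, doubled spaces and a space before the period are all accepted, which is exactly the variation a line-oriented tool would trip over.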
> >
> > I was just interested to see how much your SHOULD:
> > [[
> > * Richard Cyganiak<richard@cyganiak.de> [2012-05-18 11:06+0100]
> >> I would even go one step further and add some SHOULD-level guidance on where to put what whitespace. Perhaps something like: exactly one space between s and p; exactly one space between p and o; no WS before or after the period; no WS at the start of a line; CR+LF as EOL.
> > ]]
> > could be turned into a MUST.
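The layout Richard suggests (one space between terms, no white space around the period, none at the start of the line, CR+LF as EOL) is simple to emit and to check. A minimal Python sketch, assuming already-tokenized terms and the same kind of simplified term patterns; canonical_line and is_canonical are hypothetical names, not part of any spec:

```python
import re

# Simplified term patterns (an assumption, not a full N-Triples validator).
URIREF = r'<[^>]*>'
NODEID = r'_:[A-Za-z][A-Za-z0-9]*'
LITERAL = r'"(?:[^"\\]|\\.)*"(?:@[a-z]+(?:-[a-z0-9]+)*|\^\^<[^>]*>)?'

# Suggested layout: single spaces between terms, no white space before or
# after the '.', nothing before the subject, CR LF as the line ending.
CANONICAL = re.compile(
    f'(?:{URIREF}|{NODEID}) {URIREF} (?:{URIREF}|{NODEID}|{LITERAL})\\.\r\n')


def canonical_line(s, p, o):
    """Serialize three already-tokenized terms in the suggested layout."""
    return f'{s} {p} {o}.\r\n'


def is_canonical(line):
    """True if the line, including its EOL, is exactly in that layout."""
    return CANONICAL.fullmatch(line) is not None


print(is_canonical(canonical_line('<http://a>', '<http://b>', '"x y"')))
print(is_canonical('<http://a>  <http://b> "x y" .\n'))
```

The check still has to know where terms end, because a literal may itself contain spaces; a plain "split on single spaces" test would not be enough.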
>
> I don't think a MUST is a good idea, partly because it's too late,
> but also because, despite being a dump format, it's not pure binary.
> Blank lines and comments do have a role here, and the CR+LF is a mild
> inconvenience in some text tools.
>
> There is variance in how IRIs are written, so from that point alone NT
> has enough variation to stop blind processing with line-based tools.
> I've seen the :80 thing (an explicit default port in http IRIs) in
> messy data.
>
> Processing based on appearance needs an extra step to be safe at scale
> (i.e. so that it doesn't need checking afterwards).
>
> What a canonical form is good for is as a target for simple tools to
> process and output. Hopefully tool makers will then provide it on user
> demand.
So, no one would be writing a parser for n-triples that ONLY did
canonical n-triples. (At the point where you're writing something that
can keep a table of b-node labels, scanning over multiple spaces
between the subject and the predicate is pretty easy.) But we'd say
people SHOULD output "canonical" n-triples so that plain-text
RDF-unaware tools like sort and grep would work. Is that the
proposal?
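One reading of this proposal is "liberal in, canonical out". A minimal Python sketch under that reading: a lenient line pattern (simplified term syntax, an assumption) accepts any horizontal white space, and every triple is re-emitted in one fixed layout, so lines that differ only in spacing become identical and plain sorting and de-duplication can merge them. canonicalize is a hypothetical helper, and it does nothing about the IRI variation Andy mentions (such as :80).

```python
import re

# Lenient pattern: any spaces/tabs between terms (simplified term syntax,
# an assumption; subject/predicate positions are not restricted here).
TERM = (r'<[^>]*>|_:[A-Za-z][A-Za-z0-9]*'
        r'|"(?:[^"\\]|\\.)*"(?:@[a-z]+(?:-[a-z0-9]+)*|\^\^<[^>]*>)?')
TRIPLE = re.compile(
    f'[ \\t]*({TERM})[ \\t]+({TERM})[ \\t]+({TERM})[ \\t]*\\.[ \\t]*')


def canonicalize(lines):
    """Hypothetical helper: re-emit each triple line in one fixed layout,
    dropping blank and comment lines."""
    out = []
    for line in lines:
        body = line.rstrip('\r\n')
        stripped = body.lstrip(' \t')
        if not stripped or stripped.startswith('#'):
            continue
        m = TRIPLE.fullmatch(body)
        if m is None:
            raise ValueError(f'unparseable line: {line!r}')
        out.append('{} {} {}.\r\n'.format(*m.groups()))
    return out


data = [
    '<http://ex/b> <http://ex/p> "two  spaces" .\n',
    '<http://ex/a>\t<http://ex/p>  <http://ex/o>.\r\n',
    '# a comment\n',
    '<http://ex/a> <http://ex/p> <http://ex/o> .\n',
]
# Once the layout is fixed, lines that differ only in spacing are identical,
# so sorting and de-duplicating merges them.
for line in sorted(set(canonicalize(data))):
    print(line, end='')
```

On a file that is already in canonical form, the shell equivalent of the last step is simply sort -u, with no RDF-aware tooling involved.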
-- Sandro
> Andy
>
> >
> >
> >> Richard
> >>
> >>
> >>
> >>> I note that Oracle has been vigilant about preserving backwards-compatibility. Souri, do you have a sense of what Oracle has been using?
> >>>
> >>>> I also note that RDF 2004 N-Triples allows comments (only at the start of a line). This makes sense for use as a test-case format, but not much sense for use as a dump format.
> >>>>
> >>>> Best,
> >>>> Richard
> >>>>
> >>>>
> >>>> [1] http://www.w3.org/TR/rdf-testcases/#ntriples
> >>>>
> >>>>
> >>>>
> >>>> On 18 May 2012, at 10:04, Andy Seaborne wrote:
> >>>>
> >>>>> Gavin, Eric,
> >>>>>
> >>>>> rdf-turtle says:
> >>>>>
> >>>>> [1] ntriplesDoc ::= (triple)? (EOL triple)* (EOL)?
> >>>>> [2] triple ::= subject predicate object '.'
> >>>>> [8] EOL ::= ([#xD#xA])+
> >>>>>
> >>>>> What are the white space rules?
> >>>>>
> >>>>> Does it inherit white space processing from the rest of Turtle? Comments seem to come from Turtle.
> >>>>>
> >>>>> If it does not inherit white space rules,
> >>>>> what about horizontal white space inside triples?
> >>>>>
> >>>>> If it does inherit white space rules,
> >>>>> that includes newlines within triples between S/P or P/O.
> >>>>>
> >>>>> The simplest solution is to add text in section 12.3 to say that horizontal white space outside tokens is discarded (which is different to Turtle).
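The suggested rule (horizontal white space outside tokens is discarded, while EOL ::= ([#xD#xA])+ is not) can be sketched as a tokenizer. This is a hypothetical Python illustration with simplified token patterns; comments are left out for brevity. Because EOL stays a token the parser sees, a newline between S/P or P/O would be a syntax error under [2] triple ::= subject predicate object '.', whereas Turtle discards it.

```python
import re

# Hypothetical token set with simplified patterns. HWS is matched but then
# discarded; EOL ::= ([#xD#xA])+ is kept. Comments are omitted for brevity.
TOKEN = re.compile(r'''
    (?P<EOL>[\r\n]+)
  | (?P<HWS>[ \t]+)
  | (?P<IRI><[^>]*>)
  | (?P<BNODE>_:[A-Za-z][A-Za-z0-9]*)
  | (?P<LITERAL>"(?:[^"\\]|\\.)*"(?:@[a-z]+(?:-[a-z0-9]+)*|\^\^<[^>]*>)?)
  | (?P<DOT>\.)
''', re.VERBOSE)


def tokenize(text):
    """Yield (kind, text) pairs; horizontal white space between tokens is
    dropped, while a run of CR/LF comes through as a single EOL token."""
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f'unexpected character at offset {pos}: {text[pos]!r}')
        pos = m.end()
        if m.lastgroup != 'HWS':
            yield m.lastgroup, m.group()


doc = '<http://a>\t<http://b>   "x y" .\r\n\r\n_:b1 <http://b> <http://c>.'
print([kind for kind, _ in tokenize(doc)])
# ['IRI', 'IRI', 'LITERAL', 'DOT', 'EOL', 'BNODE', 'IRI', 'IRI', 'DOT']
```

The blank line collapses into one EOL token, matching the (EOL triple)* shape of [1], and the space inside the literal survives because it is part of a token.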
> >>>>>
> >>>>> Andy
> >>>>>
> >>>>
> >>>>
> >>>
> >>> --
> >>> -ericP
> >>>
> >>
> >
>
>
Received on Friday, 18 May 2012 13:45:35 UTC