Re: N-triples white space question

On 18/05/12 13:12, Eric Prud'hommeaux wrote:
> * Richard Cyganiak<>  [2012-05-18 12:35+0100]
>> On 18 May 2012, at 11:34, Eric Prud'hommeaux wrote:
>>> Does the existing body of N-Triples permit a grammar with no default whitespace rules?
>>>   triples: triple (LF triple)* LF?
>>>   triple: subject HWS predicate HWS object '.'
>>> I.e., do all the N-Triples out there look like "<s>  <p>  <o>."?
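One way to answer that empirically is to scan existing dumps for lines that fail the proposed strict grammar. Below is a minimal Python sketch. It assumes HWS means one or more spaces or tabs, and it uses simplified term patterns (IRIs, ASCII "_:" blank node labels, plain, language-tagged and datatyped literals) rather than the full N-Triples productions. STRICT_TRIPLE and nonconforming_lines are hypothetical names.

```python
import re
from typing import List

# Simplified term patterns (an assumption: not the full N-Triples productions).
IRI = r'<[^<>"{}|^`\\\s]*>'
BNODE = r'_:[A-Za-z][A-Za-z0-9]*'
LITERAL = (r'"(?:[^"\\\r\n]|\\.)*"'
           r'(?:@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\^\^' + IRI + r')?')

SUBJECT = rf'(?:{IRI}|{BNODE})'
OBJECT = rf'(?:{IRI}|{BNODE}|{LITERAL})'

# triple: subject HWS predicate HWS object '.'
# HWS is assumed to be one or more spaces or tabs; nothing is allowed
# at the start of a line, before the '.', or after the '.'.
STRICT_TRIPLE = re.compile(rf'{SUBJECT}[ \t]+{IRI}[ \t]+{OBJECT}\.')


def nonconforming_lines(doc: str) -> List[int]:
    """Return 1-based numbers of lines that do not match the strict form.

    Lines are split on LF only, following: triples: triple (LF triple)* LF?
    """
    if doc.endswith('\n'):
        doc = doc[:-1]  # the optional final LF
    return [n for n, line in enumerate(doc.split('\n'), start=1)
            if not STRICT_TRIPLE.fullmatch(line)]
```

Run over a real dump, a non-empty result would indicate that existing data does not fit the strict grammar; a space before the '.' or a CR+LF line end is enough to trigger it.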
>> This is what N-Triples as currently defined requires. Isn't that sufficient?
>> ntripleDoc	::=	line*
>> line		::=	ws* ( comment | triple )? eoln	
>> triple		::=	subject ws+ predicate ws+ object ws* '.' ws*
>> ws		::=	space | tab	
>> eoln		::=	cr | lf | cr lf	
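For comparison, the 2004 grammar quoted above can be checked line by line. Below is a minimal Python sketch. Simplified term patterns stand in for the real subject/predicate/object productions, and comment is assumed to be '#' followed by anything up to the end of the line. is_ntriples_2004 is a hypothetical name.

```python
import re

# Simplified term patterns (an assumption, not the full 2004 productions).
IRI = r'<[^<>"{}|^`\\\s]*>'
BNODE = r'_:[A-Za-z][A-Za-z0-9]*'
LITERAL = (r'"(?:[^"\\\r\n]|\\.)*"'
           r'(?:@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\^\^' + IRI + r')?')
WS = r'[ \t]'                      # ws ::= space | tab
COMMENT = r'#[^\r\n]*'             # assumed shape of the comment production
# triple ::= subject ws+ predicate ws+ object ws* '.' ws*
TRIPLE = (rf'(?:{IRI}|{BNODE}){WS}+{IRI}{WS}+'
          rf'(?:{IRI}|{BNODE}|{LITERAL}){WS}*\.{WS}*')
# line ::= ws* ( comment | triple )? eoln   (eoln is handled by the split below)
LINE = re.compile(rf'{WS}*(?:{COMMENT}|{TRIPLE})?')


def is_ntriples_2004(doc: str) -> bool:
    # Every line, including the last, must end with an eoln.
    if doc and not doc.endswith(('\r', '\n')):
        return False
    # eoln ::= cr | lf | cr lf  (CR LF is tried first so it counts as one eoln)
    lines = re.split(r'\r\n|\r|\n', doc)[:-1]
    return all(LINE.fullmatch(line) for line in lines)
```

Under this grammar a comment may only follow optional leading whitespace, so a "# ..." after the final '.' of a triple is rejected.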
> I was just interested to see how much your SHOULD:
> [[
> * Richard Cyganiak<>  [2012-05-18 11:06+0100]
>> I would even go one step further and add some SHOULD-level guidance on where to put what whitespace. Perhaps something like: exactly one space between s and p; exactly one space between p and o; no WS before or after the period; no WS at the start of a line; CR+LF as EOL.
> ]]
> could be turned into a MUST.
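The suggested layout is simple to emit. Below is a minimal sketch of a writer that follows it. It assumes the three terms are already serialized as N-Triples strings, so escaping is out of scope. canonical_line and canonical_doc are hypothetical names.

```python
from typing import Iterable, Tuple

# Terms are assumed to be already-serialized N-Triples strings.
Triple = Tuple[str, str, str]


def canonical_line(triple: Triple) -> str:
    """Exactly one space between s/p and p/o, no whitespace before or
    after the '.', none at the start of the line, and CR+LF as EOL."""
    s, p, o = triple
    return f"{s} {p} {o}.\r\n"


def canonical_doc(triples: Iterable[Triple]) -> str:
    return "".join(canonical_line(t) for t in triples)
```

Because the line end is CR+LF as suggested, this output would not match the LF-only triples rule at the top of the thread.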

I don't think a MUST is a good idea, partly because it's too late, but
also because, despite being a dump format, N-Triples is not pure binary.
Blank lines and comments do have a role here, and CR+LF is a mild
inconvenience in some text tools.

There is variance in how IRIs are written, so from that point alone NT
has enough variation to rule out blindly processing it with line-based
tools. I've seen the :80 thing (an explicit default port in an IRI) in
messy data.

Processing based on appearance needs an extra step to be safe at scale
(i.e. so that the output doesn't need checking afterwards).
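As an illustration of that extra step: line-based tools such as "sort -u" treat two spellings of the same triple that differ only in whitespace as different lines. Below is a minimal Python sketch that rewrites each triple line into a single-space key before de-duplicating. It normalizes whitespace only, uses simplified term patterns, and leaves IRI spelling (such as the :80 case) untouched. canonical_key and dedupe are hypothetical names.

```python
import re
from typing import Iterable, List, Optional

# Simplified term pattern (an assumption): IRI, blank node or literal.
TERM = (r'<[^<>\s]*>'
        r'|_:[A-Za-z][A-Za-z0-9]*'
        r'|"(?:[^"\\\r\n]|\\.)*"(?:@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\^\^<[^<>\s]*>)?')
TRIPLE = re.compile(
    rf'[ \t]*({TERM})[ \t]+({TERM})[ \t]+({TERM})[ \t]*\.[ \t]*')


def canonical_key(line: str) -> Optional[str]:
    """Rewrite a triple line with single spaces; None for anything else."""
    m = TRIPLE.fullmatch(line)
    if m is None:
        return None  # blank line, comment or unrecognised input
    return ' '.join(m.groups()) + ' .'


def dedupe(lines: Iterable[str]) -> List[str]:
    """Keep the first occurrence of each triple, compared by canonical key."""
    seen = set()
    out = []
    for line in lines:
        key = canonical_key(line)
        if key is not None and key not in seen:
            seen.add(key)
            out.append(key)
    return out
```

For example, '<http://a/s> <http://a/p> "a b" .' and '<http://a/s>\t<http://a/p>  "a b".' are distinct lines to a text tool but produce the same key here. Whitespace inside the literal is preserved because the literal is matched as one term.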

What a canonical form is good for is serving as a target for simple
tools to process and output. Hopefully, tool makers will then provide
it by user


>> Richard
>>> I note that Oracle has been vigilant about preserving backwards compatibility. Souri, do you have a sense of what Oracle has been using?
>>>> I also note that RDF 2004 N-Triples allows comments (only at the start of a line). This makes sense for the use as a test case format, but not much sense for the use as a dump format.
>>>> Best,
>>>> Richard
>>>> On 18 May 2012, at 10:04, Andy Seaborne wrote:
>>>>> Gavin, Eric,
>>>>> rdf-turtle says:
>>>>> [1] ntriplesDoc	::= (triple)? (EOL triple)* (EOL)?
>>>>> [2] triple	::= subject predicate object '.'
>>>>> [8] EOL		::= ([#xD#xA])+
>>>>> What are the white space rules?
>>>>> Does it inherit white space processing from the rest of Turtle? Comments seem to come from Turtle.
>>>>> If it does not inherit white space rules,
>>>>>    what about horizontal white space inside triples?
>>>>> If it does inherit white space rules,
>>>>>   that includes newlines within triples between S/P or P/O.
>>>>> The simplest solution is to add text in section 12.3 to say that horizontal white space outside tokens is discarded (which is different to Turtle).
>>>>> 	Andy
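A minimal Python sketch of what that rule would mean for a tokenizer: split the document on EOL first, then skip spaces and tabs between tokens on each line. With this approach a triple can never continue onto the next line, and "<s><p><o>." with no spaces tokenizes fine. Term patterns are simplified, comments are not handled (a '#' raises an error), and tokenize and tokenize_line are hypothetical names.

```python
import re
from typing import List, Tuple

IRI = r'<[^<>"{}|^`\\\s]*>'
TOKEN = re.compile(
    rf'(?P<IRI>{IRI})'
    r'|(?P<BNODE>_:[A-Za-z][A-Za-z0-9]*)'
    r'|(?P<LITERAL>"(?:[^"\\\r\n]|\\.)*"'
    rf'(?:@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\^\^{IRI})?)'
    r'|(?P<DOT>\.)')
HWS = re.compile(r'[ \t]*')  # horizontal white space only


def tokenize_line(line: str) -> List[Tuple[str, str]]:
    tokens = []
    pos = HWS.match(line).end()                # discard leading HWS
    while pos < len(line):
        m = TOKEN.match(line, pos)
        if m is None:
            raise ValueError(f'unexpected {line[pos]!r} at column {pos + 1}')
        tokens.append((m.lastgroup, m.group()))
        pos = HWS.match(line, m.end()).end()   # discard HWS between tokens
    return tokens


def tokenize(doc: str) -> List[List[Tuple[str, str]]]:
    # EOL ::= ([#xD#xA])+ : split first, so no token sequence spans lines.
    return [tokenize_line(line) for line in re.split(r'[\r\n]+', doc)
            if line.strip(' \t')]
```

For example, "<s>" on one line followed by "<p> <o> ." on the next yields two separate token lists, so a parser built on top would reject the first as an incomplete triple, whereas Turtle would accept the pair as one triple.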
>>> --
>>> -ericP

Received on Friday, 18 May 2012 13:08:59 UTC