Re: N-triples white space question from Andy Seaborne on 2012-05-18 (public-rdf-wg@w3.org from May 2012)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Fri, 18 May 2012 14:08:26 +0100
To: public-rdf-wg@w3.org
Message-ID: <4FB649CA.3040403@epimorphics.com>
On 18/05/12 13:12, Eric Prud'hommeaux wrote:
> * Richard Cyganiak<richard@cyganiak.de>  [2012-05-18 12:35+0100]
>>
>> On 18 May 2012, at 11:34, Eric Prud'hommeaux wrote:
>>> Does the existing body of N-Triples permit a grammar with no default whitespace rules?
>>>
>>>   triples: triple (LF triple)* LF?
>>>   triple: subject HWS predicate HWS object '.'
>>>
>>> I.e., do all the N-Triples out there look like "<s>  <p>  <o>."?
>>
>> This is what N-Triples as currently defined requires. Isn't that sufficient?
>>
>> ntripleDoc	::=	line*
>> line		::=	ws* ( comment | triple )? eoln	
>> triple		::=	subject ws+ predicate ws+ object ws* '.' ws*
>> ws		::=	space | tab	
>> eoln		::=	cr | lf | cr lf	
>
> I was just interested to see how much your SHOULD:
> [[
> * Richard Cyganiak<richard@cyganiak.de>  [2012-05-18 11:06+0100]
>> I would even go one step further and add some SHOULD-level guidance on where to put what whitespace. Perhaps something like: exactly one space between s and p; exactly one space between p and o; no WS before or after the period; no WS at the start of a line; CR+LF as EOL.
> ]]
> could be turned into a MUST.

I don't think a MUST is a good idea, partly because it's too late, 
but also because, despite being a dump format, it's not pure binary.  
Blank lines and comments do have a role here, and CR+LF is a mild 
inconvenience in some text tools.

There is variance in IRIs, so on that point alone NT has enough 
variation to rule out blind processing with line-based tools.  I've 
seen the :80 thing (an explicit default port in http IRIs) in messy data.

Processing based on appearance needs an extra step to be safe at scale 
(i.e. to not need checking afterwards).
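
To illustrate the IRI variance: RDF compares IRIs as plain strings, so 
an http IRI written with an explicit :80 is a different term from the 
same IRI without it, and a line-based sort/uniq or grep will not 
connect them.  A minimal sketch of one such extra step, assuming Python; 
the helper name drop_default_port and the example.org IRIs are 
hypothetical, and it only covers the default-port case, not full IRI 
normalisation.

```python
from urllib.parse import urlsplit, urlunsplit

# Default ports for the schemes this sketch knows about (an assumption;
# extend as needed).
DEFAULT_PORTS = {"http": 80, "https": 443}


def drop_default_port(iri: str) -> str:
    """Return iri with an explicit default port removed, else unchanged."""
    parts = urlsplit(iri)
    if parts.port is not None and DEFAULT_PORTS.get(parts.scheme) == parts.port:
        # netloc ends with ":<port>", so cut at the last colon.
        netloc = parts.netloc.rpartition(":")[0]
        return urlunsplit(parts._replace(netloc=netloc))
    return iri


messy = "http://example.org:80/a"
clean = "http://example.org/a"

# As strings (and so as RDF terms) the two differ ...
print(messy == clean)
# ... but they agree once the default port is dropped.
print(drop_default_port(messy) == clean)
```

Whether a given dataset wants this rewriting at all is a separate 
decision; the point is only that it cannot be judged from the text of 
the line alone.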

What a canonical form is good for is as a target for simple tools to 
process and output.  Hopefully, tool makers will then provide it in 
response to user demand.
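
As a rough sketch of such a tool, assuming Python: it reads lines per 
the 2004 N-Triples grammar quoted above (optional leading white space, 
blank lines and comments allowed) and writes them in the layout 
Richard suggested: one space between terms, no white space around the 
period or at the start of a line.  The function name canonicalise, the 
regular expressions, and the eol parameter are assumptions of this 
sketch; the term patterns are simplified and do not validate IRIs or 
escapes.

```python
import re

# Simplified term patterns (assumptions, looser than the real grammar).
IRI = r"<[^<>\s]*>"
BNODE = r"_:[A-Za-z][A-Za-z0-9]*"
LITERAL = r'"(?:[^"\\]|\\.)*"(?:@[a-z]+(?:-[a-z0-9]+)*|\^\^' + IRI + r")?"

# triple ::= subject ws+ predicate ws+ object ws* '.' ws*
# (with the optional leading ws* from the "line" production).
TRIPLE = re.compile(
    rf"[ \t]*({IRI}|{BNODE})[ \t]+({IRI})[ \t]+({IRI}|{BNODE}|{LITERAL})[ \t]*\.[ \t]*"
)


def canonicalise(text: str, eol: str = "\r\n") -> str:
    """Rewrite N-Triples text into one fixed layout, dropping comments."""
    out = []
    # eoln ::= cr | lf | cr lf
    for line in re.split(r"\r\n|\r|\n", text):
        stripped = line.strip(" \t")
        if not stripped or stripped.startswith("#"):
            continue  # blank line or comment
        match = TRIPLE.fullmatch(line)
        if match is None:
            raise ValueError(f"not an N-Triples line: {line!r}")
        s, p, o = match.groups()
        out.append(f"{s} {p} {o}.{eol}")
    return "".join(out)


sample = '  <http://a/s>\t<http://a/p>   "x y"@en  .  \r\n# note\n\n_:b1 <http://a/p> <http://a/o>.\n'
print(canonicalise(sample, eol="\n"), end="")
```

The eol parameter defaults to CR+LF to match the suggested guidance; 
passing "\n" is the obvious alternative for text tools that dislike CR.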

	Andy

>
>
>> Richard
>>
>>
>>
>>> I note that Oracle has been vigilant about preserving backwards-compatibility. Souri, do you have a sense of what Oracle has been using?
>>>
>>>> I also note that RDF 2004 N-Triples allows comments (only at the start of a line). This makes sense for the use as a test case format, but not much sense for the use as a dump format.
>>>>
>>>> Best,
>>>> Richard
>>>>
>>>>
>>>> [1] http://www.w3.org/TR/rdf-testcases/#ntriples
>>>>
>>>>
>>>>
>>>> On 18 May 2012, at 10:04, Andy Seaborne wrote:
>>>>
>>>>> Gavin, Eric,
>>>>>
>>>>> rdf-turtle says:
>>>>>
>>>>> [1] ntriplesDoc	::= (triple)? (EOL triple)* (EOL)?
>>>>> [2] triple	::= subject predicate object '.'
>>>>> [8] EOL		::= ([#xD#xA])+
>>>>>
>>>>> What are the white space rules?
>>>>>
>>>>> Does it inherit white space processing from the rest of Turtle? Comments seem to come from Turtle.
>>>>>
>>>>> If it does not inherit white space rules,
>>>>>    what about horizontal white space inside triples?
>>>>>
>>>>> If it does inherit white space rules,
>>>>>   that includes newlines within triples between S/P or P/O.
>>>>>
>>>>> The simplest solution is to add text in section 12.3 to say that horizontal white space outside tokens is discarded (which is different to Turtle).
>>>>>
>>>>> 	Andy
>>>>>
>>>>
>>>>
>>>
>>> --
>>> -ericP
>>>
>>
>
Received on Friday, 18 May 2012 13:08:59 UTC