N-triples white space question

On Fri, May 18, 2012 at 4:32 AM, Richard Cyganiak <richard@cyganiak.de> wrote:
> On 18 May 2012, at 11:27, Andy Seaborne wrote:
>> Maybe we could define a canonical form of N-triples:
>
> +1, this would be very useful.

There does seem to be a reasonable amount of need for this. For example,
test cases ;) ... gee, you'd never have guessed that N-Triples started
as a Test Case language.

>
>> . No comments.
>> . No blank lines.
>> . CR+LF
>> . Single space between S/P, P/O.
>>    (a raw tab is also good - it can't appear in a valid literal)

The "No comments" rule would invalidate all existing RDF Test Cases
format documents, which all contain comments. Also, the RDF Test Cases
documents are missing the EOL on the last line of the document.

>
> My subjective impression is that single space is very common in existing N-Triples files. The more the canonical form resembles common practice, the better.
>
>> . No use of \u or \U
>
> +1! Very important. (Although common practice at the moment would dictate: “randomly fuck up Unicode characters”)

In that case we should likely also require Unicode Normalization Form C (NFC).
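
To illustrate what an NFC requirement would mean for literal content, here is a minimal sketch in Python using the standard unicodedata module. The helper name is_nfc is only for illustration and is not part of any proposal in this thread.

```python
import unicodedata

def is_nfc(text):
    """Return True if text is already in Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text) == text

# 'e' followed by COMBINING ACUTE ACCENT (two code points, not NFC)
decomposed = "e\u0301"
# NFC composes the pair into the single code point U+00E9
composed = unicodedata.normalize("NFC", decomposed)

print(is_nfc(decomposed), is_nfc(composed))
```

Both strings render the same way, so without an NFC rule two files with byte-different literals could look identical to a human reader.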

>
>> . Resolved IRIs
>>    avoid <http://example/a/./b/../c> or <http://example.org:80/a>
>
> The formal way to state this is: “Only IRIs that are normalized according to Section 5 of [IRI].” A link to this Note in RDF Concepts would help to explain what this means:
> http://dvcs.w3.org/hg/rdf/raw-file/default/rdf-concepts/index.html#note-iri-interop
>
>> . Last line has a CR+LF

RDF Test Cases documents would not meet this requirement.

>
> I'd add:
>
>  • No additional HWS before or after CR+LF
>  • No WS between O and triple-ending Period (although “single space” might be closer to current common practice and would work equally well; it's just ugly to my eyes)
>
> Richard
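
For discussion, the rules listed so far could be checked roughly as follows. This is a sketch under assumptions, not a proposed implementation: the function name canonical_problems and the space_before_dot switch are made up here, the S/P/O check is a regular-expression approximation rather than a real N-Triples parser, and IRI normalization and NFC are not checked at all.

```python
import re

# Subject (IRI or blank node), one space, predicate IRI, one space, then the object.
HEAD = re.compile(r"(?:<[^<> \t]*>|_:[^ \t]+) <[^<> \t]*> (?=[^ \t])")
# An odd run of backslashes followed by u or U, i.e. a \u or \U escape.
UESCAPE = re.compile(r"(?<!\\)(?:\\\\)*\\[uU]")

def canonical_problems(doc, space_before_dot=False):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if not doc.endswith("\r\n"):
        problems.append("last line does not end with CR+LF")
    lines = doc.split("\r\n")
    if lines[-1] == "":
        lines.pop()  # artefact of the final CR+LF, not a blank line
    # The thread has not settled on "o." versus "o ."; the flag picks one.
    tail = " ." if space_before_dot else "."
    for n, line in enumerate(lines, start=1):
        if "\r" in line or "\n" in line:
            problems.append(f"line {n}: bare CR or LF")
        if line.strip(" \t") == "":
            problems.append(f"line {n}: blank line")
            continue
        if line.lstrip(" \t").startswith("#"):
            problems.append(f"line {n}: comment")
            continue
        if line[0] in " \t" or line[-1] in " \t":
            problems.append(f"line {n}: whitespace next to the line break")
        if not HEAD.match(line):
            problems.append(f"line {n}: S/P/O not separated by single spaces")
        body = line[:-len(tail)]
        if not line.endswith(tail) or body[-1:] in (" ", "\t"):
            problems.append(f"line {n}: bad spacing before the final period")
        if UESCAPE.search(line):
            problems.append(f"line {n}: uses \\u or \\U escape")
    return problems

print(canonical_problems('<http://example/s> <http://example/p> "o".\r\n'))
```

Calling it with space_before_dot=True accepts the "single space before the period" variant instead, which may be closer to what existing files do.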

Received on Friday, 18 May 2012 18:28:34 UTC