Re: Escaped characters in RDF-1.1 N-Triples literals for Canonical documents

Hi Andy,

On 12/07/2013 11:27 AM, Andy Seaborne wrote:
> On 06/12/13 21:32, David Booth wrote:
>>>
>>> * Within STRING_LITERAL_QUOTE, only characters not allowed directly in
>>> STRING_LITERAL_QUOTE (U+0022, U+005C, U+000A, U+000D) should use ECHAR.
>>> For all other characters, ECHAR MUST NOT be used.
>>> """
>>
>> Sorry to bother you again about this, but the phrase "should use ECHAR"
>> does not seem like the right conformance phrase to use for *canonical*
>> N-Triples.
>
> David,
>
> This is not a rule special to canonical N-Triples (CNT) - it's true of
> N-Triples generally.  Only the MUST NOT is specific to this section.
>
> The language grammar says:
>
> [9]     STRING_LITERAL_QUOTE     ::=
>       '"' ([^#x22#x5C#xA#xD] | ECHAR | UCHAR)* '"'
>
> So you have to escape any of #x22, #x5C, #xA, #xD in N-Triples.  The list
> already says
>
> * Characters MUST NOT be represented by UCHAR.

I'm not objecting to the syntactic rule.  It is just the phrasing of the
prose that seems awkward to me, because of the word "should".  That word
is normally read as the RFC 2119 conformance term SHOULD (usually written
in upper case), which marks a recommendation, NOT an absolute
requirement.  However, in this case the use of ECHAR *is* an absolute
requirement for those characters that are not allowed directly in
STRING_LITERAL_QUOTE.  To avoid confusion I think it is best to avoid
using the word "should" in a non-2119 sense.

How about the following phrasing:
[[
Within STRING_LITERAL_QUOTE, ECHAR MUST NOT be used for characters that 
are allowed directly in STRING_LITERAL_QUOTE.  In other words, within 
STRING_LITERAL_QUOTE, the characters (U+0022, U+005C, U+000A, U+000D) 
MUST use ECHAR, and all other characters MUST NOT use ECHAR.
]]
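
To make the intent concrete, here is a rough Python sketch of how a
serializer and a checker might apply the phrasing above.  The names
ECHAR_FOR, canonical_string_literal and is_canonical_string_literal are
made up for illustration and are not from the spec.  The sketch also
assumes the other rule already in the list (characters MUST NOT be
represented by UCHAR), and it only covers the quoted lexical form, not
language tags or datatype IRIs.

```python
import re

# Hypothetical table (not from the spec text): the four characters that
# the grammar does not allow directly inside STRING_LITERAL_QUOTE,
# mapped to their ECHAR forms.
ECHAR_FOR = {
    '"': '\\"',    # U+0022 quotation mark
    '\\': '\\\\',  # U+005C backslash
    '\n': '\\n',   # U+000A line feed
    '\r': '\\r',   # U+000D carriage return
}


def canonical_string_literal(value: str) -> str:
    """Quote a lexical form, using ECHAR only where it is required."""
    # Every other character (tab, non-ASCII, ...) is written directly.
    body = "".join(ECHAR_FOR.get(ch, ch) for ch in value)
    return '"' + body + '"'


# Assumed reading of the rule as a pattern: a direct character that is
# not one of the four, or one of the four ECHARs.  No UCHAR, no other
# ECHAR such as \t.
_CANONICAL = re.compile(r'"(?:[^"\\\n\r]|\\["\\nr])*"')


def is_canonical_string_literal(token: str) -> bool:
    """Return True if token obeys the sketched escaping rule."""
    return _CANONICAL.fullmatch(token) is not None
```

Under this reading, a tab inside a literal is written as a raw tab, so
a token containing the two characters backslash and "t" would be
rejected by the checker even though plain N-Triples accepts it.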

Thanks,
David

Received on Tuesday, 10 December 2013 21:01:52 UTC