Re: Escaped characters in RDF-1.1 N-Triples literals for Canonical documents

On 10/12/13 21:01, David Booth wrote:
> Hi Andy,
>
> On 12/07/2013 11:27 AM, Andy Seaborne wrote:
>> On 06/12/13 21:32, David Booth wrote:
>>>>
>>>> * Within STRING_LITERAL_QUOTE, only characters not allowed directly in
>>>> STRING_LITERAL_QUOTE (U+0022, U+005C, U+000A, U+000D) should use ECHAR.
>>>> For all other characters, ECHAR MUST NOT be used.
>>>> """
>>>
>>> Sorry to bother you again about this, but the phrase "should use ECHAR"
>>> does not seem like the right conformance phrase to use for *canonical*
>>> N-Triples.
>>
>> David,
>>
>> This is not a rule special to canonical N-Triples (CNT) - it's true of
>> N-Triples generally.  Only the MUST NOT is specific to this section.
>>
>> The language grammar says:
>>
>> [9]     STRING_LITERAL_QUOTE     ::=
>>       '"' ([^#x22#x5C#xA#xD] | ECHAR | UCHAR)* '"'
>>
>> So you have to escape any of #x22, #x5C, #xA, #xD in N-Triples.  The list
>> already says
>>
>> * Characters MUST NOT be represented by UCHAR.
>
> I'm not objecting to the syntactic rule.  It is just the phrasing of the
> prose that seems awkward to me, because of the word "should".  The word
> "should" is normally used as a 2119 conformance term, written in upper
> case, with a meaning that is NOT an absolute requirement.  However, in
> this case the use of ECHAR *is* an absolute requirement for those
> characters that are not allowed directly in STRING_LITERAL_QUOTE.  To
> avoid confusion I think it is best to avoid using the word "should" in a
> non-2119 sense.
>
> How about the following phrasing:
> [[
> Within STRING_LITERAL_QUOTE, ECHAR MUST NOT be used for characters that
> are allowed directly in STRING_LITERAL_QUOTE.  In other words, within
> STRING_LITERAL_QUOTE, the characters (U+0022, U+005C, U+000A, U+000D)
> MUST use ECHAR, and all other characters MUST NOT use ECHAR.
> ]]
>

That phrasing uses RFC 2119 language (in the "In other words ... MUST use
ECHAR ..." sentence) for a requirement that is not specific to CNT, which
is what I am trying to avoid.

How about:
[[
Within STRING_LITERAL_QUOTE, only the characters (U+0022, U+005C, 
U+000A, U+000D) are encoded using ECHAR.  ECHAR MUST NOT be used for 
characters that are allowed directly in STRING_LITERAL_QUOTE.
]]
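
As a rough illustration of what that wording implies for a serializer, here is a minimal Python sketch. The names ECHAR_MAP and escape_canonical_literal are hypothetical, not taken from the spec or from any library, and the sketch assumes the input is already a valid Unicode string. It escapes only the four characters that the grammar excludes from STRING_LITERAL_QUOTE and writes everything else directly, with no UCHAR.

```python
# Hypothetical helper (an assumption for illustration, not part of any spec text).
# Maps the four characters that cannot appear directly in STRING_LITERAL_QUOTE
# to their ECHAR forms.
ECHAR_MAP = {
    '"': '\\"',    # U+0022
    '\\': '\\\\',  # U+005C
    '\n': '\\n',   # U+000A
    '\r': '\\r',   # U+000D
}


def escape_canonical_literal(value: str) -> str:
    """Return value as a quoted literal, using ECHAR only where required."""
    # Every other character, including tab and non-ASCII, is emitted as-is.
    body = ''.join(ECHAR_MAP.get(ch, ch) for ch in value)
    return '"' + body + '"'


if __name__ == '__main__':
    print(escape_canonical_literal('say "hi"\n'))
```

Under this sketch a tab (U+0009) is written literally even though \t is a valid ECHAR, because tab is allowed directly in STRING_LITERAL_QUOTE.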

> Thanks,
> David

 Andy

Received on Wednesday, 11 December 2013 18:37:35 UTC