Re: Clarification of CharMod C045

Hello Doug,

Thanks for your comment.

On 2009/10/30 5:59, Doug Schepers wrote:
> Hi, Folks-
>
> While reviewing DOM3 Events, Richard Ishida pointed out that the use of
> surrogate pairs in escaped character strings is frowned upon, citing
> C045 [1]:
>
> [[
> C045 [S] Whenever specifications define character escapes that allow the
> representation of characters using a number, the number MUST represent
> the Unicode code point of the character and SHOULD be in hexadecimal
> notation.
> ]]
>
> A superficial reading of that point doesn't make a clear distinction
> between surrogate pairs and Unicode code points, since surrogate pairs
> are Unicode code points as well.

Yes, surrogates are code points as well, but they are not characters. 
Therefore, as far as I understand, "MUST represent the Unicode code 
point of the *character*" (emphasis added) makes it clear that surrogate 
code points (whether in pairs or not) are not allowed.
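
To make that reading concrete, here is a rough Python sketch of how a
parser for numeric escapes might enforce it. The function names are
made up for illustration and are not taken from CharMod or any other
specification; the only facts assumed are the surrogate range
U+D800..U+DFFF and the upper limit U+10FFFF.

```python
# Hypothetical helper names; only the numeric ranges come from Unicode.
SURROGATE_FIRST = 0xD800
SURROGATE_LAST = 0xDFFF
MAX_CODE_POINT = 0x10FFFF


def is_valid_escape_value(n: int) -> bool:
    """Return True if n is a code point that can denote a character,
    i.e. it lies in 0..0x10FFFF and is not a surrogate code point."""
    in_range = 0 <= n <= MAX_CODE_POINT
    is_surrogate = SURROGATE_FIRST <= n <= SURROGATE_LAST
    return in_range and not is_surrogate


def parse_hex_escape(digits: str) -> str:
    """Turn the hexadecimal digits of an escape into one character,
    rejecting surrogate code points and out-of-range values."""
    n = int(digits, 16)
    if not is_valid_escape_value(n):
        raise ValueError(f"U+{n:04X} is not the code point of a character")
    return chr(n)
```

Under this sketch, U+1D11E (MUSICAL SYMBOL G CLEF) is accepted as the
single number 1D11E, while D834 and DD1E, the two halves of its UTF-16
form, are each rejected.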

> His explanation was that the surrogate code points are not the code
> point of the character, but rather they are codepoints of two surrogate
> characters; the codepoint of the character is only and always a single
> number.

Actually, there's no such thing as a "surrogate character". Surrogates
don't have character names, representative glyphs, or anything else
that characters typically have. A good place to see this is Table 2-3
on page 27 of The Unicode Standard, Version 5.0.
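
For a quick check without the book at hand, the following small Python
sketch queries the Unicode Character Database through the standard
unicodedata module. The describe function is just an illustrative
helper; the exact data depends on the Unicode version shipped with the
Python build in use.

```python
import unicodedata


def describe(cp: int) -> str:
    """Summarize the general category and character name of a code point."""
    ch = chr(cp)
    category = unicodedata.category(ch)
    # name() returns the supplied default when no character name exists.
    name = unicodedata.name(ch, "<no name>")
    return f"U+{cp:04X} category={category} name={name}"


for cp in (0xD834, 0xDD1E, 0x1D11E):
    print(describe(cp))
```

The two surrogate code points should come back with general category
Cs and no name, whereas U+1D11E has a proper character name.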

> While I now understand and agree with his point, I think a clarifying
> errata might benefit people like me who want to be good citizens but
> might not get the implications immediately.

Can you propose actual text?

Regards,   Martin.


> [1] http://www.w3.org/TR/charmod/#C045
>
> Regards-
> -Doug Schepers
> W3C Team Contact, SVG and WebApps WGs
>
>

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp

Received on Friday, 30 October 2009 02:20:31 UTC