Clarification of CharMod C045

From: Doug Schepers <schepers@w3.org>
Date: Thu, 29 Oct 2009 16:59:04 -0400
Message-ID: <4AEA0218.6090206@w3.org>
To: www-i18n-comments@w3.org
Hi, Folks-

While reviewing DOM3 Events, Richard Ishida pointed out that the use of 
surrogate pairs in escaped character strings is frowned upon, citing 
C045 [1]:

C045  [S]  Whenever specifications define character escapes that allow 
the representation of characters using a number, the number MUST 
represent the Unicode code point of the character and SHOULD be in 
hexadecimal notation.

A superficial reading of that point doesn't make a clear distinction 
between surrogate pairs and Unicode code points, since each half of a 
surrogate pair is itself a Unicode code point.

His explanation was that the surrogate code points are not the code 
point of the character, but rather they are code points of two surrogate 
characters; the code point of the character is only and always a single 
code point.
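
To make the distinction concrete, here is a minimal sketch of the 
standard UTF-16 arithmetic that turns a surrogate pair into the single 
code point it stands for. The function names and the XML-style 
"&#x...;" escape syntax are assumptions chosen for illustration; they 
do not come from CharMod or DOM3 Events.

```python
HIGH_START, HIGH_END = 0xD800, 0xDBFF  # high (lead) surrogate range
LOW_START, LOW_END = 0xDC00, 0xDFFF    # low (trail) surrogate range


def combine_surrogates(high: int, low: int) -> int:
    """Return the single code point encoded by a UTF-16 surrogate pair."""
    if not HIGH_START <= high <= HIGH_END:
        raise ValueError(f"not a high surrogate: U+{high:04X}")
    if not LOW_START <= low <= LOW_END:
        raise ValueError(f"not a low surrogate: U+{low:04X}")
    return 0x10000 + ((high - HIGH_START) << 10) + (low - LOW_START)


def numeric_escape(code_point: int) -> str:
    """Build a hexadecimal escape in a hypothetical '&#x...;' syntax.

    Surrogate code points are rejected, because on their own they are
    not the code point of any character.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if HIGH_START <= code_point <= LOW_END:
        raise ValueError(f"surrogate code point: U+{code_point:04X}")
    return f"&#x{code_point:X};"


# U+1D11E MUSICAL SYMBOL G CLEF is D834 DD1E in UTF-16.
clef = combine_surrogates(0xD834, 0xDD1E)
escaped = numeric_escape(clef)
```

Under this reading of C045, the escape for the G clef carries the one 
number 1D11E, and an escape built from D834 followed by DD1E would be 
the form to avoid.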

While I now understand and agree with his point, I think a clarifying 
erratum might benefit people like me who want to be good citizens but 
might not get the implications immediately.

[1] http://www.w3.org/TR/charmod/#C045

-Doug Schepers
W3C Team Contact, SVG and WebApps WGs
Received on Thursday, 29 October 2009 20:59:06 UTC
