From: François Yergeau <francois@yergeau.com>
Date: Thu, 24 Jun 2004 12:04:09 -0400
To: Tim Bray <tbray@textuality.com>
Cc: www-i18n-comments@w3.org
Message-id: <40DAFB79.9090604@yergeau.com>

Hi Tim,

I was charged with contacting you about one of your comments on the 
Character Model.  This particular comment (our number LC031) was about 
section 4.6 Character Escaping. You wrote:

This is incorrect.  Within CDATA sections, &#xd801; is perfectly legal
and just encodes a string of 8 ASCII characters.  Outside of CDATA
sections "&#xd801;" is illegal, but that's an XML thing, not a CDATA
section thing.

The example in question reads:
EXAMPLE: XML defines 'CDATA sections' which allow escaping the
syntax-significance of all characters between the CDATA section 
delimiters. CDATA sections do not allow the expression of 
unrepresentable characters and in fact prevent their expression using 
numeric character references.

We were not sure how to interpret your comment, since 'unrepresentable 
character' in the example doesn't refer to things like high 
surrogates, but to point 2 of the list just above the examples: "2. 
expressing characters not representable in the character encoding chosen 
for an instance of the language, or".  A high surrogate such as #xd801 
is not a character at all, so it cannot be what 'unrepresentable 
character' refers to.  Instead, an unrepresentable character would be 
for instance a Chinese ideograph when the chosen encoding does not 
contain Chinese ideographs in its repertoire.
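
To make the distinction concrete, here is a minimal sketch. It assumes 
Python's standard xml.etree.ElementTree parser, ISO-8859-1 as the chosen 
encoding, and U+4E2D as a stand-in Chinese ideograph; none of these 
choices come from the Character Model text itself.

```python
import xml.etree.ElementTree as ET

# Assumption for illustration: U+4E2D stands in for "a Chinese ideograph".
ideograph = "\u4e2d"

# Assumption for illustration: ISO-8859-1 is the chosen encoding. Its
# repertoire has no Chinese ideographs, so the character is unrepresentable.
try:
    ideograph.encode("iso-8859-1")
    representable = True
except UnicodeEncodeError:
    representable = False

# Outside a CDATA section, a numeric character reference expresses the
# character even though the encoding could not carry it directly.
outside = ET.fromstring("<doc>&#x4e2d;</doc>").text

# Inside a CDATA section the same markup is not interpreted: the parser
# returns the eight literal ASCII characters of the reference.
inside = ET.fromstring("<doc><![CDATA[&#x4e2d;]]></doc>").text

print(representable, outside == ideograph, inside)
```

Under these assumptions, the reference yields the ideograph only outside 
the CDATA section; inside it, the text stays as the literal string, which 
is the sense in which CDATA sections prevent expressing unrepresentable 
characters.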

We tentatively decided to accept your comment as a request for 
clarification, since it would seem that it came from a misunderstanding, 
requiring Charmod to be clearer in this area.  I'm writing today to ask 
you whether we correctly interpreted your comment, or if there's 
something else we should take into account.


François Yergeau
