[Serial] I18N WG last call comments [14] (qt-2004Feb0362-11) from Henry Zongaro on 2004-08-31 (public-qt-comments@w3.org from August 2004)

From: Henry Zongaro <zongaro@ca.ibm.com>
Date: Tue, 31 Aug 2004 17:46:10 -0400
To: Martin Duerst <duerst@w3.org>
Cc: public-qt-comments@w3.org, w3c-i18n-ig@w3.org
Message-ID: <OF6C71A203.8255F140-ON85256F01.0076B077-85256F01.007793C3@ca.ibm.com>
Martin,

     In [1], you submitted the following comment on the Last Call Working 
Draft of XSLT 2.0 and XQuery 1.0 Serialization on behalf of the I18N 
Working Group:

<<
[14] Section 3, four phases of serialization: Character expansion
   comes before Encoding, but encoding depends on character
   expansion (using numeric character references for characters
   that don't exist in a certain encoding). This has to be
   sorted out very carefully and explained in detail, ideally
   with examples. There's also an interaction between mapping and
   normalization.  If there's a mapping combining grave->&#x300;,
   normalization must be aware that &#x300; is not an ASCII string!
>>

     Thanks to you and the I18N Working Group for this comment.  The XSL 
and XML Query Working Groups discussed the comment, and decided, because 
of the interactions between Unicode normalization and creation of 
character references, to fold together character expansion and Unicode 
normalization.  In addition, the working groups decided to add creation of 
character references to the character expansion phase, because it had not 
been explicitly mentioned as part of that phase.

     Specifically, the working groups decided to replace the second and 
third bullets of Section 4 of Serialization with 
the following text:

<<
2. Character expansion is concerned with the representation of
   characters appearing in text and attribute nodes in the
   instance of the data model. The substitution processes that
   may apply are listed below, in priority order: a character
   that is handled by one process in this list will be
   unaffected by processes appearing later in the list, except
   that a character affected by Unicode normalization may be
   affected by creation of CDATA sections or by character
   escaping

   o URI escaping (in the case of URI-valued attributes in the
     HTML and XHTML output methods), as determined by the
     escape-uri-attributes parameter

   o Character mapping, as determined by the use-character-maps
     parameter.  Text nodes that are children of elements
     specified by the cdata-section-elements parameter are not
     affected by this step. 

   o Unicode Normalization, if requested by the
     normalization-form parameter. Unicode normalization is
     applied to the character stream that results after all
     markup generation and character expansion has taken place.

     For the definitions of the various normalization forms,
     see [Character Model for the World Wide Web 1.0]

     The meanings associated with the possible values of the
     normalization-form parameter are as follows:

     o NFC specifies the serialized result should be in Unicode
       Normalization Form C.

     o NFD specifies the serialized result should be in Unicode
       Normalization Form D.

     o NFKC specifies the serialized result should be in Unicode
       Normalization Form KC.

     o NFKD specifies the serialized result should be in Unicode
       Normalization Form KD.

     o fully-normalized specifies the serialized result should
       be in fully normalized form.

     o none specifies that no Unicode normalization should be
       applied.

     o An implementation-defined value has an implementation-
       defined effect.

   o Creation of CDATA sections, as determined by the
     cdata-section-elements parameter. Note that this is also
     affected by the encoding parameter, in that characters not
     present in the selected encoding cannot be represented in
     a CDATA section.

   o Escaping according to XML or HTML rules of special
     characters and of characters that cannot be represented in
     the selected encoding.  For example replacing < by &lt;.
>>

     The Unicode Normalization phase becomes the third step of character 
expansion.  Character mapping becomes the second step, with the 
clarification that it does not affect elements to which 
cdata-section-elements applies.  This was done to make it clear that any 
characters affected by character mapping are not affected by Unicode 
Normalization.  The lead-in to the bulleted list will be modified so that 
CDATA section creation and escaping still apply to characters affected by 
Unicode Normalization - this is a consequence of trying to fold the two 
together.  Finally, the last bullet will be modified to make it clear that 
not only special characters, but characters that can't be represented in 
the selected encoding are affected by that final step.

     May I ask you to confirm that this response is acceptable to the I18N 
Working Group?

Thanks,

Henry
------------------------------------------------------------------
Henry Zongaro      Xalan development
IBM SWS Toronto Lab   T/L 969-6044;  Phone +1 905 413-6044
mailto:zongaro@ca.ibm.com
Received on Tuesday, 31 August 2004 21:46:41 UTC