Re: ORA-SE-305-E: Phase 2 should mention generation of character references (qt-2004Feb1040-01) from Henry Zongaro on 2004-08-31 (public-qt-comments@w3.org from August 2004)

From: Henry Zongaro <zongaro@ca.ibm.com>
Date: Tue, 31 Aug 2004 16:45:10 -0400
To: Stephen Buxton <Stephen.Buxton@oracle.com>
Cc: public-qt-comments@w3.org
Message-ID: <OFDF07BE77.FDAB64DE-ON85256F01.00708621-85256F01.0071FDE5@ca.ibm.com>

Steve,

     In [1], you submitted the following comment on the Last Call Working 
Draft of XSLT 2.0 and XQuery 1.0 Serialization:

<<
SECTION 3: Serialization parameters

Phase 2, "Character markup", fourth bullet, mentions 
escaping of special characters such as &lt;.  You could 
also mention here the creation of character references 
for characters that are not representable in the encoding.
>>

     Thank you for this comment.  The XSL and XML Query Working Groups 
discussed your comment and decided, because of the interactions between 
Unicode normalization and the creation of character references, to fold 
together character expansion and Unicode normalization and, at the same 
time, to add creation of character references to the character expansion 
phase.

     Specifically, the working groups decided to replace the second and 
third bullets of Section 4 of Serialization with the following text:

<<
2. Character expansion is concerned with the representation of
   characters appearing in text and attribute nodes in the
   instance of the data model. The substitution processes that
   may apply are listed below, in priority order: a character
   that is handled by one process in this list will be
   unaffected by processes appearing later in the list, except
   that a character affected by Unicode normalization may be
   affected by creation of CDATA sections and by character
   escaping.

   o URI escaping (in the case of URI-valued attributes in the
     HTML and XHTML output methods), as determined by the
     escape-uri-attributes parameter

   o Character mapping, as determined by the use-character-maps
     parameter.  Text nodes that are children of elements
     specified by the cdata-section-elements parameter are not
     affected by this step. 

   o Unicode Normalization, if requested by the
     normalization-form parameter. Unicode normalization is
     applied to the character stream that results after all
     markup generation and character expansion has taken place.

     For the definitions of the various normalization forms,
     see [Character Model for the World Wide Web 1.0].

     The meanings associated with the possible values of the
     normalization-form parameter are as follows:

     o NFC specifies the serialized result should be in Unicode
       Normalization Form C.

     o NFD specifies the serialized result should be in Unicode
       Normalization Form D.

     o NFKC specifies the serialized result should be in Unicode
       Normalization Form KC.

     o NFKD specifies the serialized result should be in Unicode
       Normalization Form KD.

     o fully-normalized specifies the serialized result should
       be in fully normalized form.

     o none specifies that no Unicode normalization should be
       applied.

     o An implementation-defined value has an implementation-
       defined effect.

   o Creation of CDATA sections, as determined by the
     cdata-section-elements parameter. Note that this is also
     affected by the encoding parameter, in that characters not
     present in the selected encoding cannot be represented in
     a CDATA section.

   o Escaping according to XML or HTML rules of special
     characters and of characters that cannot be represented in
     the selected encoding.  For example, replacing < by &lt;.
>>
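
     The final step in the quoted text (escaping special characters and 
characters that cannot be represented in the selected encoding) can be 
sketched in Python as follows.  This is an illustration only, not text from 
the specification: the function name escape_text is hypothetical, only & 
and < are treated as special, and Python's built-in "xmlcharrefreplace" 
error handler is assumed to be an acceptable stand-in for the creation of 
character references.

```python
def escape_text(text: str, encoding: str = "us-ascii") -> str:
    """Hypothetical sketch of the final character expansion step."""
    # Escape markup-significant characters first; "&" must be handled
    # before "<" so the "&" in "&lt;" is not escaped a second time.
    escaped = text.replace("&", "&amp;").replace("<", "&lt;")
    # Characters the encoding cannot represent become decimal numeric
    # character references (for example, U+20AC becomes &#8364;).
    return escaped.encode(encoding, errors="xmlcharrefreplace").decode(encoding)


if __name__ == "__main__":
    print(escape_text("a<b \u00e9", "us-ascii"))
    print(escape_text("a<b \u00e9 \u20ac", "iso-8859-1"))
```

     With us-ascii, the character U+00E9 is written as a character 
reference; with iso-8859-1 it is representable and passes through 
unchanged, while U+20AC is still written as a reference.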

     The Unicode Normalization phase becomes the third step of character 
expansion.  Character mapping becomes the second step, with the 
clarification that it does not affect elements to which 
cdata-section-elements applies.  This was done to make it clear that any 
characters affected by character mapping are not affected by Unicode 
Normalization.  The lead-in to the bulleted list will be modified so that 
CDATA section creation and escaping still apply to characters affected by 
Unicode Normalization; this is a consequence of folding the two phases 
together.  Finally, the last bullet will be modified to make it clear that 
the final step affects not only special characters but also characters 
that cannot be represented in the selected encoding.
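
     The priority order described above can be sketched roughly in Python. 
This is a simplified illustration under several assumptions, not the 
working groups' algorithm: the function name expand and its parameters are 
hypothetical, URI escaping and CDATA section creation are omitted, 
"fully-normalized" is not supported, and normalization is applied 
separately to each run of unmapped characters.

```python
import unicodedata


def expand(text, char_map, normalization_form="NFC", encoding="us-ascii"):
    """Hypothetical sketch: mapping, then normalization, then escaping."""
    pieces = []
    run = []  # consecutive characters not handled by character mapping

    def flush():
        # Unmapped characters are normalized (if requested) and then
        # escaped; normalized characters remain subject to escaping.
        if not run:
            return
        s = "".join(run)
        if normalization_form != "none":
            s = unicodedata.normalize(normalization_form, s)
        s = s.replace("&", "&amp;").replace("<", "&lt;")
        s = s.encode(encoding, errors="xmlcharrefreplace").decode(encoding)
        pieces.append(s)
        run.clear()

    for ch in text:
        if ch in char_map:
            # A mapped character is emitted as-is and is unaffected by
            # the later steps (normalization and escaping).
            flush()
            pieces.append(char_map[ch])
        else:
            run.append(ch)
    flush()
    return "".join(pieces)


if __name__ == "__main__":
    print(expand("e\u0301<\u00a0", {"\u00a0": "&nbsp;"}))
```

     In the example, the decomposed "e" plus U+0301 is composed by NFC and 
then written as a character reference because us-ascii cannot represent 
it, while the replacement string for U+00A0 is emitted without further 
escaping.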

     As a representative of Oracle was present when this decision was 
made, I will assume the response is acceptable to you.

Thanks,

Henry [On behalf of the XSL and XML Query Working Groups]

[1] http://lists.w3.org/Archives/Public/public-qt-comments/2004Feb/1040.html
------------------------------------------------------------------
Henry Zongaro      Xalan development
IBM SWS Toronto Lab   T/L 969-6044;  Phone +1 905 413-6044
mailto:zongaro@ca.ibm.com
Received on Tuesday, 31 August 2004 20:45:40 UTC