- From: Henry Zongaro <zongaro@ca.ibm.com>
- Date: Tue, 31 Aug 2004 17:46:10 -0400
- To: Martin Duerst <duerst@w3.org>
- Cc: public-qt-comments@w3.org, w3c-i18n-ig@w3.org
Martin,

In [1], you submitted the following comment on the Last Call Working Draft of XSLT 2.0 and XQuery 1.0 Serialization on behalf of the I18N Working Group:

<<
[14] Section 3, four phases of serialization: Character expansion comes before Encoding, but encoding depends on character expansion (using numeric character references for characters that don't exist in a certain encoding). This has to be sorted out very carefully and explained in detail, ideally with examples. There's also an interaction between mapping and normalization. If there's a mapping combining grave->&#x300;, normalization must be aware that &#x300; is not an ASCII string!
>>

Thanks to you and the I18N Working Group for this comment.

The XSL and XML Query Working Groups discussed the comment, and decided, because of the interactions between Unicode normalization and creation of character references, to fold together character expansion and Unicode normalization. In addition, the working groups decided to add creation of character references to the character expansion phase, because it had not been explicitly mentioned as part of that phase.

Specifically, the working groups decided to replace the second and third bullets of Section 4 of Serialization with the following text:

<<
2. Character expansion is concerned with the representation of characters appearing in text and attribute nodes in the instance of the data model. The substitution processes that may apply are listed below, in priority order: a character that is handled by one process in this list will be unaffected by processes appearing later in the list, except that a character affected by Unicode normalization may be affected by creation of CDATA sections or by character escaping.

   o URI escaping (in the case of URI-valued attributes in the HTML and XHTML output methods), as determined by the escape-uri-attributes parameter.

   o Character mapping, as determined by the use-character-maps parameter.
     Text nodes that are children of elements specified by the cdata-section-elements parameter are not affected by this step.

   o Unicode Normalization, if requested by the normalization-form parameter. Unicode normalization is applied to the character stream that results after all markup generation and character expansion has taken place. For the definitions of the various normalization forms, see [Character Model for the World Wide Web 1.0]. The meanings associated with the possible values of the normalization-form parameter are as follows:

       o NFC specifies the serialized result should be in Unicode Normalization Form C.
       o NFD specifies the serialized result should be in Unicode Normalization Form D.
       o NFKC specifies the serialized result should be in Unicode Normalization Form KC.
       o NFKD specifies the serialized result should be in Unicode Normalization Form KD.
       o fully-normalized specifies the serialized result should be in fully normalized form.
       o none specifies that no Unicode normalization should be applied.
       o An implementation-defined value has an implementation-defined effect.

   o Creation of CDATA sections, as determined by the cdata-section-elements parameter. Note that this is also affected by the encoding parameter, in that characters not present in the selected encoding cannot be represented in a CDATA section.

   o Escaping according to XML or HTML rules of special characters and of characters that cannot be represented in the selected encoding. For example, replacing < by &lt;.
>>

The Unicode Normalization phase becomes the third step of character expansion. Character mapping becomes the second step, with the clarification that it does not affect elements to which cdata-section-elements applies. This was done to make it clear that any characters affected by character mapping are not affected by Unicode Normalization.

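As an illustration only, the priority order described above might be sketched in Python roughly as follows. This is not text from the Serialization draft or from any implementation: the names expand_text, escape and XML_SPECIALS are hypothetical, URI escaping and CDATA section creation are left out, and normalizing each run of unmapped characters separately is a simplifying assumption made here to keep mapped output out of the normalizer.

```python
import unicodedata

# Hypothetical table of XML special characters and their escaped forms.
XML_SPECIALS = {"<": "&lt;", ">": "&gt;", "&": "&amp;"}


def escape(text, encoding):
    """Escape special characters and characters the encoding cannot represent."""
    out = []
    for ch in text:
        if ch in XML_SPECIALS:
            out.append(XML_SPECIALS[ch])
            continue
        try:
            ch.encode(encoding)
            out.append(ch)
        except UnicodeEncodeError:
            # Fall back to a hexadecimal numeric character reference.
            out.append("&#x{:X};".format(ord(ch)))
    return "".join(out)


def expand_text(text, char_map, normalization_form=None, encoding="utf-8"):
    """Apply character mapping, then normalization, then escaping.

    A character handled by the character map is emitted as-is and is not
    touched by the later steps. Characters changed by normalization are
    still subject to escaping.
    """
    pieces = []
    run = []  # consecutive characters not handled by the character map

    def flush():
        if run:
            chunk = "".join(run)
            if normalization_form:
                chunk = unicodedata.normalize(normalization_form, chunk)
            pieces.append(escape(chunk, encoding))
            run.clear()

    for ch in text:
        if ch in char_map:
            flush()
            pieces.append(char_map[ch])  # mapped output bypasses later steps
        else:
            run.append(ch)
    flush()
    return "".join(pieces)
```

With this sketch, expand_text("a\u0300", {}, "NFC", "ascii") yields "&#xE0;" (the pair is composed by NFC and the result is then escaped because ASCII cannot represent it), while expand_text("a\u0300", {"\u0300": "&#x300;"}, "NFC", "ascii") yields "a&#x300;", since the mapped combining grave is neither normalized nor escaped again.
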
The lead-in to the bulleted list will be modified so that CDATA section creation and escaping still apply to characters affected by Unicode Normalization; this is a consequence of trying to fold the two together. Finally, the last bullet will be modified to make it clear that not only special characters, but also characters that can't be represented in the selected encoding, are affected by that final step.

May I ask you to confirm that this response is acceptable to the I18N Working Group?

Thanks,

Henry
------------------------------------------------------------------
Henry Zongaro      Xalan development
IBM SWS Toronto Lab   T/L 969-6044;  Phone +1 905 413-6044
mailto:zongaro@ca.ibm.com
Received on Tuesday, 31 August 2004 21:46:41 UTC