- From: Henry Zongaro <zongaro@ca.ibm.com>
- Date: Tue, 31 Aug 2004 17:46:10 -0400
- To: Martin Duerst <duerst@w3.org>
- Cc: public-qt-comments@w3.org, w3c-i18n-ig@w3.org
Martin,
In [1], you submitted the following comment on the Last Call Working
Draft of XSLT 2.0 and XQuery 1.0 Serialization on behalf of the I18N
Working Group:
<<
[14] Section 3, four phases of serialization: Character expansion
comes before Encoding, but encoding depends on character
expansion (using numeric character references for characters
that don't exist in a certain encoding). This has to be
sorted out very carefully and explained in detail, ideally
with examples. There's also an interaction between mapping and
normalization. If there's a mapping combining grave->&#x300;,
normalization must be aware that &#x300; is not an ASCII string!
>>
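The interaction described in the comment can be reproduced with Python's standard library. This is a minimal sketch that assumes a hypothetical character map replacing U+0300 COMBINING GRAVE ACCENT with the ASCII string "&#x300;". Here `html.unescape` stands in for an XML parser, because it decodes numeric character references the same way:

```python
import unicodedata
from html import unescape  # stand-in for an XML parser's reference decoding

# Hypothetical character map: U+0300 COMBINING GRAVE ACCENT -> "&#x300;"
char_map = {"\u0300": "&#x300;"}

text = "e\u0300"  # 'e' followed by a combining grave accent

# Character expansion: apply the map before normalization.
serialized = "".join(char_map.get(ch, ch) for ch in text)

# NFC applied to the serialized stream sees only ASCII, so nothing changes.
normalized_stream = unicodedata.normalize("NFC", serialized)
print(normalized_stream)  # e&#x300;

# A parser reading the output turns the reference back into U+0300,
# and the parsed text is not in NFC (NFC would be the single char U+00E8).
parsed = unescape(normalized_stream)
print(parsed == unicodedata.normalize("NFC", parsed))  # False
```

Because the output is normalized as a character stream, a normalizer that treats the reference as plain ASCII can produce a document whose parsed content is not normalized.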
Thanks to you and the I18N Working Group for this comment. The XSL
and XML Query Working Groups discussed the comment and, because of the
interactions between Unicode normalization and the creation of character
references, decided to fold the character expansion and Unicode
normalization phases together. In addition, the working groups decided to
add the creation of character references to the character expansion phase,
because it had not been explicitly mentioned as part of that phase.
Specifically, the working groups decided to replace the second and
third bullets of Section 4 of Serialization with
the following text:
<<
2. Character expansion is concerned with the representation of
characters appearing in text and attribute nodes in the
instance of the data model. The substitution processes that
may apply are listed below, in priority order: a character
that is handled by one process in this list will be
unaffected by processes appearing later in the list, except
that a character affected by Unicode normalization may still be
affected by the creation of CDATA sections or by character
escaping:
o URI escaping (in the case of URI-valued attributes in the
HTML and XHTML output methods), as determined by the
escape-uri-attributes parameter
o Character mapping, as determined by the use-character-maps
parameter. Text nodes that are children of elements
specified by the cdata-section-elements parameter are not
affected by this step.
o Unicode Normalization, if requested by the
normalization-form parameter. Unicode normalization is
applied to the character stream that results after all
markup generation and character expansion have taken place.
For the definitions of the various normalization forms,
see [Character Model for the World Wide Web 1.0].
The meanings associated with the possible values of the
normalization-form parameter are as follows:
   o NFC specifies that the serialized result should be in
     Unicode Normalization Form C.
   o NFD specifies that the serialized result should be in
     Unicode Normalization Form D.
   o NFKC specifies that the serialized result should be in
     Unicode Normalization Form KC.
   o NFKD specifies that the serialized result should be in
     Unicode Normalization Form KD.
   o fully-normalized specifies that the serialized result
     should be in fully normalized form.
   o none specifies that no Unicode normalization should be
     applied.
   o An implementation-defined value has an
     implementation-defined effect.
o Creation of CDATA sections, as determined by the
cdata-section-elements parameter. Note that this is also
affected by the encoding parameter, in that characters not
present in the selected encoding cannot be represented in
a CDATA section.
o Escaping, according to XML or HTML rules, of special
characters and of characters that cannot be represented in
the selected encoding; for example, replacing < by &lt;.
>>
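The priority order in the proposed text can be sketched for a single text node as follows. This is a minimal sketch under several assumptions, and it is not the algorithm of any particular serializer. All function names are hypothetical. URI escaping is omitted because it applies only to URI-valued attributes, and "fully-normalized" is not modelled. Normalization is applied only to the runs of unmapped characters between mapped ones. Splitting a CDATA section around an unencodable character, so that a character reference can be emitted, is one assumed way of honouring the rule that such characters cannot appear inside a CDATA section. Mapped replacement strings are written verbatim, without normalization or escaping:

```python
import unicodedata

def normalize(text, form):
    # "none" disables normalization; "fully-normalized" is not modelled here.
    return text if form == "none" else unicodedata.normalize(form, text)

def can_encode(ch, encoding):
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

def escape(text, encoding):
    # Escape XML special characters, then replace anything the encoding
    # cannot hold with a decimal character reference.
    text = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    return text.encode(encoding, "xmlcharrefreplace").decode(encoding)

def to_cdata(text, encoding):
    parts, run = [], []

    def flush():
        if run:
            # "]]>" cannot occur inside a CDATA section, so split it.
            body = "".join(run).replace("]]>", "]]]]><![CDATA[>")
            parts.append("<![CDATA[" + body + "]]>")
            run.clear()

    for ch in text:
        if can_encode(ch, encoding):
            run.append(ch)
        else:
            # Close the section, emit a character reference, reopen later.
            flush()
            parts.append(f"&#{ord(ch)};")
    flush()
    return "".join(parts)

def expand_text_node(text, char_map, form="NFC", encoding="us-ascii",
                     cdata_element=False):
    if cdata_element:
        # Character maps are skipped for children of cdata-section-elements.
        return to_cdata(normalize(text, form), encoding)
    out, run = [], []
    for ch in text:
        if ch in char_map:
            # Normalize and escape the pending unmapped run, then emit the
            # mapped string verbatim (no normalization, no escaping).
            out.append(escape(normalize("".join(run), form), encoding))
            run.clear()
            out.append(char_map[ch])
        else:
            run.append(ch)
    out.append(escape(normalize("".join(run), form), encoding))
    return "".join(out)

# Hypothetical map for the HTML output method: U+00A0 -> "&nbsp;"
print(expand_text_node("cafe\u0301 <b>\u00a0\u4e2d", {"\u00a0": "&nbsp;"}))
# caf&#233; &lt;b&gt;&nbsp;&#20013;
```

In the example, NFC composes "e" and U+0301 into U+00E9. That character is then escaped as a character reference because US-ASCII cannot represent it. This shows how a character affected by normalization still reaches the escaping step.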
The Unicode Normalization phase becomes the third step of character
expansion. Character mapping becomes the second step, with the
clarification that it does not affect text nodes that are children of
elements named by the cdata-section-elements parameter. This was done to
make it clear that any characters affected by character mapping are not
affected by Unicode Normalization. The lead-in to the bulleted list will be
modified so that CDATA section creation and escaping still apply to
characters affected by Unicode Normalization; this is a consequence of
folding the two phases together. Finally, the last bullet will be modified
to make it clear that not only special characters but also characters that
cannot be represented in the selected encoding are affected by that final
step.
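Listing character mapping ahead of normalization matters because normalization can remove the character a map is keyed on. This is a minimal sketch that assumes a hypothetical map on U+0301 COMBINING ACUTE ACCENT and compares the two possible orders:

```python
import unicodedata

# Hypothetical character map keyed on U+0301 COMBINING ACUTE ACCENT.
char_map = {"\u0301": "&#x301;"}

def apply_map(text):
    return "".join(char_map.get(ch, ch) for ch in text)

text = "e\u0301"  # decomposed e-acute

# Proposed order: map first; the mapped character is then exempt from
# normalization, so the map's output survives.
map_first = apply_map(text)
print(map_first)  # e&#x301;

# Reversed order: NFC composes the pair into U+00E9 and the map never fires.
normalize_first = apply_map(unicodedata.normalize("NFC", text))
print(normalize_first == "\u00e9")  # True
```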
May I ask you to confirm that this response is acceptable to the I18N
Working Group?
Thanks,
Henry
------------------------------------------------------------------
Henry Zongaro Xalan development
IBM SWS Toronto Lab T/L 969-6044; Phone +1 905 413-6044
mailto:zongaro@ca.ibm.com
Received on Tuesday, 31 August 2004 21:46:41 UTC