Re: Character Model Comments Clarifications from Martin Duerst on 2002-09-24 (www-i18n-comments@w3.org from September 2002)

From: Martin Duerst <duerst@w3.org>
Date: Tue, 24 Sep 2002 15:29:42 +0900
To: "Mark Scardina" <mark.scardina@oracle.com>, <www-i18n-comments@w3.org>
Cc: w3c-xsl-wg@w3.org, w3c-i18n-ig@w3.org
Message-Id: <4.2.0.58.J.20020924144728.038f6800@localhost>

Hello Mark, dear XSL WG members,

Many thanks for your clarifications. Here are some
responses and some requests for further clarification.

At 14:18 02/09/10 -0700, Mark Scardina wrote:

>Martin, regarding the issues below, our responses are inline.
>
>1) http://lists.w3.org/Archives/Member/w3c-xsl-wg/2002Aug/0044.html
>(http://www.w3.org/International/Group/2002/charmod-lc/#C187)
>
>[XSL] To clarify, our concern was based on first not seeing a clear
>definition of a private system and second, based upon what we inferred a
>private system to be, why should it fall under the purview of your spec.
>It obviously could not be enforced.
>
>2) http://lists.w3.org/Archives/Member/w3c-xsl-wg/2002Aug/0045.html
>     (http://www.w3.org/International/Group/2002/charmod-lc/#C146)
>
>[XSL] XSLT allows and needs manipulation of character sequences at any
>boundaries not simply entity boundaries.  An XSLT stylesheet itself can
>expose non-normalized strings in an effort to match sequences in a 1.0
>document.  It is also not just about serialized XML as processes can
>exchange result trees which would mean working with the DOM which is not
>addressed in your spec.

Many thanks for this clarification, which I think moves us
forward quite a bit. Your original comment was:

 >>>>
"[S] Specifications of text-based languages and protocols SHOULD define 
precisely the construct boundaries necessary to obtain a complete 
definition of full-normalization . These definitions MUST include at least 
the boundaries between markup and character data as well as entity 
boundaries (if the language has any include mechanism) and SHOULD include 
any other boundary that may create denormalization when instances of the 
language are processed."

The requirement (still in 4.4) about defining construct boundaries is very 
unclear when applied to a language that performs dynamic manipulation of 
strings.
 >>>>

The requirement that a language has to be clear about the boundaries
of its syntactic constructs was designed in particular so that
simple applications of XSLT (where text nodes etc. are treated as
units and not modified, but potentially concatenated) can produce
normalized output from normalized input easily.
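
As an illustration of why such boundaries matter, here is a minimal
Python sketch. It assumes plain NFC as the normalization form and uses
two hypothetical adjacent text nodes, `first` and `second`; it does not
check the Character Model's stricter full-normalization definition.
Each string is in NFC on its own, yet their concatenation is not,
because the second one starts with a combining character.

```python
import unicodedata


def is_nfc(text: str) -> bool:
    """Return True if text is already in Normalization Form C."""
    return unicodedata.normalize("NFC", text) == text


# Two hypothetical adjacent text nodes, each in NFC on its own.
first = "cafe"
second = "\u0301 au lait"  # begins with U+0301 COMBINING ACUTE ACCENT

# Plain concatenation, as a simple stylesheet might do.
joined = first + second

# Renormalizing composes "e" + U+0301 into U+00E9 across the old boundary.
renormalized = unicodedata.normalize("NFC", joined)
```

Here `is_nfc(first)` and `is_nfc(second)` both hold, while
`is_nfc(joined)` does not. A construct that is not allowed to begin
with a combining character avoids this case.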

You are right that this conformance criterion doesn't deal with
dynamic operations. This is dealt with later in the spec (same
subsection):

[S] Specifications of API components (functions/methods) that perform 
operations that may produce unnormalized text output from normalized text 
input MUST define whether normalization is the responsibility of the caller 
or the callee. Specifications MAY make performing normalization optional 
for some API components; in this case the default SHOULD be that 
normalization is performed, and an explicit option SHOULD be used to switch 
normalization off. Specifications MUST NOT make the implementation of 
normalization optional.

[S] Specifications that define a mechanism (for example an API or a 
defining language) for producing a document SHOULD require that the final 
output of this mechanism be normalized.

These are the criteria that we wrote with e.g. the DOM or the
dynamic aspects of XSLT in mind.
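
One way to read the first of these criteria is sketched below in
Python. The function `concat_text` and its `normalize` parameter are
hypothetical names chosen for illustration, and NFC is assumed as the
normalization form. The callee normalizes by default, and the caller
has to ask explicitly to switch normalization off.

```python
import unicodedata
from typing import Iterable


def concat_text(parts: Iterable[str], normalize: bool = True) -> str:
    """Concatenate text pieces (hypothetical API component).

    Normalization is the callee's responsibility by default; passing
    normalize=False hands that responsibility back to the caller.
    """
    joined = "".join(parts)
    if normalize:
        return unicodedata.normalize("NFC", joined)
    return joined


# Default: the result is normalized even though the join crosses a
# boundary between a base letter and a combining mark.
normalized = concat_text(["e", "\u0301"])

# Explicit opt-out: the caller receives the raw concatenation.
raw = concat_text(["e", "\u0301"], normalize=False)
```

With the default, `normalized` is the single code point U+00E9; with
the opt-out, `raw` keeps the two original code points.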

We therefore decided to treat your comment as 'noted' (i.e. not
directly applicable), because the matter you commented on is covered
in another part of our specification.
Please tell us, at your earliest convenience, whether you are satisfied
with this resolution or not.

If you have any comments on the criteria listed above for API
components, please raise a new comment for this part of the
specification.


>3) http://lists.w3.org/Archives/Member/w3c-xsl-wg/2002Jul/0088.html
>     (http://www.w3.org/International/Group/2002/charmod-lc/#C140)
>For 3), we asked for clarification on one part of your comment.
>Actually, we would like you to clarify both sentences. There was some
>follow-up discussion on 3), but this didn't really clarify the comment
>itself. If you think that you need to make another comment, please check
>the comments you have already made, and if you think there is something
>missing, please submit another comment asap.
>
>[XSL] Regarding our first sentence, Section 3.5 covers a number of
>processing scenarios and conditionals in one long paragraph making it
>difficult to parse and properly evaluate.  Our suggestion was to break
>this up structurally with additional context so that the types of
>processes and their exceptions were better delineated.

Can you please clarify 'one long paragraph'?

At http://www.w3.org/International/Group/2002/charmod-lc#C140,
you comment on what looks to us like 6 paragraphs, not one.



>[XSL] Regarding the second sentence, Anders already responded with some
>examples, but the essence is that XSL and XML for that matter must be
>able to represent non-Unicode characters inside attributes as well as
>elements.

Can you please clarify your usage of 'non-Unicode characters'?

If you refer to the use of the Private Use Area in Unicode, then
section 3.5 of the Character Model doesn't forbid the use of
code points from the PUA, although the use of private use code points
is discouraged (and this discouragement explained) in
3.6.3 Private use code points.
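
For readers who want to check whether a string contains private use
code points, a minimal Python sketch follows. The helper name
`find_private_use` is hypothetical; the check relies on the Unicode
general category "Co" (private use) as reported by the standard
`unicodedata` module.

```python
import unicodedata
from typing import List, Tuple


def find_private_use(text: str) -> List[Tuple[int, str]]:
    """Return (index, 'U+XXXX') pairs for private use code points in text."""
    hits = []
    for index, char in enumerate(text):
        # General category "Co" marks private use code points.
        if unicodedata.category(char) == "Co":
            hits.append((index, f"U+{ord(char):04X}"))
    return hits


# U+E000 lies in the BMP Private Use Area, U+F0000 in plane 15.
sample = "A\ue000B\U000f0000"
found = find_private_use(sample)
```

Such code points are legal in XML and XSL; the sketch only locates
them so that their use can be reviewed.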

If you mean characters not represented in any way as Unicode
code points, then we have to admit that it is impossible for
us to address this comment, because it is very clear that
neither XSL nor XML is able to represent non-Unicode characters
(irrespective of whether they would appear in elements or attributes).


Looking forward to hearing from you soon,

Regards,    Martin.
Received on Tuesday, 24 September 2002 02:35:53 UTC