Re: XML 1.1 CR comment response for Waclawek-01 from Karl Waclawek on 2003-06-24 (www-xml-blueberry-comments@w3.org from June 2003)

From: Karl Waclawek <karl@waclawek.net>
Date: Mon, 23 Jun 2003 20:05:09 -0400
To: "Paul Grosso" <pgrosso@arbortext.com>
Cc: <www-xml-blueberry-comments@w3.org>
Message-ID: <001201c339e4$46928500$0207a8c0@karl>

> In response to your email to the XML 1.1 CR recorded at
> http://lists.w3.org/Archives/Public/www-xml-blueberry-comments/2002Oct/0010.html
> the XML Core WG generated, discussed, and resolved the following issue:
> 
> Issue Waclawek-01:
> doing normalization in the application
> 
> Summary resolution: explained
> 
> Response
> --------
> A normalized start tag would not match with an unnormalized end tag, so if
> normalization checking is to be done and result in a "not normalized" error
> (instead of an unmatched start/end tag error), it has to happen before any
> attempt to match the start tag and end tag.
> ========
> 
> Please let us know whether you accept our resolution of our comment,
> or wish to have an objection formally recorded.  If we do not hear
> from you within 10 days we will assume that you accept our response
> (though we would prefer to hear from you in any case if practical).

The XML specs already say that string matching has to be "binary".
So all that normalization checking contributes is a more specific
error message when the start and end tags don't match.
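
To illustrate the point about binary matching, here is a minimal Python
sketch. The element name "café" and the helper names tags_match and is_nfc
are made up for the example, and NFC stands in for whatever normalization
form a checker would actually enforce.

```python
import unicodedata

# Two spellings of the same (hypothetical) element name.
composed = "caf\u00e9"      # precomposed U+00E9
decomposed = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT


def tags_match(start_name: str, end_name: str) -> bool:
    """Binary match: compare code point by code point, no normalization."""
    return start_name == end_name


def is_nfc(text: str) -> bool:
    """True if the text is unchanged by NFC normalization."""
    return unicodedata.normalize("NFC", text) == text


# A normalized start tag against an unnormalized end tag: binary mismatch.
print(tags_match(composed, decomposed))
# Both unnormalized: they match, and only a separate check notices the form.
print(tags_match(decomposed, decomposed), is_nfc(decomposed))
```

In the first case the parser would report a mismatch either way; the
normalization check only changes which error is reported. In the second
case the names match, and only a separate check can tell that they are
not normalized.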

And what if both are not normalized, but they do match?
Why does the *XML processor* have to tell me that they are not normalized?

One paragraph in the spec states that the following productions,

  - CData
  - CharData
  - content
  - Name
  - Nmtoken

should not start with a composing character. This, of course, means the input
needs to be XML-parsed in order to recognize them. But can that not be done
after the XML processor has reported the data to the application?
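
A rough sketch of such an application-level check, in Python, might look
like the following. It assumes that "composing character" can be
approximated by a non-zero canonical combining class, which is a
simplification, and it uses ElementTree, which does not distinguish CData
from CharData. The function names are hypothetical.

```python
import unicodedata
import xml.etree.ElementTree as ET


def starts_with_composing(text):
    """Approximation: first character has a non-zero combining class."""
    return bool(text) and unicodedata.combining(text[0]) != 0


def check_after_parse(xml_text):
    """Parse first, then inspect what the parser reported."""
    problems = []
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Element names (namespaced tags start with "{" in ElementTree).
        if starts_with_composing(elem.tag):
            problems.append(("Name", elem.tag))
        # Attribute names.
        for attr_name in elem.attrib:
            if starts_with_composing(attr_name):
                problems.append(("Name", attr_name))
        # Character data inside the element and following it.
        for chunk in (elem.text, elem.tail):
            if starts_with_composing(chunk):
                problems.append(("CharData", chunk))
    return problems


# Character data that begins with U+0301 COMBINING ACUTE ACCENT.
print(check_after_parse("<a>\u0301x<b>ok</b></a>"))
```

The parser itself knows nothing about normalization here; the check runs
entirely on the data it has already reported.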

I am not convinced that the above is enough reason *not* to make normalization
a separate concern, that is, to leave it to other processing layers.
If one cares about it, one can always pipe XML input through a
normalization-checking filter before passing it on to the parser, and check
for composing characters afterwards.
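
As a sketch of that arrangement in Python: the filter below checks the raw
input before the parser sees it. It tests only NFC via
unicodedata.is_normalized (Python 3.8 or later), which is an assumption
and covers less than the full normalization rules in the spec. The names
NotNormalizedError, normalization_filter and parse_checked are
hypothetical.

```python
import unicodedata
import xml.etree.ElementTree as ET


class NotNormalizedError(ValueError):
    """Hypothetical error raised by the filter, not by the parser."""


def normalization_filter(xml_text: str, form: str = "NFC") -> str:
    """Pass the input through unchanged, or reject it if not normalized."""
    if not unicodedata.is_normalized(form, xml_text):
        raise NotNormalizedError("input is not in " + form)
    return xml_text


def parse_checked(xml_text: str) -> ET.Element:
    """Filter first, then hand the text to an ordinary parser."""
    return ET.fromstring(normalization_filter(xml_text))


print(parse_checked("<doc>caf\u00e9</doc>").text)
try:
    parse_checked("<doc>cafe\u0301</doc>")
except NotNormalizedError as err:
    print("rejected before parsing:", err)
```

If the definition of normalization changes, only normalization_filter
needs updating; the parser is untouched.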

How would it affect XML parsers if the definition of normalization changes?
With a separate Unicode processing layer, the implementation updates could be
concentrated in one place, with no need to keep the parser implementation in sync.

But since it is an optional feature (if I remember correctly),
I can live with it.

Karl

Received on Monday, 23 June 2003 20:02:28 UTC