Implementation or normalization from Rick Jelliffe on 2003-06-04 (www-xml-blueberry-comments@w3.org from June 2003)

From: Rick Jelliffe <ricko@topologi.com>
Date: Wed, 4 Jun 2003 15:51:16 +1000
To: <www-xml-blueberry-comments@w3.org>
Message-ID: <016c01c32a5d$505097d0$4bc8a8c0@AlletteSystems.com>

On XML-DEV, John Cowan wrote:

> It seems that nobody has implemented the optional normalization checking
> feature of XML 1.1 in an XML 1.1 browser.  If nobody does, the feature
> may go bye-bye when XML 1.1 goes to Rec.

> Those who care should talk to www-xml-blueberry-comments@w3.org , saying
> what they are going to do.

Topologi has long implemented normalization on data import in our editor. This is 
an appropriate approach for an editor. So we don't need normalization-checking.
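
For illustration only, normalize-on-import can be sketched roughly as below. This is not Topologi's code: the function name `import_text` is hypothetical, the choice of NFC as the target form is an assumption, and Python's standard `unicodedata` module stands in for whatever library an editor would really use.

```python
import unicodedata


def import_text(raw: str, form: str = "NFC") -> str:
    """Hypothetical import hook: normalize incoming text once, at the boundary.

    Assumes NFC is the form the editor wants to store internally.
    """
    return unicodedata.normalize(form, raw)


# 'e' followed by U+0301 COMBINING ACUTE ACCENT (decomposed input)
decomposed = "e\u0301"
imported = import_text(decomposed)
```

Because everything entering the editor passes through one such step, the data held inside is already normalized and no later check is needed.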

Personally, I believe that normalization-checking should be an optional SAX feature
of XML processors: they may or may not provide it, and it may be normalization-checking
of the raw input stream or after parsing.  (That i18n experts say it should be required and
implementation people freak out at doing it is probably the reason why checking *should* be
required  :-) )
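
A minimal sketch of what an optional check might look like, under stated assumptions: the names `NormalizationError` and `check_normalized` and the `enabled` switch are hypothetical, NFC is assumed as the form being checked, and this is not the SAX API or the full XML 1.1 rule set. The same function could be applied to the raw input stream or to character data after parsing.

```python
import unicodedata


class NormalizationError(ValueError):
    """Hypothetical error type for text that fails the check."""


def check_normalized(text: str, enabled: bool = True) -> None:
    """Raise NormalizationError if checking is on and text is not in NFC.

    Comparing against the normalized copy is the simplest possible test;
    it only checks, it never rewrites the input.
    """
    if enabled and unicodedata.normalize("NFC", text) != text:
        raise NormalizationError("text is not in NFC")


def passes(text: str, enabled: bool = True) -> bool:
    """Convenience wrapper: True if the check raises nothing."""
    try:
        check_normalized(text, enabled)
        return True
    except NormalizationError:
        return False
```

With `enabled=False` the processor simply does not provide the feature, which is the "may or may not provide it" case.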

However, I think there are probably two compelling reasons to at least make 
checking optional. First, because of the lack of small libraries to do this. And second,
because Unicode Normalization will apparently only stabilize for Unicode 4
(according to Adobe's Ken Lunde, who is the God of CJKV information
processing.)  

So I suggest that, on normalization, the XML Core WG might like
to echo St Augustine and say "Make me normalized, but not yet".  In other
words, treat normalization-checking as something that should be implemented
as soon as convenient.  (That XML 1.1 is supposed to be Unicode-version
independent, yet there is a significant feature which really will only become
practical after Unicode 4 systems are deployed shows how problematic 
version-independence really is....)

But I think what the XML Core WG should not do is to have wording that would
ban or discourage normalization-error-checking.  Maybe the most expedient thing
would be to move the normalization checking into another non-normative annex.
The reason against completely removing it is that XML 1.1 is supposed to
be Unicode-version-independent: people will implement features as they
become convenient, and they become convenient as libraries are deployed,
and libraries are deployed following the release of standards. So in order
to make XML 1.1 Unicode-version-neutral, it also cannot preclude things
like normalization which will become convenient in the medium term.

Cheers
Rick Jelliffe

Received on Wednesday, 4 June 2003 01:45:20 UTC