
Re: [Bjoern Hoehrmann] Re: IRIEverywhere-27

From: Richard Tobin <richard@inf.ed.ac.uk>
Date: Tue, 24 Jan 2006 17:53:11 +0000 (GMT)
To: ht@inf.ed.ac.uk (Henry S. Thompson), François Yergeau <francois@yergeau.com>
Cc: public-xml-core-wg <public-xml-core-wg@w3.org>
Message-Id: <20060124175311.88AB3592342@macintosh.inf.ed.ac.uk>

We went through all this in 2003, when I was worried about the
implications for namespace IRIs.  See the thread starting at:


It turned out not to matter for namespaces, because the IRI is not

It seems to me now that if normalization is going to be done, it
should be done when the document is read in, not at random
points later on.  And XML 1.1 says that input (from non-Unicode
sources) SHOULD be normalized.  We decided against a MUST, and I don't
think we should introduce it for this special case.  Suddenly
requiring existing processors to support normalization is not

Conceptually (though implementations are not required to work like
this) once the XML is parsed you have Unicode characters, so the
variant of step 1 in 3.1 of RFC3987 that applies is C:

  If the IRI is in a Unicode-based character encoding (for example,
  UTF-8 or UTF-16), do not normalize (see section for
  details).  Apply step 2 directly to the encoded Unicode character

-- Richard
Received on Tuesday, 24 January 2006 17:53:29 UTC
