- From: Richard Tobin <richard@inf.ed.ac.uk>
- Date: Tue, 24 Jan 2006 17:53:11 +0000 (GMT)
- To: ht@inf.ed.ac.uk (Henry S. Thompson), François Yergeau <francois@yergeau.com>
- Cc: public-xml-core-wg <public-xml-core-wg@w3.org>
We went through all this in 2003, when I was worried about the implications for namespace IRIs. See the thread starting at:

http://lists.w3.org/Archives/Member/w3c-xml-core-wg/2003JanMar/0003.html

It turned out not to matter for namespaces, because the IRI is not dereferenced.

It seems to me now that if normalization is going to be done, it should be done when the document is read in, not at random points later on. And XML 1.1 says that input (from non-Unicode sources) SHOULD be normalized. We decided against a MUST, and I don't think we should introduce it for this special case. Suddenly requiring existing processors to support normalization is not reasonable.

Conceptually (though implementations are not required to work like this), once the XML is parsed you have Unicode characters, so the variant of step 1 in section 3.1 of RFC 3987 that applies is c:

   If the IRI is in a Unicode-based character encoding (for example, UTF-8 or UTF-16), do not normalize (see section 5.3.2.2 for details). Apply step 2 directly to the encoded Unicode character sequence.

-- Richard
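To make the effect of variant c concrete, here is a minimal Python sketch of the step 2 mapping applied to an already-parsed character sequence, with no normalization. It is an illustration under simplifying assumptions, not the RFC's full algorithm: the function name iri_to_uri and the example.org IRIs are hypothetical, and it percent-encodes every non-ASCII character instead of checking the exact ucschar/iprivate ranges. It shows that a precomposed "é" and "e" plus a combining acute accent produce different URIs, which is precisely what skipping normalization means.

```python
def iri_to_uri(iri: str) -> str:
    """Map an IRI (already a sequence of Unicode characters) to a URI.

    Sketch of RFC 3987 section 3.1 step 2 as used with variant c of
    step 1: no Unicode normalization is applied first.
    Simplification (assumption): every non-ASCII character is treated
    as needing conversion, instead of testing ucschar/iprivate ranges.
    """
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            out.append(ch)  # ASCII characters pass through unchanged
        else:
            # Encode the character as UTF-8, then percent-encode each byte
            # using uppercase hex digits.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


# Hypothetical example IRIs: same visible text, different code points.
composed = "http://example.org/caf\u00e9"      # U+00E9 precomposed
decomposed = "http://example.org/cafe\u0301"   # 'e' + U+0301 combining acute

print(iri_to_uri(composed))    # http://example.org/caf%C3%A9
print(iri_to_uri(decomposed))  # http://example.org/cafe%CC%81
```

Because nothing is normalized, the two inputs stay distinct after mapping; any normalization would therefore have to happen earlier, when the document is read in.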
Received on Tuesday, 24 January 2006 17:53:29 UTC