Re: [Bjoern Hoehrmann] Re: IRIEverywhere-27 from Richard Tobin on 2006-01-24 (public-xml-core-wg@w3.org from January 2006)

From: Richard Tobin <richard@inf.ed.ac.uk>
Date: Tue, 24 Jan 2006 17:53:11 +0000 (GMT)
To: ht@inf.ed.ac.uk (Henry S. Thompson), François Yergeau <francois@yergeau.com>
Cc: public-xml-core-wg <public-xml-core-wg@w3.org>
Message-Id: <20060124175311.88AB3592342@macintosh.inf.ed.ac.uk>

We went through all this in 2003, when I was worried about the
implications for namespace IRIs.  See the thread starting at:

  http://lists.w3.org/Archives/Member/w3c-xml-core-wg/2003JanMar/0003.html

It turned out not to matter for namespaces, because the IRI is not
dereferenced.

It seems to me now that if normalization is going to be done, it
should be done to the document when it is read in, not at random
points later on.  And XML 1.1 says that input (from non-Unicode
sources) SHOULD be normalized.  We decided against a MUST, and I don't
think we should introduce it for this special case.  Suddenly
requiring existing processors to support normalization is not
reasonable.
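
A minimal sketch of normalizing once, at read time, and only for input
from non-Unicode sources.  The function name, the set of encodings
treated as Unicode-based, and the choice of NFC are assumptions made
for illustration, not text from the XML 1.1 specification:

```python
import codecs
import unicodedata

# Assumed set of Unicode-based encodings (canonical Python codec names).
UNICODE_CODECS = {
    "utf-8", "utf-16", "utf-16-le", "utf-16-be",
    "utf-32", "utf-32-le", "utf-32-be",
}

def read_document(raw: bytes, encoding: str) -> str:
    """Decode a document, normalizing only input from non-Unicode sources."""
    text = raw.decode(encoding)
    if codecs.lookup(encoding).name in UNICODE_CODECS:
        # Already Unicode: leave the character sequence untouched.
        return text
    # Legacy encoding: normalize once, here, rather than at later points.
    return unicodedata.normalize("NFC", text)
```

Everything downstream then sees a single character sequence, and no
later step needs to know about normalization.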

Conceptually (though implementations are not required to work like
this) once the XML is parsed you have Unicode characters, so the
variant of step 1 in section 3.1 of RFC 3987 that applies is (c):

  If the IRI is in a Unicode-based character encoding (for example,
  UTF-8 or UTF-16), do not normalize (see section 5.3.2.2 for
  details).  Apply step 2 directly to the encoded Unicode character
  sequence.
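
As a rough sketch of what applying step 2 directly means, the function
below percent-encodes the UTF-8 octets of each non-ASCII character and
does no normalization first.  It is a simplification: the name
iri_to_uri is hypothetical, every non-ASCII character is escaped
rather than only those in ucschar or iprivate, and the separate
handling RFC 3987 allows for host names is omitted.

```python
def iri_to_uri(iri: str) -> str:
    """Map an IRI (already a Unicode string) to a URI without normalizing."""
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            # ASCII characters pass through unchanged.
            out.append(ch)
        else:
            # Percent-encode each UTF-8 octet of the character.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


# Precomposed and decomposed spellings stay distinct, since nothing
# is normalized along the way.
print(iri_to_uri("http://example.org/caf\u00e9"))   # .../caf%C3%A9
print(iri_to_uri("http://example.org/cafe\u0301"))  # .../cafe%CC%81
```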

-- Richard

Received on Tuesday, 24 January 2006 17:53:29 UTC