Non-ASCII characters in namespace URIs from Bjoern Hoehrmann on 2001-10-21 (www-international@w3.org from October to December 2001)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Sun, 21 Oct 2001 22:29:28 +0200
To: xml-names-editor@w3.org
Message-ID: <spb6tts4cdfegkejta89vfu8gld28ehkmr@4ax.com>

Hi,

   XML 1.0, HTML 4.01, XPointer, the Character Model for the World Wide
Web and many other documents require (or, in the case of HTML 4.01,
recommend) that processors apply a special encoding algorithm to URIs
containing non-ASCII characters. To summarize, a stable definition of
this algorithm could be:

  => NFC normalization
  => encode as UTF-8
  => apply URI (%xx) encoding to each byte
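
A minimal sketch of these steps in Python, for illustration only. It
assumes that normalization is applied to the character sequence before
UTF-8 encoding (NFC is defined on characters, not bytes), that only
non-ASCII characters are escaped, and that upper-case hex digits are
used; the function name escape_non_ascii is hypothetical and does not
come from any of the specifications named above.

```python
import unicodedata


def escape_non_ascii(uri: str) -> str:
    """Escape non-ASCII characters in a URI string (illustrative sketch)."""
    # Assumption: normalize to NFC first, on characters rather than bytes.
    normalized = unicodedata.normalize("NFC", uri)
    parts = []
    for ch in normalized:
        if ord(ch) < 128:
            # ASCII characters pass through unchanged in this sketch.
            parts.append(ch)
        else:
            # Encode the character as UTF-8, then %XX-escape each byte.
            parts.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(parts)


if __name__ == "__main__":
    print(escape_non_ascii("http://björn.höhrmann.de/"))
```

With these assumptions, a precomposed "ö" and the decomposed sequence
"o" followed by U+0308 both come out as %C3%B6.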

Does this also apply to namespace URIs? For example, are

  http://björn.höhrmann.de/ and
  http://bj%C3%B6rn.h%C3%B6hrmann.de/

equal? What about the case of those hex digits? Or does the comparison
have to take place after UTF-8 encoding, and then character by character
or possibly byte by byte? What about normalization? The recommendation
keeps mum. Maybe the errata should add some clarification here.
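
To make the question concrete, here is a small Python sketch comparing
two possible policies: a literal character-by-character comparison, and
a comparison after escaping with the case of the hex digits folded. Both
helpers and their names are hypothetical, and which policy (if either)
should apply to namespace URIs is exactly what is being asked here.

```python
import re
import unicodedata


def escape_non_ascii(uri: str) -> str:
    # Assumed steps: NFC, then UTF-8, then %XX for each non-ASCII byte.
    normalized = unicodedata.normalize("NFC", uri)
    return "".join(
        ch if ord(ch) < 128
        else "".join(f"%{b:02X}" for b in ch.encode("utf-8"))
        for ch in normalized
    )


def fold_escape_case(uri: str) -> str:
    # Upper-case the hex digits of every %xx escape, leaving the rest as is.
    return re.sub(r"%[0-9A-Fa-f]{2}", lambda m: m.group(0).upper(), uri)


def same_literally(a: str, b: str) -> bool:
    # Policy 1: plain character-by-character comparison.
    return a == b


def same_after_escaping(a: str, b: str) -> bool:
    # Policy 2: compare after escaping and folding the hex digit case.
    return fold_escape_case(escape_non_ascii(a)) == fold_escape_case(escape_non_ascii(b))


raw = "http://björn.höhrmann.de/"
upper = "http://bj%C3%B6rn.h%C3%B6hrmann.de/"
lower = "http://bj%c3%b6rn.h%c3%b6hrmann.de/"

if __name__ == "__main__":
    print(same_literally(raw, upper), same_after_escaping(raw, upper))
    print(same_literally(upper, lower), same_after_escaping(upper, lower))
```

Under the first policy all three spellings are distinct namespace names;
under the second they are the same one.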

(bcc to: xml-dev and www-international)

regards,
-- 
Björn Höhrmann { mailto:bjoern@hoehrmann.de } http://www.bjoernsworld.de
am Badedeich 7 } Telefon: +49(0)4667/981028 { http://bjoern.hoehrmann.de
25899 Dagebüll { PGP Pub. KeyID: 0xA4357E78 } http://www.learn.to/quote/

Received on Sunday, 21 October 2001 16:30:15 UTC