Re: Why hexify fragments?

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Wed, 23 Mar 2005 19:54:44 +0100
To: Chris Lilley <chris@w3.org>
Cc: www-i18n-comments@w3.org
Message-ID: <4248a406.70238812@smtp.bjoern.hoehrmann.de>

* Chris Lilley wrote:
>Yes, you are right for the case where the IRI is converted to a URI and
>stored in the XML. I was thinking of the case where the IRI is stored
>directly in the XML and only hexified to cross the wire. But then I
>suppose its not "A new URI format" in that case .... or is it?

That's indeed not new resource identifier syntax, but I think such
protocol interactions are really orthogonal to the requirement. The
requirement is about new URI syntax: encoded character strings must
be represented in a way compatible with URI syntax, which means
using %xx escapes whenever the conversion algorithm yields octets
that are not representable using characters allowed in URIs. Remember
that the components in URIs and IRIs represent octets, not characters,
so

  data:text/plain;charset=utf-7,Bj+APY-rn
  data:text/plain;charset=utf-8,Bj%C3%B6rn
  data:text/plain;charset=utf-8,Björn

are legal IRIs that resolve to the same resource, but

  data:text/plain;charset=utf-7,Björn
  data:text/plain;charset=utf-8,Björn

while legal IRIs, do not. The same is true for fragment identifiers:
you could create a media type for which fragment identifiers do not
use UTF-8 / %xx-encoding. For example, for application/x-foo-xml and

  <!DOCTYPE foo [<!ATTLIST foo id ID #IMPLIED>]>
  <foo id = "Björn" href = "#Bj+APY-rn" />

you can require that the IRI reference in href refers to the <foo>
element identified by the ID in id, because the fragment identifier
syntax for application/x-foo-xml is based on UTF-7 rather than UTF-8.
So the requirement is relevant even if no %xx escaping is involved.
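
To make the octets-versus-characters point concrete, here is a rough
Python sketch of the examples above. The helper names (iri_octets,
data_text, fragment_names_id) are made up for illustration, "same
resource" is approximated as "same decoded text", and only the simple
data:<type>;charset=<name>,<payload> shape is handled. The UTF-7 rule
for application/x-foo-xml is the hypothetical one described above, not
an existing specification.

```python
from urllib.parse import unquote_to_bytes


def iri_octets(component: str) -> bytes:
    """Octets an IRI component stands for: non-ASCII characters count
    as their UTF-8 octets, and each %xx escape counts as one octet."""
    return unquote_to_bytes(component.encode("utf-8"))


def data_text(iri: str) -> str:
    """Text of a data: IRI shaped like data:<type>;charset=<name>,<payload>.
    Assumption: no base64 flag and no parameters other than charset."""
    header, payload = iri.split(",", 1)
    charset = header.split("charset=", 1)[1]
    # Octets that are invalid in the declared charset become U+FFFD.
    return iri_octets(payload).decode(charset, errors="replace")


def fragment_names_id(fragment: str, id_value: str, charset: str) -> bool:
    """Hypothetical fragment rule: decode the fragment's octets with the
    media type's charset and compare the result to the ID value."""
    decoded = iri_octets(fragment).decode(charset, errors="replace")
    return decoded == id_value


for iri in (
    "data:text/plain;charset=utf-7,Bj+APY-rn",
    "data:text/plain;charset=utf-8,Bj%C3%B6rn",
    "data:text/plain;charset=utf-8,Björn",
    "data:text/plain;charset=utf-7,Björn",
):
    print(iri, "->", data_text(iri) == "Björn")

# The href fragment from the <foo> example, read under each rule.
print(fragment_names_id("Bj+APY-rn", "Björn", "utf-7"))
print(fragment_names_id("Bj+APY-rn", "Björn", "utf-8"))
```

The first three IRIs compare equal to "Björn" and the fourth does not,
since the UTF-8 octets of "ö" are not valid UTF-7. The fragment only
names the ID when it is read as UTF-7.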

>Yes, thats a good URI test. I will add it to the test suite.

Great!
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 
Received on Wednesday, 23 March 2005 18:56:00 GMT