- From: Chris Lilley <chris@w3.org>
- Date: Wed, 23 Mar 2005 20:24:28 +0100
- To: Bjoern Hoehrmann <derhoermi@gmx.net>
- Cc: www-i18n-comments@w3.org
On Wednesday, March 23, 2005, 7:54:44 PM, Bjoern wrote:

BH> * Chris Lilley wrote:
>>Yes, you are right for the case where the IRI is converted to a URI and
>>stored in the XML. I was thinking of the case where the IRI is stored
>>directly in the XML and only hexified to cross the wire. But then I
>>suppose it's not "A new URI format" in that case .... or is it?

BH> That's indeed not new resource identifier syntax, but I think such
BH> protocol interactions are really orthogonal to the requirement. It
BH> is for new URI syntax which requires that encoded character strings
BH> be represented in a way compatible with URI syntax which requires
BH> the use of %xx escapes if the conversion algorithm yields octets
BH> not representable using characters allowed in URIs. Remember that
BH> the components in URIs and IRIs represent octets, not characters,

Yes, I remember that, although RFC 3986 was supposed to tighten that up
a little. Section 2.5, "Identifying Data", does discuss it a little, but
it's still clearly octets.

BH> so

BH>   data:text/plain;charset=utf-7,Bj+APY-rn
BH>   data:text/plain;charset=utf-8,Bj%C3%B6rn
BH>   data:text/plain;charset=utf-8,Björn

BH> are legal IRIs that resolve to the same resource, but

Yes

BH>   data:text/plain;charset=utf-7,Björn
BH>   data:text/plain;charset=utf-8,Björn

BH> while legal IRIs, do not.

BH> The same is true for fragment identifiers,
BH> you could create a media type for which fragment identifiers do not
BH> use UTF-8 / %xx-encoding, e.g., for application/x-foo-xml

You could, at the risk of not conforming to

>>> C060 [S] Specifications that define new syntax for URIs, such as a
>>> new URI scheme or a new kind of fragment identifier, MUST specify
>>> that characters outside the US-ASCII repertoire are encoded using
>>> UTF-8 and %HH-escaping.

which is where we came in.....
BH> and

BH>   <!DOCTYPE foo [<!ATTLIST foo id ID #IMPLIED>]>
BH>   <foo id = "Björn" href = "#Bj+APY-rn" />

BH> you can require that the IRI Reference in href refers to <foo> as
BH> identified by the ID in id as the fragment identifier syntax for
BH> application/x-foo-xml is based on UTF-7 rather than UTF-8. So the
BH> requirement is relevant even if no %xx escaping is involved.

Yes, I agree.

>>Yes, that's a good URI test. I will add it to the test suite.

BH> Great!

I have actually had a lengthy response to an email from you on the same
subject, dated July 23, 2003, 5:37:06 AM, in my 'drafts' folder for the
longest time. Since then IRI has been published so the answer would now
be shorter and clearer than it was.

-- 
 Chris Lilley                    mailto:chris@w3.org
 Chair, W3C SVG Working Group
 W3C Graphics Activity Lead
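
To make the "octets, not characters" point concrete, here is a minimal Python sketch of how the five data: IRIs quoted above could be interpreted. The helper name decode_data_payload is hypothetical, and the sketch assumes a simplified reading of the data: scheme (no base64, charset given as a plain parameter) plus the usual IRI-to-URI mapping in which non-ASCII characters become UTF-8 octets. It is an illustration under those assumptions, not a conforming data: parser.

```python
from urllib.parse import unquote_to_bytes


def decode_data_payload(iri: str) -> str:
    """Hypothetical helper: decode the text payload of a simple data: IRI."""
    header, payload = iri[len("data:"):].split(",", 1)

    # Pick up the charset parameter; US-ASCII is assumed when it is absent.
    charset = "us-ascii"
    for param in header.split(";")[1:]:
        if param.lower().startswith("charset="):
            charset = param.split("=", 1)[1]

    # IRI -> URI -> octets: literal non-ASCII characters turn into their
    # UTF-8 octets, and %HH escapes turn into the octet they name.
    octets = unquote_to_bytes(payload.encode("utf-8"))

    # Only now is the charset applied, to octets rather than to characters.
    # errors="replace" keeps the sketch from raising on invalid input.
    return octets.decode(charset, errors="replace")


same = [
    "data:text/plain;charset=utf-7,Bj+APY-rn",
    "data:text/plain;charset=utf-8,Bj%C3%B6rn",
    "data:text/plain;charset=utf-8,Björn",
]
decoded = [decode_data_payload(iri) for iri in same]

# The literal "ö" reaches the UTF-7 decoder as the UTF-8 octets C3 B6,
# which are not valid UTF-7, so this IRI does not yield the same text.
mismatch = decode_data_payload("data:text/plain;charset=utf-7,Björn")
```

Under these assumptions the first three IRIs all decode to the same string, while the UTF-7 IRI containing a literal "ö" does not, which matches the grouping in the quoted message.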
Received on Thursday, 24 March 2005 02:55:57 UTC