- From: Chris Lilley <chris@w3.org>
- Date: Wed, 16 Apr 2003 00:10:43 +0200
- To: Tim Bray <tbray@textuality.com>
- CC: "Ian B. Jacobs" <ij@w3.org>, www-tag@w3.org
On Tuesday, April 15, 2003, 11:55:03 PM, Tim wrote:

TB> Chris Lilley wrote:
>> I would not like people to get the impression from reading these
>> minutes that I am in favour of 'canonicalizing' IRIs by hexifying
>> them. Like Martin says and like the IRI spec says, only do this as a
>> last resort when using antiquated transport protocols. Better is to
>> use whatever method (quoted-unreadable, base64, ncr, \u) the
>> environment provides to preserve the original characters.

TB> I think I agree, but that last sentence is potentially very misleading.

TB> If the IRI is embedded in an XML document, the IRI's Unicode
TB> characters should appear in the infoset as themselves and *only* as
TB> themselves unless they are IRI/URI-special characters like '#' or '%',
TB> in which case they should appear *only* as %-escapes. In the XML
TB> instance, this may be accomplished by having them appear as themselves
TB> (if you're using an encoding that supports them) or via NCRs.

TB> In the XML context, other mechanisms such as \u or base64 should not
TB> be used.

Oh - yes, of course. I was just alluding to other environments having
other ways to do the same thing. Using a method that your environment
does not provide does not preserve the character, I agree.

Although, if you are sending an XML file by email, protecting the
transport (base64 transfer encoding) while keeping the actual characters
as themselves in the file is exactly the same as changing the XML so that
all the non-ASCII characters are NCRs, except that it preserves
readability on the other end, so it is in fact better.

-- 
 Chris                          mailto:chris@w3.org
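A minimal sketch in Python of the three representations under discussion
(the example IRI and the helper code are illustrative assumptions, not
taken from the thread): the non-ASCII character kept as itself, the same
character serialized as an XML NCR, and the last-resort "hexified"
pure-ASCII URI.

    # Sketch only: hypothetical IRI, standard-library Python.
    from urllib.parse import quote

    iri = "http://example.org/café"

    # 1. As itself: in a UTF-8 XML document the 'é' appears literally.
    as_itself = iri

    # 2. As an NCR: the XML infoset still contains 'é'; only the
    #    serialization of the document differs.
    as_ncr = "".join(c if ord(c) < 128 else "&#x{:X};".format(ord(c))
                     for c in iri)

    # 3. Hexified into a pure-ASCII URI (last resort): 'é' is UTF-8
    #    encoded and %-escaped, losing readability.
    as_uri = quote(iri, safe=":/?#[]@!$&'()*+,;=%")

    print(as_itself)  # http://example.org/café
    print(as_ncr)     # http://example.org/caf&#xE9;
    print(as_uri)     # http://example.org/caf%C3%A9

In the second and third forms the underlying IRI is unchanged; the point
of the exchange above is that, inside XML, only the first two are
appropriate, and the %-escaped form is reserved for contexts that cannot
carry the characters at all.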
Received on Tuesday, 15 April 2003 18:10:54 UTC