Re: [Minutes] 14 Apr 2003 TAG teleconf (URIEquivalence-15, IRIEverywhere-27, xmlIDSemantics-32, abstractComponentRefs-37, namespaceDocument-8)

On Tuesday, April 15, 2003, 11:55:03 PM, Tim wrote:

TB> Chris Lilley wrote:

>> I would not like people to get the impression from reading these
>> minutes that I am in favour of 'canonicalizing' IRIs by hexifying
>> them. Like Martin says and like the IRI spec says, only do this as a
>> last resort when using antiquated transport protocols. Better is to
>> use whatever method (quoted-unreadable, base64, ncr, \u) the
>> environment provides to preserve the original characters.

TB> I think I agree, but that last sentence is potentially very misleading. 
TB>   If the IRI is embedded in an XML document, the IRI's Unicode 
TB> characters should appear in the infoset as themselves and *only* as 
TB> themselves unless they are IRI/URI-special characters like '#' or '%', in 
TB> which case they should appear *only* as %-escapes.   In the XML 
TB> instance, this may be accomplished by having them appear as themselves 
TB> (if you're using an encoding that supports them) or via NCRs.

TB> In the XML context, other mechanisms such as \u or base64 should not be 
TB> used.

Oh - yes, of course. I was just alluding to other environments having
other ways to do the same thing.

Using a method that your environment does not provide does not
preserve the character, I agree.

Although, if you are sending an XML file by email, protecting the
transport (base64 transfer encoding) while leaving the actual
characters as themselves in the file has exactly the same effect as
changing the XML so that all the non-ASCII characters are NCRs. It
also preserves readability on the other end, so it is in fact better.
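
To make that comparison concrete, here is a minimal sketch in Python
(my choice purely for illustration). The example.org IRI and all the
variable names are made up, and it assumes nothing beyond the standard
library's xml.etree.ElementTree and base64 modules. It builds the same
document twice, once with the characters as themselves and once with
NCRs, and then pushes the first form through base64 as a stand-in for
the mail transfer encoding.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical IRI containing two non-ASCII characters (e with acute).
iri = "http://example.org/r\u00e9sum\u00e9"

# Form 1: the characters appear as themselves; the file is UTF-8.
literal_xml = f'<a href="{iri}"/>'.encode("utf-8")

# Form 2: every non-ASCII character is replaced by an NCR, so the
# file is pure ASCII but harder to read.
ncr_xml = literal_xml.decode("utf-8").encode("ascii", "xmlcharrefreplace")

# Both forms parse to the same attribute value.
href_literal = ET.fromstring(literal_xml).get("href")
href_ncr = ET.fromstring(ncr_xml).get("href")

# Protect the transport instead of rewriting the XML: the wire form is
# ASCII-only, and decoding restores the original bytes untouched.
wire = base64.b64encode(literal_xml)
received = base64.b64decode(wire)
```

Both parsed values equal the original IRI, and the bytes that come
back out of base64 are identical to the readable UTF-8 file, whereas
the NCR form arrives with the characters spelled out as numeric
references in the source.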


-- 
 Chris                            mailto:chris@w3.org

Received on Tuesday, 15 April 2003 18:10:54 UTC