RE: Unicode in a URL from Mike Brown on 2001-04-27 (www-i18n-comments@w3.org from April 2001)

From: Mike Brown <mbrown@webb.net>
Date: Fri, 27 Apr 2001 12:05:06 -0600
To: "'duerst@w3.org'" <duerst@w3.org>, Mike Brown <mbrown@webb.net>, "'unicode@unicode.org'" <unicode@unicode.org>
Cc: www-i18n-comments@w3.org
Message-ID: <8D96EDA0AC04D31197B400A0C96C1480F70AC0@ossex1.webb.net>

I asserted, referring to section 4.2.2 of the XML spec:
>> <!ENTITY greeting SYSTEM 
>> "http://somewhere/getgreeting?lang=es&name=C%C3%A9sar">
>> ]>
>> 
>> The name Ce'sar is represented here as C%C3%A9sar in the 
>> UTF-8 based escaping, as per the XML requirement.

You replied:
> What the XML spec (and all the others mentioned above) say is 
> something different.
>
> - If you use non-ASCII characters directly in a system id, 
>   they're converted using UTF-8.
> - If you want anything else, use exactly the %-escapes you 
>   want. You won't get the benefit of using the actual
>   character in the source document.
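
For illustration, the conversion in the first bullet can be sketched in
Python. This is only a sketch under assumptions: escape_system_id is a
hypothetical helper name, not something defined by the XML spec, and the set
of characters it leaves unescaped is chosen just to suit this example URL.

```python
from urllib.parse import quote, unquote


def escape_system_id(text: str) -> str:
    """Percent-escape non-ASCII characters as UTF-8 bytes (hypothetical helper)."""
    # Assumption: keep the URI delimiters used in this example, plus '%',
    # so that escapes already written by the author pass through unchanged.
    return quote(text, safe=":/?&=%", encoding="utf-8")


# The author typed the actual character in the system id.
raw = "http://somewhere/getgreeting?lang=es&name=César"
escaped = escape_system_id(raw)
print(escaped)

# The author wrote the %-escapes by hand; nothing is re-escaped.
print(escape_system_id(escaped) == escaped)

# For contrast, the same name escaped from its ISO-8859-1 byte.
print(quote("César", encoding="latin-1"))
```

The second bullet corresponds to the pass-through case: escapes that are
already present are left exactly as written, whatever encoding they imply.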

OK, I can now see how this is the same as in HTML, where the spec says what
a document processor should do when it encounters malformed URI references.
As worded in the main spec, it looked to me as if it were telling a document
author how to write a URI reference. However, I am willing to admit I was
wrong. In my own paper I even mentioned the erratum to the XML spec that
changes the wording to indicate that this section is in fact intended for an
XML processor. Yeesh.

My statement about a conflict with HTTP stemmed from my incomplete
understanding of HTTP's ISO-8859-1 legacy. Never mind.

Received on Friday, 27 April 2001 14:04:14 UTC