RE: Unicode in a URL

From: Mike Brown <mbrown@webb.net>
Date: Fri, 27 Apr 2001 12:05:06 -0600
Message-ID: <8D96EDA0AC04D31197B400A0C96C1480F70AC0@ossex1.webb.net>
To: "'duerst@w3.org'" <duerst@w3.org>, Mike Brown <mbrown@webb.net>, "'unicode@unicode.org'" <unicode@unicode.org>
Cc: www-i18n-comments@w3.org
I asserted, referring to section 4.2.2 of the XML spec:
>> <!ENTITY greeting SYSTEM 
>> "http://somewhere/getgreeting?lang=es&name=C%C3%A9sar">
>> ]>
>> 
>> The name César is represented here as C%C3%A9sar in the 
>> UTF-8 based escaping, as per the XML requirement.

You replied:
> What the XML spec (and all the others mentioned above) say is 
> something different.
>
> - If you use non-ASCII characters directly in a system id, 
>   they're converted using UTF-8.
> - If you want anything else, use exactly the %-escapes you 
>   want. You won't get the benefit of using the actual
>   character in the source document.
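The two rules quoted above can be sketched roughly as follows. This is a minimal illustration, not code from the XML spec or any real processor, and the function name `escape_system_id` is hypothetical. It assumes only non-ASCII characters need escaping and deliberately ignores any other characters that may be disallowed in URIs. Each non-ASCII character is encoded as UTF-8 and every resulting byte becomes a %HH escape. ASCII text, including %-escapes the author typed by hand, passes through untouched.

```python
def escape_system_id(system_id: str) -> str:
    """Hypothetical helper: %-escape non-ASCII characters via their UTF-8 bytes."""
    out = []
    for ch in system_id:
        if ord(ch) < 0x80:
            # ASCII, including hand-written %-escapes, is left exactly as written
            out.append(ch)
        else:
            # Non-ASCII: convert to UTF-8, then escape each byte as %HH
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)

# Rule 1: a literal "é" (U+00E9) becomes its UTF-8 bytes C3 A9
print(escape_system_id("http://somewhere/getgreeting?lang=es&name=César"))
# http://somewhere/getgreeting?lang=es&name=C%C3%A9sar

# Rule 2: an explicit escape (here the ISO-8859-1 byte E9) is kept as written
print(escape_system_id("http://somewhere/getgreeting?lang=es&name=C%E9sar"))
# http://somewhere/getgreeting?lang=es&name=C%E9sar
```

The second call shows why the author must write the escapes explicitly when a server expects something other than UTF-8: the processor never re-encodes text that is already ASCII.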

OK, I can now see how this is the same as in HTML, where the spec says
what a document processor should do when it encounters malformed URI
references. As worded in the main spec, though, it reads to me as
instructions to a document author on how to write a URI reference.
Still, I am willing to admit I am wrong. In my own paper I even mentioned the
erratum to the XML spec that changes the wording to indicate that this
section is in fact intended for an XML processor. Yeesh.

My statement about a conflict with HTTP stems from my incomplete understanding
of HTTP's ISO-8859-1 legacy. Never mind.
Received on Friday, 27 April 2001 14:04:14 GMT