Re: [draft-duerst-iri-bis-06] differences from HTML5 algorithm

Hello Anne,

On 2009/09/11 17:39, Anne van Kesteren wrote:
> On Fri, 11 Sep 2009 09:39:03 +0200, Martin J. Dürst
> <duerst@it.aoyama.ac.jp> wrote:
>> At least personally, I'm always also interested in the "story behind
>> the story", i.e. things like which version of which browser got it
>> wrong (and others followed), and so on. But much of that might be
>> difficult to reconstruct.
>
> I see. I'm not sure if I or Ian can be of much help there. Usually when
> we reverse engineer something we just look at the latest deployed
> versions of browsers and any experimental builds if available (sometimes
> just the latter). Not much archeology involved. Having said that, the
> text in HTML4 on URIs suggests this is at least over a decade old.
> (Though somewhere in 200x it changed to just the <query> component that
> depended on the document encoding.)

The most relevant text in HTML4 on URIs is at
http://www.w3.org/TR/html4/appendix/notes.html#h-B.2.1. It recommends
converting non-ASCII characters to UTF-8 before escaping them,
regardless of which part of the URI they appear in.

What you may be referring to is at
http://www.w3.org/TR/html4/interact/forms.html#adef-accept-charset (the
accept-charset attribute of the form element):

"The default value for this attribute is the reserved string "UNKNOWN". 
User agents may interpret this value as the character encoding that was 
used to transmit the document containing this FORM element."

This essentially says that you MAY send form data in the document
encoding, and browsers followed this widely. It seems that some browser
implementer along the way extended that behavior to query parts in other
URIs (which have nothing to do with <form>), and got stuck with it.
This is especially unfortunate because for <form> you can just say
accept-charset='utf-8' and get all the data sent in UTF-8, but there is
no such attribute on a@href and others.
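
To make the difference concrete, here is a minimal sketch in Python of
how the same query value ends up escaped under the two conventions. The
helper name escape_query_value and the sample value are mine, purely for
illustration; iso-8859-1 stands in for "the document encoding", which in
practice could be any legacy encoding.

```python
from urllib.parse import quote


def escape_query_value(text: str, encoding: str) -> str:
    """Encode text in the given character encoding, then percent-escape
    every byte that is not an unreserved URI character."""
    return quote(text, safe="", encoding=encoding)


value = "Dürst"

# Convention recommended in HTML4 B.2.1: always go through UTF-8.
print(escape_query_value(value, "utf-8"))       # D%C3%BCrst

# What browsers do for the query part: use the document encoding
# (assumed to be iso-8859-1 for this example).
print(escape_query_value(value, "iso-8859-1"))  # D%FCrst
```

The server receiving the second form has no way to tell from the URI
alone which encoding was used, which is exactly the problem.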

Regards,    Martin.

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp

Received on Friday, 11 September 2009 08:58:51 UTC