- From: Martin Duerst <duerst@w3.org>
- Date: Thu, 06 May 2004 16:04:38 +0900
- To: "Chris Haynes" <chris@harvington.org.uk>, <www-international@w3.org>
- Cc: Michel Suignard <michelsu@microsoft.com>
Hello Chris,

In trying to clear up the remaining IRI issues, I found that I had planned to reply to this message of yours, but never got around to it.

At 17:20 03/08/07 +0100, Chris Haynes wrote:
>"Martin Duerst" replied:
>
> > At 12:15 03/07/26 +0100, Chris Haynes wrote:
> >
> > > "Jungshik Shin" replied at: Saturday, July 26, 2003 11:31 AM
> > >
> > > > It also depends on whether or not you set 'send URLs always in
> > > > UTF-8' in Tools|Options(?) in MS IE.
> > >
> > > True, but I'm trying to find a 'reliable' mechanism which is not
> > > dependent on user-accessible controls.
> > > IMHO, this is also a 'dangerous' option, in that it goes against the
> > > de facto conventions and anticipates (perhaps incorrectly) the
> > > recommendations of the proposed IRI RFC. It can only safely be used
> > > with a 'consenting' server site.
> >
> > Sorry, no. The main dangerous thing is that authors use non-ASCII
> > characters in URIs (without any %HH escaping) when this is clearly
> > forbidden.
> >
> > Regards,    Martin.
>
>Martin,
>
>Are you saying that you approve of relying on users to select the
>(Microsoft-specific) 'send URLs always in UTF-8' menu option to ensure
>that UTF-8 gets returned to the server?
>
>That is what was being suggested.

Well, my statement above was meant in the following sense: there is NO spec that allows the inclusion of non-ASCII characters in URIs. The IRI spec is the first one to define something similar to a URI that actually allows this. Authors who, for example, put raw iso-8859-1 characters into a URI in a page encoded as iso-8859-1 are therefore wrong; any 'it works' effect is coincidental, not according to spec. Suggesting that a browser anticipating a future spec (the IRI spec) is dangerous, while (implicitly) blessing browsers and pages that conform to no spec at all, is in my eyes a dangerous idea.
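As a sketch of the %HH escaping rule in question (this is my illustration, not part of the original mail; the sample string is hypothetical): an IRI with non-ASCII characters is mapped to a valid URI by encoding those characters as UTF-8 and percent-escaping each byte, which Python's standard library does directly:

```python
from urllib.parse import quote

# Hypothetical IRI path segment containing non-ASCII characters.
iri_segment = "résumé"

# Convert to a spec-conformant URI segment: UTF-8 bytes, each escaped
# as %HH ('é' = U+00E9 -> bytes C3 A9 -> "%C3%A9").
uri_segment = quote(iri_segment, safe="")
print(uri_segment)  # r%C3%A9sum%C3%A9
```

Sending the raw bytes instead of the escaped form is exactly the non-conformant behavior described above.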
>My argument was that any current HTTP-like system in which the
>character encoding could be modified by menu controls in the user
>agent (and in which the actual encoding used is *not* conveyed in the
>request) was inherently unreliable.

I think we have to look at the different parts of an HTTP request separately. There are mainly two parts: the 'path' part and the 'query' part.

With respect to the path part, this is indeed influenced by the 'send URLs always in UTF-8' option in MS IE. But there are ways to get around this. For an example, see my Apache 'mod_fileiri' module, which can map requests both in a legacy encoding and in UTF-8 back to the file in question [see http://www.w3.org/2003/06/mod_fileiri/Overview.html for an overview, including pointers to the actual code and to a talk of mine].

With respect to the query part, this is not affected by the 'send URLs always in UTF-8' option in MS IE. The query part is always sent in the encoding of the actual page, except for some browsers that implement the 'accept-charset' attribute on <form>. But for queries, it is rather easy to, e.g., convert all the forms related to that query URI to UTF-8.

You are right that the (perceived) character encoding of the page can affect both parts. Of course, users might always change the character encoding and, as a result, send something that the server receives as garbage. However, users don't use menus just for fun, and if anybody ever came and complained, the server side would be well justified in saying "don't mess around with the settings if you expect your queries to work". So this is very much a theoretical concern.

Regards,    Martin.
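To make the query-part ambiguity concrete (my own sketch, not from the original mail; the field value is hypothetical): because form data is escaped using the bytes of the page's encoding, the same value produces different query strings depending on that encoding, and the server sees only the bytes:

```python
from urllib.parse import quote

# The same form field value, percent-escaped under two page encodings.
value = "café"
latin1_query = quote(value, encoding="iso-8859-1")  # caf%E9
utf8_query = quote(value, encoding="utf-8")         # caf%C3%A9
print(latin1_query, utf8_query)
```

A server expecting one encoding and receiving the other gets what Martin calls "garbage"; serving all the relevant forms in UTF-8 removes the ambiguity for the query part.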
Received on Thursday, 6 May 2004 04:48:09 UTC