- From: Martin Duerst <duerst@w3.org>
- Date: Sun, 09 May 2004 09:38:32 +0900
- To: "Chris Haynes" <chris@harvington.org.uk>, <www-international@w3.org>
- Cc: "Michel Suignard" <michelsu@microsoft.com>, <public-iri@w3.org>
Hello Chris,

Many thanks for your clear response. I have closed this issue.

Regards, Martin.

At 10:30 04/05/07 +0100, Chris Haynes wrote:
>Martin,
>
>Many thanks for the response.
>
>Your expanded sentence fully addresses this issue, as far as I am concerned.
>
>Chris
>
>
>----- Original Message -----
>From: "Martin Duerst" <duerst@w3.org>
>To: "Chris Haynes" <chris@harvington.org.uk>; <www-international@w3.org>
>Cc: "Michel Suignard" <michelsu@microsoft.com>; <public-iri@w3.org>
>Sent: Friday, May 07, 2004 7:02 AM
>Subject: Re: what should the charset be in the response to the server
>
>
> > Hello Chris,
> >
> > Many thanks for your reply. I have copied the IRI list
> > because I think this discussion is relevant for the current
> > draft.
> >
> > At 13:38 04/05/06 +0100, Chris Haynes wrote:
> > >Thanks for the response, Martin,
> > >
> > >I only noticed this response _after_ I had replied to your other
> > >response on the IRI list, so I apologize that my earlier response did
> > >not take into account this message of yours.
> > >
> > >Trying to bring this topic to closure, I think my core worry arises
> > >each time there are what-appear-to-me-to-be normative statements that
> > >'the page encoding determines the encoding used in requests derived
> > >from that page' - ignoring the possibility of users having changed
> > >the encoding setting.
> > >
> > >We obviously both agree that users 'should not' use these controls
> > >(just as I disapprove of the use of 'tone controls' and spectral
> > >filters in Hi Fi systems for other than 'loudness' compensation), but
> > >I get worried every time the possibility of their use is ignored.
> > >
> > >The situation is not purely 'theoretical'. I've seen reports that it
> > >is common practice in some countries for people to switch to their
> > >'national' character set every time they appear to have a problem in
> > >viewing a page - which could be occasioned by their browser not
> > >having UTF-8 support.
> >
> > Ok, so let's have a look at this case: Either switching to their 'national'
> > character encoding solves the problem, in which case the page was badly
> > labeled, and the page author is to blame. Or switching does not solve
> > the problem, in which case the user may not even be able to read the
> > page, and therefore won't fill in the form. Or the page only contains
> > US-ASCII characters to begin with, and the user doesn't have any reason
> > to switch encodings.
> >
> > That probably leaves us with just one intermediate case: The page is
> > mostly in US-ASCII, but with a few other characters (e.g. 'smart
> > quotes',...).
> > The user sees some problem, tries to fix it by switching the encoding.
> > That doesn't help, so the user gives up, and just fills in the form
> > (which is readable enough to complete the task).
> >
> > If you know about any other scenarios where switching encoding and then
> > filling in the form with a wrong encoding can happen realistically,
> > please tell me.
> >
> >
> > >I help provide support to the users of an open-source web server, and
> > >we frequently get requests for help from people managing web services
> > >who, having read the appropriate RFCs and W3 specs in detail, had not
> > >appreciated that user agents can change the encoding in ways which
> > >the request-receiving server cannot detect.
> >
> > I was giving a tutorial about Web internationalization for years, and
> > the issue of encoding in forms always came up, but from the time when
> > the first browsers supporting UTF-8 came out, that was always given as
> > an answer, and I haven't heard anybody question this before you. But
> > of course your mileage may vary.
> >
> > But there is an additional point: A server isn't helpless against users
> > changing the encoding. UTF-8 has the very helpful property of having
> > very specific byte sequences. It is easy to check these with a
> > regular expression; for an example, please see
> > http://www.w3.org/International/questions/qa-forms-utf-8.html.
> >
> >
> > >I suppose I'm just keen to make sure that wherever this topic appears,
> > >the potential behavior of the vast majority of browsers in the world
> > >is adequately and completely described.
> > >
> > >If there were an RFC somewhere which said that the user agent 'MUST
> > >NOT' change the encoding, and that real-world browsers were ignoring
> > >this stricture, I would agree that other RFCs were right to describe
> > >what should be, rather than what is.
> > >
> > >But as far as I know, the ability for users to override the encoding
> > >does not contravene any existing RFC, and therefore other RFCs ought
> > >at least to recognize that possibility, and not infer, by omission, a
> > >level of certainty which can never be assured.
> > >
> > >I think I would have a very poor view of any web site which told me it
> > >was my fault a request got garbled because I made use of a
> > >freely-available control on my browser.
> > >
> > >Let me try to conclude this by just asking that, so long as user
> > >control over the encoding is permitted by RFCs, that possibility is
> > >explicitly recognized by other RFCs, and that we don't try to pretend
> > >that it does not exist or, even worse, that failures and errors in
> > >decoding are the user's fault for breaking an unwritten, untestable
> > >non-rule.
> >
> > I'm still not sure to what extent this is really happening. But I have
> > clarified this issue by expanding the sentence in question as follows:
> >
> > "Likewise, when setting up a new Web form using UTF-8 as the encoding
> > of the form page, the returned query URIs will use UTF-8 as an encoding
> > (unless the user for whatever reason changes the character encoding)
> > and will therefore be compatible with IRIs."
> >
> > This leaves it to the reader to judge for him/herself how high
> > the probability is that the user is switching code pages.
> >
> > Regards, Martin.
> >
> >
> > >Chris
> > >
> > >
> > >----- Original Message -----
> > >From: "Martin Duerst" <duerst@w3.org>
> > >To: "Chris Haynes" <chris@harvington.org.uk>; <www-international@w3.org>
> > >Cc: "Michel Suignard" <michelsu@microsoft.com>
> > >Sent: Thursday, May 06, 2004 8:04 AM
> > >Subject: Re: what should the charset be in the response to the server
> > >
> > >
> > > > Hello Chris,
> > > >
> > > > In trying to clear up the remaining IRI issues, I found out that
> > > > I planned to reply to this message of yours, but didn't get around
> > > > to doing it.
> > > >
> > > > At 17:20 03/08/07 +0100, Chris Haynes wrote:
> > > >
> > > > > "Martin Duerst" Replied:
> > > > >
> > > > > > At 12:15 03/07/26 +0100, Chris Haynes wrote:
> > > > > >
> > > > > > > "Jungshik Shin" replied at: Saturday, July 26, 2003 11:31 AM
> > > > > > >
> > > > > > > > It also depends on whether or not you set 'send URLs always
> > > > > > > > in UTF-8' in Tools|Options(?) in MS IE.
> > > > > > > >
> > > > > > >
> > > > > > >True, but I'm trying to find a 'reliable' mechanism which is not
> > > > > > >dependent on user-accessible controls.
> > > > > > >IMHO, this is also a 'dangerous' option, in that it goes against
> > > > > > >the de facto conventions and anticipates (perhaps incorrectly)
> > > > > > >the recommendations of the proposed IRI RFC. It can only safely
> > > > > > >be used with a 'consenting' server site.
> > > > > >
> > > > > > Sorry, no. The main dangerous thing is that authors use non-ASCII
> > > > > > characters in URIs (without any %HH escaping) when this is clearly
> > > > > > forbidden.
> > > > > >
> > > > > > Regards, Martin.
> > > > > >
> > > > >
> > > > >Martin,
> > > > >
> > > > >Are you saying that you approve of relying on users to select the
> > > > >(Microsoft-specific) 'send URLs always in
> > > > >UTF-8' menu option to ensure that UTF8 gets returned to the server?
> > > > >
> > > > >That is what was being suggested.
> > > >
> > > > Well, my above statement was meant in the following sense:
> > > > There is NO spec that would allow inclusion of non-ASCII
> > > > characters in URIs. The IRI spec is the first one that
> > > > defines something similar to a URI that actually allows this.
> > > > Any authors that for example put raw iso-8859-1 characters
> > > > into a URI in a page in iso-8859-1 are therefore wrong;
> > > > any 'it works' effect is coincidental, not according to specs.
> > > > Suggesting that a browser that anticipates a future spec
> > > > (the IRI spec) is dangerous, while (implicitly) blessing
> > > > browsers and pages that don't conform to any spec is in
> > > > my eyes a dangerous idea.
> > > >
> > > >
> > > > >My argument was that any current HTTP-like system in which the
> > > > >character encoding could be modified by menu controls in the user
> > > > >agent, (and in which the actual encoding used is *not* conveyed in the
> > > > >request) was inherently unreliable.
> > > >
> > > > I think we have to look at different parts of an HTTP request
> > > > separately.
> > > > There are mainly two parts: the 'path' part and the 'query' part.
> > > >
> > > > With respect to the path part, this is indeed influenced by the
> > > > 'send URLs always in UTF-8' option in MS IE. But there are ways
> > > > to get around this. For an example, see my Apache 'mod_fileiri'
> > > > module, which allows mapping requests both in a legacy encoding and
> > > > in UTF-8 back to the file in question.
> > > > [see http://www.w3.org/2003/06/mod_fileiri/Overview.html for an
> > > > overview, including pointers to the actual code and to a talk of mine].
> > > >
> > > > With respect to the query part, this is not affected by the
> > > > 'send URLs always in UTF-8' option in MS IE. The query part
> > > > is always sent in the encoding of the actual page, except
> > > > for some browsers that implement the 'accept-charset' attribute
> > > > on <form>. But for queries, it is rather easy to e.g. convert
> > > > all the forms related to that query URI to UTF-8.
> > > >
> > > > You are right that the (perceived) character encoding of the
> > > > page can affect both parts. Of course, users might always
> > > > change the character encoding, and as a result send something
> > > > that the server gets as garbage. However, users don't use
> > > > menus just for fun, and if anybody ever came and complained,
> > > > the server side would be very justified to say "don't mess
> > > > around with the settings if you expect your queries to work".
> > > > So this is very much a theoretical concern.
> > > >
> > > >
> > > > Regards, Martin.
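
The point made in the thread, that a server is not helpless when the form encoding is uncertain, can be illustrated roughly as follows. This is only a sketch under stated assumptions: it uses Python's strict UTF-8 decoder in place of the regular expression from the W3C page cited above, and the function name `decode_form_value` and the `fallback` encoding are hypothetical choices for illustration, not anything defined in the thread or in the IRI draft.

```python
from typing import Tuple
from urllib.parse import unquote_to_bytes


def decode_form_value(raw: str, fallback: str = "iso-8859-1") -> Tuple[str, str]:
    """Decode one percent-encoded query value.

    Returns (text, encoding_used). Strict UTF-8 is tried first; `fallback`
    is an assumed legacy encoding picked by the site operator, since the
    request itself does not say which encoding the browser used.
    """
    # In query strings "+" stands for a space; swap it before unescaping.
    data = unquote_to_bytes(raw.replace("+", " "))
    try:
        # UTF-8 byte sequences are very specific, so most legacy-encoded
        # non-ASCII input fails this strict decode.
        return data.decode("utf-8", errors="strict"), "utf-8"
    except UnicodeDecodeError:
        return data.decode(fallback), fallback
```

A value such as `caf%C3%A9` decodes as UTF-8, while `caf%E9` fails the strict check and is read with the fallback instead. The check is a heuristic: a short legacy-encoded string can occasionally happen to be valid UTF-8, so it reduces the risk of garbled input but does not remove it.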
Received on Saturday, 8 May 2004 20:44:56 UTC