- From: Ian Hickson <ian@hixie.ch>
- Date: Fri, 19 Sep 2003 09:18:17 +0000 (UTC)
- To: Martin Duerst <duerst@w3.org>
- Cc: Francois Yergeau <FYergeau@alis.com>, "kuro@sonic.net" <kuro@sonic.net>, Paul Deuter <PaulD@plumtree.com>, "www-international@w3.org" <www-international@w3.org>
On Thu, 18 Sep 2003, Martin Duerst wrote:
>
> --------
> Sorry, this form cannot handle some of the characters you
> just typed. The data will be sent as "D?rst".
> [OK to send] [Let me change it]
> --------

That is a very interesting idea. Thanks!

So this solves part of the problem. The remaining problem is to decide on
what the data should be sent as (i.e. what the second part of that dialog
will say).

Personally I prefer to replace out-of-set characters with "?".

Some UAs, namely Mozilla (in all such cases) and IE (in a more limited set
of cases) currently replace unknown characters with the string "&#", the
decimal representation of the character's Unicode code point, and ";".
Now, this is not really wise, as has already been discussed in this
thread, and I believe the relevant Mozilla folk are willing to change this
to be interoperable with whatever we officially decide on.

Safari and Opera currently translate such characters to "?".

> If there were any concrete proposal, we could find a list. But there
> does not seem to be anything even close to a concrete proposal on the
> protocol side.

Proposal:

   If the form data set contains characters that are outside the
   acceptable submission character sets, the user agent SHOULD inform the
   user that his submission will be changed, for example using a dialog
   in the form:

    ____________________________________________________
   || Warning |||||||||||||||||||||||||||||||||||||||||||
   |                                                    |
   | This form cannot handle some of the characters you |
   | have entered. The data will be sent as "D?rst".    |
   |                                                    |
   |      (( Send anyway ))    ( Return to form )       |
   `----------------------------------------------------'

   If the submission is not cancelled, the user agent MUST replace each
   character that is not in the submission character set with a single
   replacement character, either U+FFFD, "?", or some other character
   depending on the availability of characters in the submission
   character set.
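
To make the replacement rule concrete, here is a minimal sketch in Python. It is only an illustration, not anything a UA actually ships: the function names are hypothetical, and preferring U+FFFD whenever the submission character set can encode it is just one reading of "depending on the availability of characters". The last function shows the "&#NNN;" behaviour for comparison.

```python
def choose_replacement(charset: str) -> str:
    """Pick U+FFFD if the charset can encode it, otherwise fall back to "?".

    Assumption: this preference order is one possible reading of the
    proposal, which leaves the choice of replacement character open.
    """
    try:
        "\ufffd".encode(charset)
        return "\ufffd"
    except UnicodeEncodeError:
        return "?"


def replace_out_of_set(text: str, charset: str, replacement: str = "?") -> str:
    """Replace each character that `charset` cannot encode with `replacement`.

    Exactly one replacement character is emitted per offending character,
    as the proposal requires.
    """
    out = []
    for ch in text:
        try:
            ch.encode(charset)
            out.append(ch)
        except UnicodeEncodeError:
            out.append(replacement)
    return "".join(out)


def numeric_reference_form(text: str, charset: str) -> str:
    """For comparison: the "&#" + decimal code point + ";" substitution."""
    return text.encode(charset, errors="xmlcharrefreplace").decode(charset)


if __name__ == "__main__":
    # What the dialog would preview for a US-ASCII-only form.
    print(replace_out_of_set("D\u00fcrst", "ascii"))
    # The numeric-reference alternative for the same input.
    print(numeric_reference_form("D\u00fcrst", "ascii"))
```

Note that with the numeric-reference form, a user who literally types "D&#252;rst" submits exactly the same data as one who types "Dürst", so the server cannot tell the two apart.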
I would recommend inserting this into HTML 4.01 section 17.13.3, or into
an XHTML forms module if we want to be forward-looking instead.

>> How does XForms address this in a way different than XHTML Forms?
>
> It requires (for the GET case) to use UTF-8. So it forces upgrade on
> the server side. So it avoids the problem, it doesn't solve it.

Fair enough.

>> Ideally, this would be a normative errata to some spec (e.g.
>> HTML4), so that all UAs still in active development could change to
>> be interoperable.
>
> If you want to be compatible with IE, the easiest way is probably
> just to do what IE does (but don't say I told you to do so).

I didn't say that we wanted to be compatible with IE. I said that UAs in
active development could change to be interoperable. What I meant was
"with each other". As I understand it from comments on microsoft.com, IE
is no longer in active development.

> This seems to be a typical garbage-in-garbage-out scenario. Such
> stuff is rarely addressed by standards, and is not guaranteed to
> work.

It _should_ be addressed by standards, since garbage is a very common
input on the Web.

(e.g. if the HTML spec had stated how parsers should handle invalid HTML
from the start, then we would have largely avoided the whole Tag Soup
problem. This is why CSS' very clear parsing and error handling rules are
so important.)

>>> I don't think anyone expects Opera or Mozilla to be able to compensate
>>> for the limitations of legacy servers and legacy server side code.
>>
>> Oh but they do. :-)
>
> How?

"This doesn't work. It should. Please to be fixing."

-- 
Ian Hickson                                      )\._.,--....,'``.    fL
U+1047E                                         /,   _.. \   _\  ;`._ ,.
http://index.hixie.ch/                         `._.-(,_..'--(,_..'`-.;.'
Received on Friday, 19 September 2003 05:18:18 UTC