Re: several messages

From: Ian Hickson <ian@hixie.ch>
Date: Fri, 19 Sep 2003 09:18:17 +0000 (UTC)
To: Martin Duerst <duerst@w3.org>
Cc: Francois Yergeau <FYergeau@alis.com>, "kuro@sonic.net" <kuro@sonic.net>, Paul Deuter <PaulD@plumtree.com>, "www-international@w3.org" <www-international@w3.org>
Message-ID: <Pine.LNX.4.58.0309190848430.13024@dhalsim.dreamhost.com>

On Thu, 18 Sep 2003, Martin Duerst wrote:
> --------
> Sorry, this form cannot handle some of the characters you
> just typed. The data will be sent as "D?rst".
>       [OK to send]   [Let me change it]
> --------

That is a very interesting idea. Thanks!

So this solves part of the problem. The remaining problem is to decide
what the data should be sent as (i.e. what the second part of that dialog
will say). Personally I prefer to replace out-of-set characters with "?".
Some UAs, namely Mozilla (in all such cases) and IE (in a more limited set
of cases), currently replace such characters with a numeric character
reference: the string "&#", the decimal representation of the character's
Unicode code point, and ";". This is not really wise, as has already been
discussed in this thread, and I believe the relevant Mozilla folk are
willing to change this to be interoperable with whatever we officially
decide on.

Safari and Opera currently translate such characters to "?".
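
To make the two behaviours concrete, here is a minimal sketch using Python's built-in codec error handlers. It assumes, purely for illustration, that the form's submission character set is US-ASCII (which matches the "D?rst" example in the dialog); the variable names are hypothetical. The last line illustrates one drawback of the "&#...;" approach: the server receives the same bytes whether the user typed the character or literally typed the reference.

```python
# Assumption for illustration: the form only accepts US-ASCII.
name = "D\u00fcrst"  # "Dürst"; U+00FC is decimal 252

# "?" substitution, as described above for Safari and Opera.
question_mark = name.encode("ascii", errors="replace")

# "&#" + decimal code point + ";", as described above for Mozilla and IE.
char_ref = name.encode("ascii", errors="xmlcharrefreplace")

print(question_mark)  # b'D?rst'
print(char_ref)       # b'D&#252;rst'

# A user who literally types "D&#252;rst" produces identical bytes,
# so the server cannot tell the two inputs apart.
literal = "D&#252;rst".encode("ascii")
print(literal == char_ref)  # True
```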

> If there were any concrete proposal, we could find a list. But there
> does not seem to be anything even close to a concrete proposal on the
> protocol side.


   If the form data set contains characters that are outside the
   acceptable submission character sets, the user agent SHOULD inform
   the user that his submission will be changed, for example using a
   dialog in the form:
     || Warning |||||||||||||||||||||||||||||||||||||||||||
     |                                                    |
     | This form cannot handle some of the characters you |
     | have entered. The data will be sent as "D?rst".    |
     |                                                    |
     |              (( Send anyway ))  ( Return to form ) |

   If the submission is not cancelled, the user agent MUST replace
   each character that is not in the submission character set with a
   single replacement character, either U+FFFD, "?", or some other
   character depending on the availability of characters in the
   submission character set.
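
A rough sketch of how a user agent might implement the text above, written in Python for brevity. The function names, the order in which replacement candidates are tried (U+FFFD first, then "?"), and the returned "changed" flag are all assumptions made for this example, not part of the proposed wording.

```python
def pick_replacement(charset, candidates=("\ufffd", "?")):
    """Return the first candidate that the submission charset can encode."""
    for candidate in candidates:
        try:
            candidate.encode(charset)
            return candidate
        except UnicodeEncodeError:
            continue
    raise ValueError("no replacement character available in " + charset)


def prepare_submission(value, charset):
    """Replace each unencodable character with one replacement character.

    Returns the text that would be sent and a flag saying whether anything
    was replaced, so the caller can decide whether to warn the user.
    """
    replacement = pick_replacement(charset)
    out = []
    changed = False
    for ch in value:  # iterates by code point, so one character in, one out
        try:
            ch.encode(charset)
            out.append(ch)
        except UnicodeEncodeError:
            out.append(replacement)
            changed = True
    return "".join(out), changed


preview, changed = prepare_submission("D\u00fcrst", "ascii")
if changed:
    # Here a real UA would show the dialog and let the user cancel.
    print('The data will be sent as "%s".' % preview)
```

With a submission character set that can represent U+FFFD (UTF-8, say) nothing is replaced at all for this input; with US-ASCII or ISO-8859-1 the sketch falls back to "?".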

I would recommend inserting this into HTML 4.01 section 17.13.3, or
into an XHTML forms module if we want to be forward-looking instead.

>> How does XForms address this in a way different than XHTML Forms?
> It requires (for the GET case) to use UTF-8. So it forces upgrade on
> the server side.

So it avoids the problem, it doesn't solve it. Fair enough.
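
For illustration only, this is roughly what "always UTF-8 for GET" looks like on the wire, using Python's standard urllib; the field name "name" is made up for the example. The second call shows the same value squeezed into a US-ASCII query string with "?" substitution, for contrast.

```python
from urllib.parse import urlencode

# Hypothetical single-field form; "name" is an assumed field name.
fields = {"name": "D\u00fcrst"}

# UTF-8 bytes, percent-encoded: nothing is lost, but the server must
# know to decode the query string as UTF-8.
utf8_query = urlencode(fields)  # urlencode defaults to UTF-8
print(utf8_query)  # name=D%C3%BCrst

# Legacy charset with "?" substitution: the original character is gone.
ascii_query = urlencode(fields, encoding="ascii", errors="replace")
print(ascii_query)  # name=D%3Frst
```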

>> Ideally, this would be a normative errata to some spec (e.g.
>> HTML4), so that all UAs still in active development could change to
>> be interoperable.
> If you want to be compatible with IE, the easiest way is probably
> just to do what IE does (but don't say I told you to do so).

I didn't say that we wanted to be compatible with IE. I said that UAs
in active development could change to be interoperable. What I meant
was "with each other". As I understand it from comments on
microsoft.com, IE is no longer in active development.

> This seems to be a typical garbage-in-garbage-out scenario. Such
> stuff is rarely addressed by standards, and is not guaranteed to
> work.

It _should_ be addressed by standards, since garbage is a very common
input on the Web. (e.g. if the HTML spec had stated how parsers should
handle invalid HTML from the start, then we would have largely avoided
the whole Tag Soup problem. This is why CSS' very clear parsing and
error handling rules are so important.)

>>> I don't think anyone expects Opera or Mozilla to be able to compensate
>>> for the limitations of legacy servers and legacy server side code.
>> Oh but they do. :-)
> How?

"This doesn't work. It should. Please to be fixing."

Ian Hickson                                      )\._.,--....,'``.    fL
U+1047E                                         /,   _.. \   _\  ;`._ ,.
http://index.hixie.ch/                         `._.-(,_..'--(,_..'`-.;.'
Received on Friday, 19 September 2003 05:18:18 UTC