Re: form submission and chars. outside the reper. of ...

On Fri, 19 Sep 2003, Ian Hickson wrote:

> On Thu, 18 Sep 2003, Martin Duerst wrote:

> will say). Personally I prefer to replace out-of-set characters with "?".
> Some UAs, namely Mozilla (in all such cases) and IE (in a more limited set
> of cases) currently replace unknown characters with the string "&#", the
> decimal representation of the character's Unicode code point, and ";".
> Now, this is not really wise, as has already been discussed in this
> thread,

  I agree that it's not wise, but some server-side programs have come
to 'rely' on that behavior, which complicates things ...
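
  To make the two behaviors concrete, here is a rough Python sketch (not
any UA's actual code). US-ASCII is only my stand-in for a submission
character set that lacks "ü". Python's built-in error handlers happen
to mirror both approaches: `xmlcharrefreplace` emits the decimal numeric
character reference, and `replace` emits "?".

```python
# Illustration only: US-ASCII stands in for a submission charset
# that cannot represent U+00FC (u with diaeresis).
name = "D\u00fcrst"

# The "&#NNN;" style described above: "&#", the decimal code point, ";".
ncr_style = name.encode("ascii", errors="xmlcharrefreplace")

# The "?" style: one replacement character per unencodable character.
qmark_style = name.encode("ascii", errors="replace")

print(ncr_style)    # b'D&#252;rst'
print(qmark_style)  # b'D?rst'
```

Note that a server receiving the first form cannot distinguish it from
a user who literally typed the six characters "&#252;".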

> and I believe the relevant Mozilla folk are willing to change
> this to be interoperable with whatever we officially decide on.

> Proposal:
>
>    If the form data set contains characters that are outside the
>    acceptable submission character sets, the user agent SHOULD inform
>    the user that his submission will be changed, for example using a
>    dialog in the form:
>       ____________________________________________________
>      || Warning |||||||||||||||||||||||||||||||||||||||||||
>      |                                                    |
>      | This form cannot handle some of the characters you |
>      | have entered. The data will be sent as "D?rst".    |
>      |                                                    |
>      |              (( Send anyway ))  ( Return to form ) |
>      `----------------------------------------------------'
>
>    If the submission is not cancelled, the user agent MUST replace
>    each character that is not in the submission character set with a
>    single replacement character, either U+FFFD, "?", or some other

  The above is just fine for the average Mom and Pop, but wouldn't it
be nice to be a bit more verbose for those who know their way around,
by adding a _parenthetical_ note that reads

    You might transfer all characters intact by changing the character
    encoding of the page, before you fill it out, to the one with the
    widest repertoire of characters that the form supports, such as UTF-8.

Right after writing the above, I realized that it only works with a
small set of forms that are 'character-encoding-neutral'. So, it's not
so useful, I guess.

>    character depending on the availability of characters in the
>    submission character set.

   Were you alluding to a possible transliteration or just a different
question-mark-like character?
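
   For reference, the quoted rule could be sketched in Python roughly as
below. The function name `encode_for_submission` is hypothetical, and
the preference order (U+FFFD if the charset has it, otherwise "?") is
my assumption; the proposal only says "depending on the availability".
It also assumes the charset can represent "?" itself.

```python
def encode_for_submission(text: str, charset: str) -> bytes:
    """Replace each character outside `charset` with one replacement character."""
    # Assumed preference: U+FFFD when the charset can represent it, else "?".
    try:
        "\ufffd".encode(charset)
        replacement = "\ufffd"
    except UnicodeEncodeError:
        replacement = "?"

    out = []
    for ch in text:
        try:
            ch.encode(charset)       # probe: is this character in the charset?
            out.append(ch)
        except UnicodeEncodeError:
            out.append(replacement)  # exactly one replacement per character
    return "".join(out).encode(charset)


print(encode_for_submission("D\u00fcrst", "ascii"))             # b'D?rst'
print(encode_for_submission("D\u00fcrst \u4e2d", "iso-8859-1"))  # b'D\xfcrst ?'
```

A transliterating variant would replace the `out.append(replacement)`
step with a table lookup, which is a different behavior from the
single-character rule sketched here.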

> I would recommend inserting this into HTML 4.01 section 17.13.3, or
> into an XHTML forms module if we want to be forward-looking instead.

> It _should_ be addressed by standards, since garbage is a very common
> input on the Web. (e.g. if the HTML spec had stated how parsers should
> handle invalid HTML from the start, then we would have largely avoided

  Needless to say, it'd have been better still if the spec had had a
built-in mechanism for specifying the character encoding from the very
beginning (even for GET).
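
  A small Python illustration of why GET needs this too: the same name
percent-encodes to different query strings depending on the character
encoding the UA happens to use, and the URL itself carries no hint of
which one it was. The two encodings below are just examples I picked.

```python
from urllib.parse import quote

name = "D\u00fcrst"

# Same text, two possible submission encodings, two different URLs.
as_utf8 = quote(name, encoding="utf-8")
as_latin1 = quote(name, encoding="iso-8859-1")

print(as_utf8)    # D%C3%BCrst
print(as_latin1)  # D%FCrst
```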

  Jungshik

Received on Friday, 19 September 2003 06:57:23 UTC