Re: Charsets revisited from Gavin Nicol on 1996-01-25 (ietf-http-wg@w3.org from January to March 1996)

From: Gavin Nicol <gtn@ebt.com>
Date: Thu, 25 Jan 1996 09:32:53 -0500
To: masinter@parc.xerox.com
Cc: glenn@stonehand.com, frystyk@w3.org, nms@nns.ru, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Message-Id: <199601251432.JAA00560@ebt-inc.ebt.com>

>This specification calls for the _characters_ of the form results to
>be encoded in a URL. However, the URL encoding (specified in section
>2.2 of RFC 1738 (URL)) is a way of encoding octets, not a way of
>encoding characters.
> 
>It is this disconnect that leaves the ambiguity that we're worried
>about here: when a user fills out a form and the values in that form
>are transmitted, what is the character set used in the transmission.
> 
>As such, I think this issue must be addressed in the HTML working
>group as a technical review issue for RFC 1866. As we've discussed in
>numerous other venues, there is no easy solution to the problem in
>general, although RFC 1867 (file-upload) gives some relief in many
>instances.
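
To make the octets-versus-characters point concrete, here is a minimal
sketch in Python. The form value "caf%C3%A9" is a hypothetical example,
not taken from RFC 1738 or RFC 1866. Percent-decoding yields only
octets; which characters they stand for depends on a charset the URL
itself does not carry.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical form value as it might arrive in a query string.
encoded = "caf%C3%A9"

# Percent-decoding gives back octets, nothing more.
octets = unquote_to_bytes(encoded)

# The receiver has to guess a charset to get characters; two guesses,
# two different strings from the same octets.
as_utf8 = octets.decode("utf-8")
as_latin1 = octets.decode("iso-8859-1")

print(octets)
print(as_utf8)
print(as_latin1)
```

Both decodings succeed without error, so the receiver gets no hint
that it may have guessed wrong.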

Given the syntax I posted earlier is still valid, it seems to me that
the best thing the HTML working group could do would be to recommend
that *all* form data be sent as a message body. This solves all the
problems *except* that of URIs pointing to resources that are
named in something other than ISO-8859-1 (e.g. a file called
"insatsu.html" on a Japanese Windows NT machine). I have seen such
URLs, though I have not recorded them. Many people in Japan think
that it's a rather silly thing to do, but they all also acknowledge
that it will become increasingly common.
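
A rough sketch of the resource-name problem, again in Python. It
assumes the file name is written in kanji (the two-character spelling
below is my assumption for "insatsu"), and the helper encode_path is
purely illustrative. The same name percent-encodes to a different URL
path under each of the encodings commonly used for Japanese text, and
the URL does not say which one was used.

```python
from urllib.parse import quote, unquote

# Assumed kanji spelling of "insatsu" (printing); hypothetical file name.
name = "\u5370\u5237.html"

def encode_path(filename, charset):
    """Percent-encode the octets of filename in the given charset."""
    return quote(filename, encoding=charset)

# One name, three candidate encodings, three different URL paths.
variants = {cs: encode_path(name, cs)
            for cs in ("shift_jis", "euc_jp", "utf-8")}
for cs, path in variants.items():
    print(cs, path)

# Decoding recovers the name only if the charset guess matches.
sjis_path = variants["shift_jis"]
right = unquote(sjis_path, encoding="shift_jis")
wrong = unquote(sjis_path, encoding="euc_jp", errors="replace")
print(right == name, wrong == name)
```

A server on a Shift_JIS machine and a client assuming EUC-JP would
therefore disagree about which file such a URL names.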

Received on Thursday, 25 January 1996 06:36:14 UTC