Character representation negotiation from Gavin Nicol on 1994-12-06 (ietf-http-wg@w3.org from October to December 1994)

From: Gavin Nicol <gtn@ebt.com>
Date: Mon, 5 Dec 1994 21:31:01 -0500
To: fielding@avron.ICS.UCI.EDU
Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com, html-wg@oclc.org
Message-Id: <199412060231.VAA09102@ebt-inc.ebt.com>

As Roy pointed out, if one wants to, one can negotiate for different
character encodings for HTML with something like the following from a
client:

   Accept: text/html; charset=unicode_1_1_utf_7
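
As a rough illustration of the server side of that negotiation, here is
a minimal Python sketch that pulls the charset parameter for one media
type out of an Accept value. The function name is hypothetical, and the
sketch deliberately ignores quoted strings, wildcards such as text/*,
and q-values, so it is not a full parser for the header grammar.

```python
from typing import List


def charsets_from_accept(accept: str, media_type: str) -> List[str]:
    """Return the charset= values attached to `media_type` in an Accept value.

    Hypothetical helper: no quoted-string, wildcard, or q-value handling.
    """
    found = []
    for media_range in accept.split(","):
        # Each media range looks like "type/subtype; name=value; ...".
        parts = [p.strip() for p in media_range.split(";")]
        if parts[0].lower() != media_type.lower():
            continue
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "charset" and value.strip():
                found.append(value.strip().lower())
    return found


print(charsets_from_accept("text/html; charset=unicode_1_1_utf_7", "text/html"))
# prints ['unicode_1_1_utf_7']
```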

However, SGML-aware browsers (and browsers for other document formats)
will arrive very soon. We could add a charset= parameter to each of
these MIME types, but I think we need a single HTTP field allocated
for this. In addition, the following are probably needed:

1) Either UTF-7 or UTF-8, or both, strongly recommended by both the
   HTML and HTTP specs as the way to transmit multilingual documents.
2) A definition of "escape codes" to indicate language and other such
   parameters, to aid display. As I have said elsewhere, such tagging
   would probably happen automatically, and so would not be visible to
   end users.
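
A single negotiation field, combined with the UTF recommendation in
point 1, could work roughly as below. This Python sketch assumes a
comma-separated list of charset labels with optional q-values, modeled
on the Accept-Charset header that HTTP/1.1 later defined; the SUPPORTED
table, the labels in it, and the function names are illustrative
assumptions. Python's standard codecs include both utf-8 and utf-7.

```python
from typing import Tuple

# Hypothetical server table: client charset label -> Python codec name.
SUPPORTED = {
    "utf-8": "utf-8",
    "unicode-1-1-utf-7": "utf-7",
    "utf-7": "utf-7",
}


def choose_charset(field: str, default: str = "utf-8") -> str:
    """Pick the supported label with the highest q-value (q defaults to 1)."""
    best_label, best_q = None, 0.0
    for item in field.split(","):
        label, *params = [p.strip() for p in item.split(";")]
        q = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        label = label.lower()
        if label in SUPPORTED and q > best_q:
            best_label, best_q = label, q
    return best_label if best_label is not None else default


def encode_body(text: str, field: str) -> Tuple[str, bytes]:
    """Negotiate a charset and encode the document body with it."""
    label = choose_charset(field)
    return label, text.encode(SUPPORTED[label])


label, body = encode_body("\u65e5\u672c\u8a9e text", "unicode-1-1-utf-7, utf-8;q=0.5")
print(label)  # prints unicode-1-1-utf-7
```

UTF-7 output is pure 7-bit ASCII, which is why it suits mail-like
transports. Falling back to UTF-8 when nothing matches is only one
possible policy; a server might instead refuse the request.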

I think we should regard these as "enabling technology". They will
not be immediately used (or at least not widely), but eventually, as
Unicode systems (browsers in particular) become available, they will
be increasingly important.

On top of this foundation, we can then build two libraries of great
utility:

1) A library for converting between various character encodings and
   the tagged UTF.
2) A library for handling font display using Unicode. This is not
   exceptionally difficult.
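
The first library might start out as little more than a wrapper around
an existing codec table. In this Python sketch, the hypothetical
TaggedText record stands in for the "tagged UTF": UTF-8 bytes plus an
optional language tag carried alongside the text, rather than as
in-band escape codes, since no such codes are defined yet. All names
are illustrative; the charset names are ones Python's standard codecs
accept.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaggedText:
    """Hypothetical tagged-UTF unit: UTF-8 bytes plus an optional language tag."""
    utf8: bytes
    lang: Optional[str] = None


def to_tagged_utf8(data: bytes, charset: str, lang: Optional[str] = None) -> TaggedText:
    # Decode from the source charset; raises LookupError for an unknown
    # charset name and UnicodeDecodeError for malformed input.
    text = data.decode(charset)
    return TaggedText(text.encode("utf-8"), lang)


def from_tagged_utf8(unit: TaggedText, charset: str) -> bytes:
    # Re-encode into a target charset; raises UnicodeEncodeError if the
    # target cannot represent every character.
    return unit.utf8.decode("utf-8").encode(charset)


sjis = "\u65e5\u672c\u8a9e".encode("shift_jis")
unit = to_tagged_utf8(sjis, "shift_jis", lang="ja")
euc = from_tagged_utf8(unit, "euc_jp")
print(unit.lang, unit.utf8.decode("utf-8") == "\u65e5\u672c\u8a9e")  # prints ja True
```

Routing every conversion through one UTF means N legacy encodings need
2N conversions rather than one per pair.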

With these, multilingual browsers become, if not trivial, at least not
much harder to build than Roman-only ones.

Received on Monday, 5 December 1994 18:30:06 UTC