
Re: charset issues



We're not really in bad shape if everyone plays by these rules for
those documents that cannot be represented in latin 1:

a) EVERY client accepts UTF8. Any client may also accept other
charsets, or even _all_ charsets. There is no advantage, though, in
accepting only 80% of them.

If a client knows all charsets, it just leaves out "accept-charset"
and takes what it gets. Otherwise, the client sends a very short:

	Accept-charset: utf8, charset1, charset2

b) EVERY server knows how to send UTF8. The server may send whatever
the native encoding is, though, if either the client didn't set
accept-charset, or if the client included the native encoding in the
accept-charset.

If the document can be represented in latin 1, the 'accept-charset' is
just ignored, and the document is sent in latin 1.
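
To make rules (a) and (b) concrete, here is a rough sketch in Python of
how the two sides might behave. The function names and the boolean
"fits_latin1" flag are made up for illustration, and where the rules say
the server "may" send the native encoding, this sketch simply chooses to
do so. Treat it as one possible reading of the rules, not as a spec.

```python
from typing import Iterable, Optional


def accept_charset_header(extra_charsets: Iterable[str],
                          knows_all: bool = False) -> Optional[str]:
    """Client side, rule (a). Hypothetical helper.

    Returns None when the client knows all charsets (it omits the
    header); otherwise returns a short header that always lists utf8.
    """
    if knows_all:
        return None
    names = ["utf8"]
    for name in extra_charsets:
        # Skip duplicates, compared case-insensitively.
        if name.lower() not in [n.lower() for n in names]:
            names.append(name)
    return "Accept-charset: " + ", ".join(names)


def choose_charset(accept_charset: Optional[str],
                   native: str,
                   fits_latin1: bool) -> str:
    """Server side, rule (b). Hypothetical helper.

    accept_charset is the header value only (e.g. "utf8, charset1"),
    or None if the client sent no accept-charset at all.
    """
    if fits_latin1:
        # Latin 1 documents ignore accept-charset entirely.
        return "latin1"
    if accept_charset is None:
        # No header: the client takes what it gets.
        return native
    accepted = {part.strip().lower()
                for part in accept_charset.split(",") if part.strip()}
    if native.lower() in accepted:
        return native
    # Fall back to the one charset every client accepts.
    return "utf8"
```

For example, a client that also handles charset1 and charset2 would send
the header shown in (a), and a server whose native encoding is not in
that list would fall back to utf8.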

How can a client 'accept all charsets'? Well, let's make sure that
'all charsets' isn't an infinite set. We should limit charsets to
those that are registered with IANA, and we should make sure there's
some kind of well-known transliteration service/table/applet that can
be dynamically downloaded for charset-to-UTF8 or charset-to-font
conversion of new ones.
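
As a rough illustration of the charset-to-UTF8 half of that idea, the
sketch below uses Python's built-in codecs as a stand-in for the
transliteration table. The REGISTERED set is a small illustrative
subset I picked, not the real IANA list, and to_utf8 is a hypothetical
name; a downloadable service would presumably look quite different.

```python
# Illustrative subset only; the actual registry is maintained by IANA.
REGISTERED = {"iso-8859-1", "iso-2022-jp", "euc-kr", "koi8-r"}


def to_utf8(data: bytes, charset: str) -> bytes:
    """Transcode bytes in a registered charset to UTF-8.

    Hypothetical helper: refuses any charset outside the finite
    registered set, so 'all charsets' stays a bounded list.
    """
    name = charset.lower()
    if name not in REGISTERED:
        raise ValueError("unregistered charset: " + charset)
    # Decode with the named charset, then re-encode as UTF-8.
    return data.decode(name).encode("utf-8")
```

The point of the check is only that an unregistered name gets rejected
up front instead of being guessed at.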

Larry

