ietf-http-wg-old@w3.org archive, May to August 1995

Re: ISO/IEC 10646 as Document Character Set

From: Erik van der Poel <erik@netscape.com>
Date: Wed, 03 May 95 13:43:26 -0700
Message-Id: <199505032043.UAA15805@slice.mcom.com>
To: Multiple recipients of list <html-wg@oclc.org>
Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com, erik@netscape.com
[Cc'ing HTTP working group too.]


>This is partly a server/HTTP issue, since that's where things are
>usually labelled (or not, or mislabelled, as the case may be), but for local 
>storage (CD-ROM, software help files, or whatever), it would be useful to
>have *some* denotation within the document of the character encoding used

Well, several members of the HTML WG have come up with a proposal that
could solve the CD-ROM problem: give files that carry their own MIME-style
headers a distinctive extension such as *.www, *.htt(p) or *.mim(e) (or
whatever).

For example, a file called foo.www on a CD-ROM could look like:

	Content-Type: text/html; charset=iso-2022-jp

	<HTML>
	...
	</HTML>

Clients and even servers could look at this and do the right thing.
(Digression: this convention could be used in FTP and NFS, too.)
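Here is a minimal Python sketch of how a client might read such a file. It assumes the layout shown in the foo.www example: header lines, one blank line, then the body. It does not handle folded header lines or quoted parameters containing ';', and it falls back to iso-8859-1 when no charset is given. The function names are hypothetical. No spec defines such a reader.

```python
from __future__ import annotations


def split_www(data: bytes) -> tuple[dict[str, str], bytes]:
    """Split a hypothetical *.www file into (headers, body).

    Header names are lowercased. Folded (continuation) header lines are
    not handled in this sketch.
    """
    # The headers end at the first blank line, written with LF or CRLF.
    candidates = [(data.find(sep), sep) for sep in (b"\r\n\r\n", b"\n\n")]
    found = [(i, sep) for i, sep in candidates if i != -1]
    if not found:
        raise ValueError("no blank line between headers and body")
    idx, sep = min(found)  # earliest separator wins
    head, body = data[:idx], data[idx + len(sep):]

    headers: dict[str, str] = {}
    for line in head.decode("ascii").splitlines():
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers, body


def parse_content_type(value: str) -> tuple[str, dict[str, str]]:
    """'text/html; charset=iso-2022-jp' -> ('text/html', {'charset': 'iso-2022-jp'})."""
    media_type, *params = value.split(";")
    parsed: dict[str, str] = {}
    for param in params:
        key, eq, val = param.partition("=")
        if eq:
            parsed[key.strip().lower()] = val.strip().strip('"')
    return media_type.strip().lower(), parsed


def read_www(data: bytes) -> tuple[str, str]:
    """Return (media type, decoded body text) for a *.www file."""
    headers, body = split_www(data)
    media_type, params = parse_content_type(headers["content-type"])
    # With no charset parameter, assume the iso-8859-1 default.
    charset = params.get("charset", "iso-8859-1")
    return media_type, body.decode(charset)
```

Because the charset travels with the file, the same reader works whether the bytes came from a CD-ROM, FTP or NFS.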


The HTTP spec already has some wording about charsets, but it seems
that hardly any servers out there are appending the charset parameter
to the Content-Type header.

It's easy and tempting to say "They're broken and must be fixed".

But there's the installed base and the interoperability currently being
enjoyed (yes, even in Japan).  A server administrator can't just add
the charset parameter without thinking about the consequences.  What if
there are browsers out there that currently display Japanese just fine,
but have problems when there is a charset parameter?

For example, suppose some browsers do the right thing when they see
"text/html", but when they see "text/html; charset=foo", they try to save
the document to a file instead of displaying it.  The end user would
definitely think of this as a step backwards.
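That failure mode comes from comparing the whole header value instead of just the media type. Here is a Python sketch of the difference. DISPLAYABLE is a hypothetical set standing in for whatever types a given browser renders inline.

```python
# Hypothetical set of media types this client renders inline.
DISPLAYABLE = {"text/html", "text/plain"}


def naive_action(content_type: str) -> str:
    # Exact match on the whole header value: "text/html; charset=foo"
    # misses, so the document gets saved instead of displayed.
    return "display" if content_type in DISPLAYABLE else "save"


def tolerant_action(content_type: str) -> str:
    # Compare only the media type before the first ';', case-insensitively,
    # so a charset parameter does not change the decision.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return "display" if media_type in DISPLAYABLE else "save"
```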

Again, it's easy for this working group to say "Those servers/clients
are broken.  Fix them."

But can we Westerners really dictate what the Japanese should do with
their "corner" of the Internet, especially since the default is
iso-8859-1, which means that we are not affected?


It might be a good idea to have clients tell servers that they are capable
of parsing the charset parameter.  This is similar to Dan's proposal
to have clients tell servers that they can do HTML 3.0 (tables, etc).
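Here is one way a server could act on such a signal, sketched in Python. The proposal above does not name a mechanism. This sketch assumes the client sends an Accept-Charset request header and treats the mere presence of that header as the capability flag. The function name is hypothetical.

```python
def html_content_type(request_headers: dict[str, str], charset: str) -> str:
    """Pick the Content-Type value for an HTML response (sketch).

    Assumption: any Accept-Charset request header counts as the client
    saying it can parse the charset parameter.
    """
    # Header names are case-insensitive, so compare lowercased names.
    if any(name.lower() == "accept-charset" for name in request_headers):
        return f"text/html; charset={charset}"
    # Clients that send no signal get the bare type they already handle.
    return "text/html"
```

Clients that send nothing keep getting the plain "text/html" they already display, so the installed base is not disturbed.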

Please let me know what you all think.  Perhaps the Japanese should be
involved in this discussion?


Erik van der Poel
Received on Wednesday, 3 May 1995 13:45:36 EDT
