W3C home > Mailing lists > Public > ietf-http-wg-old@w3.org > January to April 1995

Re: charset parameter (long)

From: <yergeau@alis.ca>
Date: Wed, 11 Jan 95 11:26:59 EST
Message-Id: <9500117898.AA789845204@smtplink.alis.ca>
To: www-mling@square.ntt.jp
Cc: html-wg@oclc.org, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Bob Jung <bobj@mcom.com> writes:
>I agree with Larry Masinter <masinter@parc.xerox.com> 
>that we should replace the
>
>        Accept-charset=xxx
>with the
>        accept-parameter charset=xxx

I'm not sure this is really better.  Have we lost the preference (q) parameter? 
Furthermore, as Daniel Connolly wrote, the bandwidth necessary to express the 
common cases should be minimized and this seems to be a move in the wrong 
direction.

>I agree that if the charset parameter is not specified, the default 
>***should*** be US-ASCII (or ISO8859-1, if it's been changed).

Changed where?  In MIME or in HTTP?

>Unfortunately, since charset was reserved for future use, Japanese servers 
>had no choice but to serve non-Latin files without a charset parameter!!
>
>Why don't we enforce the default for servers using a future version of 
>the HTTP protocol? ...and let current versions be "implementation 
>dependent" in order to preserve backwards compatibility?

I agree that it should stay "implementation dependent" for current 
servers, but I'm just not sure it is necessary or wise to *enforce* a 
default in the future.  What's the point, except for sticking to the 
letter of MIME, which has to live with the limitations of mail?  Does that 
mean that, say, Mosaic-L10N would have to forgo its (limited, but still 
useful) automatic codeset detection?

>Daniel>...but I think the overall proposal is incomplete
>Daniel>until we have a workable proposal for how to enhance the major httpd 
>Daniel>implementations to correctly label non-ISO8559-1 documents.
>
> [...]
>
>A web site with versions of the same files in different encodings
>(e.g., SJIS, EUC and JIS) or languages (e.g., English and Japanese) 
>could create separately rooted trees with the equivalent files in each 
>tree. The top page could say click here for SJIS/EUC/JIS or 
>English/Japanese.

And how is the top page encoded?  In what language?  This just shows that 
Accept-Language and Accept-Charset are needed,

>One idea, is for a HTML <charset> tag that would take precedence over the 
>MIME header:
>
>   <html>
>   <charset=xxx>
>   <head> <title> DOCUMENT TITLE GOES HERE </title> </head>
>   <body>
>   <h1> MAJOR HEADING GOES HERE  </h1>
>
>        THE REST OF THE DOCUMENT GOES HERE
>
>   </body>
>   </html>

I like this, but will it fly?  What about multi-charset documents?

>Daniel>But "giving somethign the client didn't ask for" is _not_ an HTTP 
>Daniel>protocol viloation (at least not if you ask me; the ink still isn't 
>Daniel>dry on the HTTP 1.0 RFC though...). It's something that the client 
>Daniel>should be prepared for.
>
>As you put it "It's something that the client should be prepared for."
>
>I'm still assuming that
>
>        accept-parameter: charset=xxx
>
>dictates if the server sends back the charset parameter.  An old browser 
>should continue to get the 2022-JP data untagged.  A new charset-enabled 
>browser should get tagged 2022-JP data even if it only advertised 
>8859-1.

What's the point, if it just advertised that it couldn't deal with 
2022-JP?  It seems to me that a "new" server (able to tag) should send a 
more economical error reply, leaving the browser with the alternative of 
1) abandonning (can't read anyway) or 2) reformulating the request.  It 
could advise the user and ask for guidance.  Of course the browser must 
still be prepared to receive *untagged* whatever from an old server.

>I agree.  I've purposely read EUC and JIS pages on my Mac (SJIS), so that I 
>could save the source and look (grok) at it later. (not a usual user...)

Can still be done with 2) above.

>Ken>I want to add one more thing about this issue. We could have the document 
>Ken>which uses multiple charset in future. We must define the way to label 
>Ken>such a document.
>Ken>It can be like ...
>Ken>        Content-Type: text/html; charset="ISO2022-JP", charset="ISO8859-6" 
>Ken>Is this OK?
>
>I'd rather leave this as a possible future direction.  Multilingual has 
>had a lot of heated discussions.  If we can agree on a means to support 
>the existing mono-lingual mono-encoded Web data, that will allow us
>to create products to fill an immediate need.  Can we phrase something that 
>leaves this open and discuss this in another thread?

It is the purpose of this (www-mling) list, after all!

Regards,

-- 
Francois Yergeau  <yergeau@alis.ca>
Alis Technologies Inc.
+1 514 738-9171
Received on Wednesday, 11 January 1995 08:35:17 EST

This archive was generated by hypermail pre-2.1.9 : Wednesday, 24 September 2003 06:31:13 EDT