- From: Bob Jung <bobj@mcom.com>
- Date: Thu, 5 Jan 1995 21:10:11 -0800
- To: www-mling@square.ntt.jp, html-wg@oclc.org, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
- Cc: www@unicode.org
Hi,

The pragmatic proposal below is driven by an existing (and pressing) need and by a desire not to derail long-term multilingual solutions.

My assumptions:

o There are lots of Japanese Web pages in ISO-2022-JP that we need to be able to browse today.
o Lots more non-Latin1 text files will be (or already are) created on the Web.
o Changes must not require changes to Web file contents.
o Unicode in one form or another will be used for future Web pages.
o New clients should not break existing servers, and new servers should not break existing clients (backwards compatibility).

Comments, please!

-bob

============================================================================

"Accept-Charset" and "charset" Support for Web Browsers

In order to render single-language (actually single character set encoding) text files on the Web correctly, a mechanism is needed to identify the character set encoding of each text file. For example, files encoded in ISO-2022-JP should be rendered as Japanese and files encoded in ISO-8859-1 should be rendered as Latin characters.

Currently, there is no deterministic way to know the character set encoding of a text file. The MIME content type header provides a mechanism for this by means of the "charset=xxx" parameter. For example:

    Content-Type: text/plain; charset=ISO_8859-1:1987

or

    Content-Type: text/html; charset=ISO-2022-JP

The problem is that many browsers today do not parse for parameters and will be confused by the above examples. Some browsers will take the entire string "text/plain; charset=ISO_8859-1:1987" instead of "text/plain" as the content type.

Therefore, I suggest that charset-parameter-savvy browsers send servers a new accept header, "Accept-Charset". This would look like:

    Accept-Charset: ISO_8859-1:1987
    Accept-Charset: ISO-2022-JP

The "Accept-Charset" header was proposed by Gavin Nichol in a document sent to several mailing lists (html.wg, http.wg, www.unicode), "Handling Multilingual Documents in the WWW".
See http://www10.w3.org/hypertext/WWW/Administration/Mailing/Outside_mailing.html

I propose that servers send the MIME charset parameter only if they have received an "Accept-Charset" header from the browser. This convention will prevent compatibility problems with current browsers.

The charset-parameter-savvy browsers should send "Accept-Charset" headers for the charsets they recognize. The "Accept-Charset" header should NOT restrict servers from sending text files in other charsets. It is the browser's responsibility to handle unsupported charsets gracefully.

If the browser receives text files without charset information, then the behavior will be implementation dependent. In this case, I suggest that the browser use a per-window default. This allows knowledgeable users to read Japanese newsgroups in one browser window and French newsgroups in another, even if the charset is not specified in the headers. Ideally, all text files will provide charset headers, but the per-window default would give users a means to deal with unidentified text data.

It is the browser's responsibility to know how to render the text file correctly. This may require converting from the character set encoding of the file to another, internal character set encoding. Whether this internal encoding is Unicode or some other encoding is implementation dependent.

In the future, there may be HTML tags that specify character set encoding at a finer granularity (i.e., per-string vs. per-URL). These HTML tags may be required to implement multilingual HTML documents. When (or if) these tags exist, they would take precedence over the MIME charset header information.

However, the MIME charset header information will remain useful for new and (especially) existing "single" language documents. The MIME charset information will allow existing documents to be rendered correctly without modifying their contents by adding new HTML tags.
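To make the convention above concrete, here is a rough Python sketch of both sides of the exchange. It is only an illustration under assumptions of mine: the function names (parse_content_type, server_content_type, charset_for_rendering) are hypothetical, request headers are modelled as a plain list of (name, value) pairs, and the parameter parsing is deliberately simplistic rather than a full MIME parser.

```python
def parse_content_type(value):
    """Split a Content-Type value into (media_type, params)."""
    media_type, *rest = value.split(";")
    params = {}
    for item in rest:
        name, sep, val = item.partition("=")
        if sep:
            params[name.strip().lower()] = val.strip().strip('"')
    return media_type.strip().lower(), params


def server_content_type(request_headers, media_type, charset):
    """Build the Content-Type value a server would send.

    The charset parameter is added only when the request carried at
    least one Accept-Charset header, whatever charsets it listed
    (the header does not restrict what the server may send).
    """
    names = {name.lower() for name, _ in request_headers}
    if "accept-charset" in names:
        return f"{media_type}; charset={charset}"
    return media_type


def charset_for_rendering(content_type, window_default):
    """Pick the charset a browser would use to interpret the body.

    Falls back to the per-window default when no charset was sent.
    """
    _, params = parse_content_type(content_type)
    return params.get("charset", window_default)


# Hypothetical requests: one from a current browser, one from a
# charset-parameter-savvy browser.
old_browser = [("Accept", "text/html")]
new_browser = [
    ("Accept", "text/html"),
    ("Accept-Charset", "ISO_8859-1:1987"),
    ("Accept-Charset", "ISO-2022-JP"),
]

legacy = server_content_type(old_browser, "text/html", "ISO-2022-JP")
savvy = server_content_type(new_browser, "text/html", "ISO-2022-JP")
print(legacy)
print(savvy)
print(charset_for_rendering(legacy, window_default="ISO-2022-JP"))
print(charset_for_rendering(savvy, window_default="ISO_8859-1:1987"))
```

In this sketch the old browser never sees a parameter it cannot parse, and the new browser prefers the charset the server declared over its per-window default.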
Received on Thursday, 5 January 1995 21:12:27 UTC