RE: charset from Rick den Haan on 2008-04-10 (public-html-comments@w3.org from April 2008)

From: Rick den Haan <rick.denhaan@gmail.com>
Date: Thu, 10 Apr 2008 10:37:54 +0200
To: "'Harley Rosnow'" <Harley.Rosnow@microsoft.com>, <public-html-comments@w3.org>
Message-ID: <47fdd1e6.0bbf5e0a.6be2.4743@mx.google.com>


Harley Rosnow wrote:
> Servers that compose files together need to make their encoding
> consistent in the rendered composite file.  The same holds true
> for composition which occurs on the client.

True, but what if you need to have, e.g. Russian, Hebrew and Chinese text in
the same document?

Would it perhaps be an option to modify the META tag to allow multiple
values, where the first value is used as the default?

For example:

<html>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=KOI8-R,UTF-8,GB2312">
</head>
<body>
    <section id="russian_content" lang="ru">
        <!-- Since KOI8-R was first in the meta, no charset is required -->
    </section>
    <section id="hebrew_content" lang="he" dir="rtl" charset="UTF-8">
        <!-- Some Hebrew text, rendered and decoded using the UTF-8 charset -->
    </section>
    <section id="chinese_content" lang="zh" charset="GB2312">
        <!-- Some Chinese text, rendered and decoded using the GB2312 charset -->
    </section>
</body>
</html>
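Parsing the proposed value, with the first charset treated as the default, might look like this in Python. This is a minimal sketch: `parse_charsets` is a hypothetical helper, and the comma-separated `charset=` list is the syntax proposed here, not something any current parser accepts.

```python
def parse_charsets(content):
    """Split a Content-Type value such as
    'text/html; charset=KOI8-R,UTF-8,GB2312' into the media type and an
    ordered list of charsets (hypothetical multi-value syntax). Under the
    proposal, the first charset in the list is the document default."""
    media_type, _, params = content.partition(";")
    charsets = []
    for param in params.split(";"):
        name, _, value = param.strip().partition("=")
        if name.strip().lower() == "charset":
            charsets = [c.strip() for c in value.split(",") if c.strip()]
    return media_type.strip(), charsets


media, charsets = parse_charsets("text/html; charset=KOI8-R,UTF-8,GB2312")
print(media, charsets)  # text/html ['KOI8-R', 'UTF-8', 'GB2312']
```

An empty list means no charset was declared; a one-element list is the single-charset case.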

In this situation, browsers can:

(1) Parse the META tag
(2) If only one charset is given, decode the entire document using that
    charset
(3) If multiple charsets are given, preload the given charsets
(3a) Scan the document for elements with a charset attribute
(3b) When found, decode the contents of that element using the given
     charset, and drop it into a buffer
(3c) Decode the rest of the document using the default (first) charset given
(3d) Insert the decoded contents from the buffers into their correct
     positions in the document
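The steps above could be sketched in Python roughly as follows. This is a hypothetical illustration, not a browser implementation: `decode_mixed` is an invented name, a regular expression stands in for a real HTML tokenizer, and it assumes the markup itself is ASCII, that every listed charset is ASCII-compatible (true of KOI8-R, UTF-8 and GB2312), and that charset-tagged sections are not nested.

```python
import re

# Hypothetical pattern: a <section> start tag carrying the proposed
# charset="..." attribute, then its body, then </section>. Assumes ASCII
# markup, ASCII-compatible charsets, and no nested charset sections.
CHARSET_SECTION = re.compile(
    rb'<section\b[^>]*\bcharset="([^"]+)"[^>]*>(.*?)</section>', re.S)


def decode_mixed(raw, charsets):
    """Decode raw document bytes following steps (2) to (3d)."""
    default = charsets[0]
    # (2) Only one charset: decode the entire document with it.
    if len(charsets) == 1:
        return raw.decode(default)
    # (3) Several charsets: the declared list is the "preloaded" set.
    preloaded = {c.lower() for c in charsets}
    buffers = []  # (3b) decoded bodies of charset-tagged sections
    pieces = []   # default-decoded text; None marks where a buffer goes
    pos = 0
    # (3a) Scan the document for elements with a charset attribute.
    for m in CHARSET_SECTION.finditer(raw):
        charset = m.group(1).decode("ascii")
        if charset.lower() not in preloaded:
            raise LookupError(f"{charset} was not declared in the META tag")
        # (3c) Everything before this section's body uses the default.
        pieces.append(raw[pos:m.start(2)].decode(default))
        # (3b) The section's body is decoded with its own charset.
        buffers.append(m.group(2).decode(charset))
        pieces.append(None)
        pos = m.end(2)  # resume at this section's </section>
    # (3c) The remainder of the document also uses the default.
    pieces.append(raw[pos:].decode(default))
    # (3d) Insert the buffered contents back into their positions.
    filled = iter(buffers)
    return "".join(next(filled) if p is None else p for p in pieces)


raw = b"".join([
    b'<body><section lang="ru">', "Привет".encode("koi8-r"), b"</section>",
    b'<section lang="he" dir="rtl" charset="UTF-8">', "שלום".encode("utf-8"),
    b"</section>",
    b'<section lang="zh" charset="GB2312">', "你好".encode("gb2312"),
    b"</section></body>",
])
html = decode_mixed(raw, ["KOI8-R", "UTF-8", "GB2312"])
print(html)
```

Decoding each section in place would avoid the separate buffers; they are kept here only to mirror steps (3b) and (3d).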

I'm not a software developer, so I may be thinking too simply here, but I
wouldn't expect this to be too difficult to implement. The buffers might be
a memory hog on low-end systems, since each buffered section has to be kept
until step (3d). And of course, there's the matter of what to do if someone
uses this in combination with Ajaxy goodness and loads, oh I don't know,
Korean content, for example, and adds it to the document while that charset
isn't loaded.
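One possible policy for such late-arriving content is to load a missing charset on demand rather than fail. A minimal Python sketch, assuming a hypothetical `CharsetRegistry` class and using Python's `codecs.lookup` as a stand-in for a browser's table of decoders:

```python
import codecs


class CharsetRegistry:
    """Hypothetical table of "preloaded" charsets, as in step (3)."""

    def __init__(self, declared):
        self.loaded = {}  # canonical codec name -> codecs.CodecInfo
        for name in declared:
            self.load(name)

    def load(self, name):
        # codecs.lookup raises LookupError if Python has no such codec.
        info = codecs.lookup(name)
        self.loaded[info.name] = info
        return info

    def decode(self, data, name):
        key = codecs.lookup(name).name
        info = self.loaded.get(key)
        if info is None:
            # Not declared in the META tag: load it on demand instead of
            # failing. Rejecting the content would be another possible policy.
            info = self.load(name)
        text, _consumed = info.decode(data)
        return text


reg = CharsetRegistry(["KOI8-R", "UTF-8", "GB2312"])
# Korean content arriving later, e.g. from a script-driven request.
print(reg.decode("안녕".encode("euc-kr"), "EUC-KR"))  # 안녕
```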

Cheers,
Rick.

Received on Thursday, 10 April 2008 13:33:18 UTC