Re: HTML Document - Multiple Charsets - Help appreciated.

From: Masayasu Ishikawa <mimasa@w3.org>
Date: Tue, 22 Apr 2003 14:49:53 +0900 (JST)
Message-Id: <20030422.144953.78702999.mimasa@w3.org>
To: James@virtual-aviation.fsnet.co.uk
Cc: www-html-editor@w3.org

"James London" <James@virtual-aviation.fsnet.co.uk> wrote:

> How can I display an HTML Page with text in more than a single CharSet??

Basically you can't.  I assume you are talking about multiple
*character encodings* in a single document - HTML's document
*character set* is always the Universal Character Set (UCS).
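To illustrate the distinction, here is a minimal Python sketch (Python and
the variable names are my choice for illustration, not anything HTML
defines).  One character has a single UCS code point, but its byte
sequence depends on the character encoding used to serialize it:

```python
# One abstract character, identified by its UCS code point.
ch = "あ"  # HIRAGANA LETTER A
code_point = ord(ch)

# The same character serialized in two different character encodings.
sjis_bytes = ch.encode("shift_jis")
utf8_bytes = ch.encode("utf-8")

print(f"U+{code_point:04X}")                 # the code point
print(sjis_bytes.hex(), utf8_bytes.hex())    # the encoded bytes differ

# Decoding either byte sequence with the right encoding gives back
# the same character.
assert sjis_bytes.decode("shift_jis") == utf8_bytes.decode("utf-8") == ch
```

The document character set stays the same in both cases; only the
encoding of the bytes on the wire changes.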

> I am aware that setting the character set for the entire page can be done using
> the META Tags, is there a way to do this at Tag level?? - i.e. apply a different
> Charset to an area of Text?

I don't quite understand why you need that.

> Imagine a page divided into two paragraphs of text, at the top would be one
> paragraph of text using Japanese "x-sjis"

"x-sjis" should not be used.  What is registered in the IANA character
sets registry [1] is "Shift_JIS", although there are several variants of
so-called Shift-JIS in practice.  You might want to have a look at
"5.3 Shift-JIS" of "XML Japanese Profile" [2].

> and another just underneath in
> Character set "iso-8859-9"??.

You don't have to use different character encodings to mix them.
Instead you should choose an appropriate character encoding that
covers a sufficient repertoire of characters, such as UTF-8.
Alternatively you may represent some characters that cannot be
directly encoded in a given character encoding by numeric character
references or character entity references (if appropriate entities
are defined).  See "5 HTML Document Representation" of the HTML 4
specification [3] for details.
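As a rough sketch of both options, assuming Python's standard codecs
(the helper names can_encode and encode_with_ncr and the sample strings
are hypothetical, for illustration only):

```python
japanese = "日本語"
turkish = "Türkçe: ğ ı ş"


def can_encode(text, encoding):
    """Return True if every character of text is in the encoding's repertoire."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def encode_with_ncr(text, encoding):
    """Encode text, replacing unencodable characters with numeric
    character references such as &#351;."""
    return text.encode(encoding, errors="xmlcharrefreplace")


# Option 1: a single encoding whose repertoire covers both paragraphs.
print(can_encode(japanese + turkish, "utf-8"))

# Neither legacy encoding covers the other paragraph's characters.
print(can_encode(turkish, "shift_jis"))
print(can_encode(japanese, "iso-8859-9"))

# Option 2: keep Shift_JIS and fall back to numeric character references.
print(encode_with_ncr("ş", "shift_jis"))
```

With the second option the document is still labeled and served in one
encoding; the references are resolved against the document character
set, not against that encoding.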

> I have found references to the charset attribute in the DIV and SPAN tags
> in various HTML reference manuals, however they do not display correctly.

No HTML Recommendation defined the charset attribute on 'div' and 'span'.

> I have also seen CHARSET referred to as LANG.

Those are completely different notions.  See "8.1 Specifying the language
of content: the lang attribute" of HTML 4 [4] for details about the lang
attribute.

[1] http://www.iana.org/assignments/character-sets
[2] http://www.w3.org/TR/2000/NOTE-japanese-xml-20000414/#sjis
[3] http://www.w3.org/TR/html4/charset.html
[4] http://www.w3.org/TR/html4/struct/dirlang.html#h-8.1

Masayasu Ishikawa / mimasa@w3.org
W3C - World Wide Web Consortium
Received on Tuesday, 22 April 2003 01:50:01 UTC
