
Re: Problem with LANG keyword

From: David Woolley <david@djwhome.demon.co.uk>
Date: Wed, 24 Sep 2003 22:23:39 +0100 (BST)
Message-Id: <200309242123.h8OLNdP14143@djwhome.demon.co.uk>
To: www-html@w3.org

> would be nice to see this even for languages such as Chinese: simplified and

I don't see how your points relate.  I'll take them in reverse order.

> problem I have seen is when using languages such as French with Chinese
> where the Unicode characters interfere with each other like letters in
> French displaying as Chinese characters.

In my experience, this only happens when you try to view an invalid 
European document in a browser configured to compensate for invalid
Chinese documents.  You are unlikely to have any more success in 
convincing authors to correctly use new technology than you currently
have in getting them to correctly use old technology.

All valid documents in modern forms of HTML specify the transfer character
set, either in a real HTTP header or in a META http-equiv element.  Your
problem arises when you make the browser assume that documents with no
declared character set are really in GB2312, and it then encounters an
ISO 8859/1 or Windows 1252 document that doesn't specify its character set.
(The correct default for slightly older versions was ISO 8859/1, but there
is now no default, presumably because a lot of non-Latin countries treated
the default as being their favourite character set rather than the
specified one, and even earlier browsers were not character set aware and
simply passed codes through to their font engines.)
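To make the failure mode concrete, here is a minimal Python sketch,
assuming a browser whose fallback for undeclared pages has been set to
GB2312.  BROWSER_DEFAULT, charset_from_content_type and decode_page are
hypothetical names for illustration, not any browser's real logic.  The
sketch reads the charset parameter from a Content-Type value (the same
parameter a META http-equiv element would carry) and only falls back when
none is declared.  The word "déçu" in Windows 1252 has two adjacent high
bytes, which a GB2312 decoder reads as a single Chinese character.

```python
from email.message import Message

# Hypothetical fallback a reader might have configured for undeclared pages.
BROWSER_DEFAULT = "gb2312"


def charset_from_content_type(content_type):
    """Return the charset parameter of a Content-Type value (lower-cased),
    or None if there is no charset parameter."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def decode_page(raw, content_type, fallback=BROWSER_DEFAULT):
    """Decode a response body, trusting the declared charset if present."""
    charset = charset_from_content_type(content_type) or fallback
    return raw.decode(charset)


# A French word stored as Windows 1252 bytes: b'd\xe9\xe7u'.
body = "déçu".encode("cp1252")

declared = decode_page(body, "text/html; charset=windows-1252")
undeclared = decode_page(body, "text/html")

print(declared)    # déçu
print(undeclared)  # 'd', one Chinese character, then 'u'
```

When the header carries a charset, the fallback is never consulted, which
is why declaring it avoids the problem entirely.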

The document character set, in which the canonical form of an HTML
document is defined, is ISO 10646, so it has no such ambiguity.
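In HTML 4 terms, numeric character references always name ISO 10646 code
points, whatever the transfer encoding.  A short Python illustration using
the standard html module: the reference &#20013; means U+4E2D under every
encoding, because the reference itself is plain ASCII, and characters a
legacy charset cannot represent can still be written as references.  The
choice of characters is just an example.

```python
import html

# A numeric character reference names an ISO 10646 (Unicode) code point,
# so its meaning does not depend on how the file's bytes are encoded.
ref = "&#20013;"  # decimal for U+4E2D
char = html.unescape(ref)

# The reference is plain ASCII, so it survives any ASCII-compatible
# transfer encoding unchanged.
for charset in ("iso-8859-1", "cp1252", "gb2312", "utf-8"):
    assert ref.encode(charset).decode(charset) == ref

print(char, hex(ord(char)))  # 中 0x4e2d

# The other direction: characters outside a legacy charset can still be
# written out as references.
refs = "中文".encode("iso-8859-1", errors="xmlcharrefreplace")
print(refs)  # b'&#20013;&#25991;'
```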

> would be nice to see this even for languages such as Chinese: simplified and
> traditional as it has been an issue using both languages on one document,

The simplified/traditional split is a rather complicated issue.  Although
they are conventionally language-tagged as though they were different
regional dialects, they are more like different fonts.  Unicode adds to
the confusion by using different code points for the different writing
styles of the same logical character (a few characters don't have
1:1 mappings), but shares the code points for characters with the
same structure.
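A tiny illustration of both points in Python, using a hand-picked sample
table (TRAD_TO_SIMP and simp_to_trad_candidates are hypothetical names,
and the four entries are examples, not a conversion dictionary): 國 and 国
get separate code points, 發 and 髮 both simplify to 发 so the mapping is
not 1:1, and 人, which has the same structure in both styles, has a single
shared code point.

```python
# Deliberately tiny, hand-picked sample; real conversion tables are far
# larger and often need context to choose between candidates.
TRAD_TO_SIMP = {
    "國": "国",  # U+570B -> U+56FD: separate code points, same character
    "發": "发",  # U+767C -> U+53D1 ("to issue, develop")
    "髮": "发",  # U+9AEE -> U+53D1 ("hair"): two traditional forms, one simplified
    "人": "人",  # U+4EBA: same structure, so one shared code point
}


def simp_to_trad_candidates(table):
    """Invert the table; one simplified form may map back to several."""
    inverse = {}
    for trad, simp in table.items():
        inverse.setdefault(simp, []).append(trad)
    return inverse


inverse = simp_to_trad_candidates(TRAD_TO_SIMP)
for simp, trads in inverse.items():
    print(f"{simp} U+{ord(simp):04X} <- {trads}")
```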

Traditional characters are sometimes used in business signs in the PRC
(I can think of a case in Shanghai) in a similar way to the use of
Gothic fonts in England, and my understanding is that formal calligraphy
is done strictly with traditional characters, even though zh-cn is
used as a computer synonym for simplified ones.

Note that, in a properly language-tagged document - using the zh-cn/zh-tw
convention - you can use CSS2 to select an appropriate font.
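In a style sheet that would be rules such as
:lang(zh-tw) { font-family: MingLiU, serif } and
:lang(zh-cn) { font-family: SimSun, serif }.  The Python sketch below only
mimics how such selectors match, assuming the |= style rule CSS2 uses for
:lang() (an exact match, or the value followed by a hyphen, compared
case-insensitively since language tags are case-insensitive).  The font
names and the font_for helper are hypothetical examples; a real browser
also applies inheritance and the full cascade.

```python
# Example font choices only; substitute fonts that are actually installed.
FONT_RULES = [
    ("zh-cn", "SimSun, serif"),   # simplified
    ("zh-tw", "MingLiU, serif"),  # traditional
]
DEFAULT_FONT = "serif"


def lang_matches(element_lang, selector_lang):
    """True if element_lang equals selector_lang, or starts with it
    followed by '-', ignoring case."""
    el = element_lang.lower()
    sel = selector_lang.lower()
    return el == sel or el.startswith(sel + "-")


def font_for(element_lang):
    """Pick a font list; later matching rules override earlier ones, as
    rules of equal specificity do in the CSS cascade."""
    chosen = DEFAULT_FONT
    for selector_lang, font in FONT_RULES:
        if lang_matches(element_lang, selector_lang):
            chosen = font
    return chosen


for lang in ("zh-TW", "zh-cn", "zh", "fr"):
    print(lang, "->", font_for(lang))
```

Note that an element tagged only "zh" matches neither rule, which is one
reason the regional subtags matter here.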
Received on Wednesday, 24 September 2003 18:12:37 UTC
