RE: Language Identifier List up for comments from Peter Constable on 2004-12-14 (www-international@w3.org from October to December 2004)

From: Peter Constable <petercon@microsoft.com>
Date: Tue, 14 Dec 2004 14:22:38 -0800
To: <www-international@w3.org>
Message-ID: <F8ACB1B494D9734783AAB114D0CE68FE04856CAD@RED-MSG-52.redmond.corp.microsoft.com>

> From: www-international-request@w3.org
> [mailto:www-international-request@w3.org]
> On Behalf Of Elizabeth J. Pyatt

> Previously, the language codes have been used to encode both script
> and language. I was assuming the characters embedded would convey
> which script is being used.

ISO 639 language IDs have never been used to identify script
distinctions. It is true that some users or implementations of RFC 1766
or RFC 3066 have sometimes used region to infer script distinctions
(e.g. zh-TW for Traditional Chinese), but this is not recommended.

It is not sufficient to suppose that the characters contained in text
content can be relied upon to identify the writing system of the text:
that does not provide any means to specify in a query what writing
system is desired.
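
A rough sketch of the point, in Python. The sample records, the
Serbian example, and the helper names guess_script and find are
illustrative assumptions, not anything defined by RFC 3066. Inspecting
characters can often tell which script existing content uses, but a
query has no content to inspect, so the desired script still has to
be named explicitly.

```python
import unicodedata


def guess_script(text):
    """Heuristically guess a script from the first alphabetic character.

    Assumption: for many letters the Unicode character name starts with
    the script name (e.g. "CYRILLIC CAPITAL LETTER DE"). This does not
    hold for every character and is not a real script-detection API.
    """
    for ch in text:
        if ch.isalpha():
            words = unicodedata.name(ch, "").split()
            if words:
                return words[0]
    return None


# Hypothetical records: both are tagged only with the language ID "sr".
documents = [
    {"lang": "sr", "text": "Добар дан"},
    {"lang": "sr", "text": "Dobar dan"},
]


def find(docs, lang, script):
    """Return records in the given language whose text looks like `script`.

    The `script` argument is the key point: the caller must state the
    desired writing system, because a query carries no text to inspect.
    """
    return [
        d for d in docs
        if d["lang"] == lang and guess_script(d["text"]) == script
    ]


matches = find(documents, "sr", "LATIN")
```

Here the language ID alone cannot separate the two records; the
request needs its own way of naming the writing system.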



Peter Constable
Microsoft Corporation

Received on Tuesday, 14 December 2004 22:22:52 UTC