Re: Natural language marking in HTML

> The value of the LANG attribute is *not* defined to be an ISO 639 code. 
> RFC 2070 uses the language tag scheme defined by RFC 1766.  ISO 639 is just 
> one element of this scheme.  All the following examples use more than ISO 639 
> and are legal RFC 1766 language tags:
>    zh-cn
>    no-nyn
>    en-cockney
>    x-klingon

You are right.  I will correct it.

> I do not agree with the proposition that the presence of <HTML LANG=...> 
> should be taken to mean that the document is monolingual.  For an example, 
> see <http://www.reuters.com/unicode/iuc10/x-utf8.html>.  This document is 
> far from monolingual - it contains the same text in twenty nine languages.
> As the document has an English title, a brief English introduction, a few 
> English images and ends with English trademark statements, we have used 
> <HTML LANG=en>, and have then tagged the elements containing the various 
> texts with the languages of those texts.  To us this indicates that the 
> individual texts are embedded within an English page, even though it is 
> not the case that "... the bulk of the document is in one language.".

There is a need to indicate monolingual docs.  <HTML LANG=...> looks like
the right place, as its meaning is "unless I indicate otherwise, the
text in this document is in language xx".  So one should expect the
bulk of the document to be in the language indicated in <HTML LANG=...>.
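
As a rough sketch of that reading (the content is made up for
illustration, not taken from any real page):

   <HTML LANG=en>
   <HEAD><TITLE>Greetings</TITLE></HEAD>
   <BODY>
   <!-- No LANG here, so the document-level "en" applies. -->
   <P>Hello, world.</P>
   <!-- "Indicating otherwise": this paragraph alone is French. -->
   <P LANG=fr>Bonjour tout le monde.</P>
   </BODY>
   </HTML>

Here the title and the first paragraph are taken to be English only
because nothing says otherwise, and the bulk of the text is English.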

For the document you mentioned, it would probably be better not to
indicate any language in <HTML LANG=...>, and to mark the English parts
like the other languages, as the doc is clearly multilingual.
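
Something along these lines, as a hypothetical skeleton only (it is not
the markup of your page, and I am assuming LANG is accepted on TITLE
just as on the body elements):

   <HTML>
   <HEAD><TITLE LANG=en>Greetings</TITLE></HEAD>
   <BODY>
   <!-- No document-wide default: every text names its own language. -->
   <P LANG=en>Hello, world.</P>
   <P LANG=fr>Bonjour tout le monde.</P>
   <P LANG=de>Hallo, Welt.</P>
   </BODY>
   </HTML>

With no LANG on <HTML>, English gets no special status and each text
is marked in the same way.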

Tomas

Received on Saturday, 8 March 1997 04:14:28 UTC