Re: Natural language marking in HTML

Two comments on the draft titled "Natural language marking in HTML", 
at <http://www.crpht.lu/~carrasco/winter>:

> 3. Language(s) of a document
>
> This is defined in a similar way to traditional publication on paper: 
>
>   Monolingual document 
>     When the bulk of the document is in one language. 
>
>   Multilingual document 
>     When the bulk of the document is not in one language. For example, a 
>     bilingual French and English document. 
>
> 4. Behaviour
>
> <HTML LANG=xx> indicates the language for the whole document with an ISO-639 
> two characters language code. This is a declaration that the document is 
> monolingual.

Comment 1:

The value of the LANG attribute is *not* defined to be an ISO 639 code. 
RFC 2070 uses the language tag scheme defined by RFC 1766.  An ISO 639 
two-letter code is only one possible component of such a tag.  All of the 
following examples go beyond ISO 639 and are legal RFC 1766 language tags:
   zh-cn
   no-nyn
   en-cockney
   x-klingon
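
To make the point concrete, here is a rough Python sketch of the RFC 1766 
tag syntax (a primary tag followed by optional subtags, each of one to 
eight letters).  The function name and the category labels are my own 
invention for illustration; it checks syntax only and does not consult 
the ISO 639 list or any IANA registry.

```python
import re

# RFC 1766 syntax: primary tag, then zero or more "-" subtags,
# each made of 1 to 8 ASCII letters.
_TAG_RE = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z]{1,8})*")


def classify_language_tag(tag):
    """Split a tag into (primary, subtags, kind of primary tag).

    Hypothetical helper: it validates syntax only and never checks
    whether a code is actually assigned or registered.
    """
    if not _TAG_RE.fullmatch(tag):
        raise ValueError(f"not a valid RFC 1766 language tag: {tag!r}")
    # Language tags are case-insensitive, so normalise to lower case.
    primary, *subtags = tag.lower().split("-")
    if primary == "x":
        kind = "private use"
    elif primary == "i":
        kind = "IANA-registered"
    elif len(primary) == 2:
        kind = "ISO 639 code"
    else:
        kind = "not assigned by RFC 1766"
    return primary, subtags, kind


for tag in ("zh-cn", "no-nyn", "en-cockney", "x-klingon"):
    print(tag, classify_language_tag(tag))
```

Every one of the four examples passes the syntax check, yet none of them 
is a bare ISO 639 code, which is why a definition of LANG in terms of 
ISO 639 alone is too narrow.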

Comment 2:

I do not agree with the proposition that the presence of <HTML LANG=...> 
should be taken to mean that the document is monolingual.  For an example, 
see <http://www.reuters.com/unicode/iuc10/x-utf8.html>.  This document is 
far from monolingual: it contains the same text in twenty-nine languages.
As the document has an English title, a brief English introduction, a few 
English images and ends with English trademark statements, we have used 
<HTML LANG=en>, and have then tagged the elements containing the various 
texts with the languages of those texts.  To us this indicates that the 
individual texts are embedded within an English page, even though it is 
not the case that "... the bulk of the document is in one language."
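
As a sketch of the reading I have in mind, the following Python fragment 
treats <HTML LANG=en> as a default that nested LANG attributes override. 
The sample markup is a made-up miniature, not the actual Reuters page, 
and the class name LangTracker is hypothetical.  It assumes explicit end 
tags and simple inheritance from the enclosing element, which is my 
understanding of the intended scoping rather than a quotation from 
RFC 2070.

```python
from html.parser import HTMLParser


class LangTracker(HTMLParser):
    """Record each non-blank text run with the LANG value in effect."""

    def __init__(self):
        super().__init__()
        self._stack = [(None, None)]  # (element name, effective language)
        self.runs = []                # (effective language, text)

    def handle_starttag(self, tag, attrs):
        # An element without its own LANG inherits from its parent.
        inherited = self._stack[-1][1]
        lang = dict(attrs).get("lang") or inherited
        self._stack.append((tag, lang))

    def handle_endtag(self, tag):
        # Pop back to the matching open element; ignore stray end tags.
        for i in range(len(self._stack) - 1, 0, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.runs.append((self._stack[-1][1], text))


# Hypothetical sample: an English page embedding two other languages.
sample = """<HTML LANG=en>
<BODY>
<H1>One text, many languages</H1>
<P>The same sentence follows in several languages.</P>
<P LANG=fr>Bonjour tout le monde.</P>
<P LANG=de>Hallo Welt.</P>
<P>All trademarks acknowledged.</P>
</BODY>
</HTML>"""

tracker = LangTracker()
tracker.feed(sample)
tracker.close()
for lang, text in tracker.runs:
    print(lang, "|", text)
```

Under this reading the document-level LANG says nothing about the page 
being monolingual; it only supplies the language for whatever is not 
tagged otherwise.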

Misha

Received on Friday, 7 March 1997 18:32:17 UTC