Re: Natural language marking in HTML

> M.T. Carrasco Benitez wrote:
> >There is a need to indicate monolingual docs. <HTML LANG=...> looks like
> >the right place as the meaning is "if I do not indicate otherwise, the
> >text in this document is in language xx".  So, one should expect that the
> >bulk of the text be in the language indicated in <HTML LANG...>.
>  
> Not the bulk, rather the base.

This is a proposed change: it should be the bulk (and the base).  The
base for most of the doc could be changed with <BODY LANG=xx>.

> For example, if I were to write an annotated
> version of the Treaty of Westphalia, it could well be that my page would be
> Hebrew, the annotations in Hebrew, while the bulk of the text in Latin. The
> LANG attribute on the HTML level would indicate Hebrew, while the lengthy
> quotations of the treaty would have a Latin LANG.

This is your choice as an author: you view the doc as Hebrew.  Another
author whose mother tongue is also Hebrew could view it as Latin.

You could mark it as follows:

 <!--
 Jonathan declares his doc as monolingual Hebrew,
 even if the bulk is in Latin.  Not very common, but OK.
 The base language is Hebrew.
 -->
 <HTML LANG=iw>
 ...
 <!--
 Changes the base language to Latin because it is easier for marking
 -->
 <BODY LANG=la>
 ...
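
The markup above relies on inheritance: text takes the language of the
nearest enclosing element that carries a LANG attribute.  As a rough
illustration only, here is a minimal Python sketch of that rule.  The class
name LangResolver and the sample document are my own assumptions for the
example, not anything defined by RFC 2070, and the void-element list is
deliberately incomplete.

```python
from html.parser import HTMLParser

# Elements with no end tag; they must not be pushed on the open-element stack.
# (Illustrative subset, not a complete list.)
VOID = {"br", "hr", "img", "input", "meta", "link", "base", "area", "col"}


class LangResolver(HTMLParser):
    """Hypothetical helper: pairs each text run with its effective language."""

    def __init__(self):
        super().__init__()
        self.stack = []  # (tag, effective_lang) for each open element
        self.runs = []   # (effective_lang, text) in document order

    def current(self):
        # Language in effect at this point, or None if nothing was declared.
        return self.stack[-1][1] if self.stack else None

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        # HTMLParser lowercases tag and attribute names before calling us.
        # An element without LANG inherits the language of its parent.
        lang = dict(attrs).get("lang") or self.current()
        self.stack.append((tag, lang))

    def handle_endtag(self, tag):
        # Pop back to the matching open element, tolerating omitted end tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.runs.append((self.current(), text))


doc = """<HTML LANG=iw>
<HEAD><TITLE>title in Hebrew</TITLE></HEAD>
<BODY LANG=la>
<P>treaty text in Latin</P>
<P LANG=iw>annotation in Hebrew</P>
</BODY>
</HTML>"""

resolver = LangResolver()
resolver.feed(doc)
resolver.close()
for lang, text in resolver.runs:
    print(lang, text)
```

With this sample, the title inherits iw from <HTML>, the first paragraph
inherits la from <BODY>, and the annotation switches back to iw explicitly.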

> The LANG attribute of the HTML tag only means what RFC 2070 says it means, and
> that cannot be changed.

It can be changed, if not for this case, then at least in other cases when
the need arises.
  
> While we are at it, is there such a thing as a truly monolingual document in
> the international environment? For example, in this document, which is
> basically in English,

Most documents are monolingual, even in an environment such as the
European Institutions with eleven official languages, though there is a
minority of mixed docs.

> there is a Spanish name (I hope I haven't offended anyone, it looks 
> Spanish).

I am from Sevilla.

> (It would have also contained a Hebrew name, had English not adopted it
> centuries ago). The Oxford dictionary contains quite a bit of French and
> other languages.
>
> Why is it so important to tell the difference between a monolingual English
> document and an English document with inserts in other languages? There is no
> historical basis for this distinction.

The language of a document is defined by analogy with printed
documents.

When more than one language is assumed to exist, there is very much a need
for classification, and there is a historical and current basis for it.  For
example, European official docs (Official Journal, etc.) have an ISO-639 code
printed, usually on the cover, and they had a color classification: a strip in
"pinkish" for English, light blue for French, yellow for German, etc.

Tomas
<!-- Aramaic -->
<span lang=x-arc>Taomin</span>

Received on Sunday, 9 March 1997 06:00:34 UTC