
Re: Problem with LANG keyword

From: David Woolley <david@djwhome.demon.co.uk>
Date: Wed, 24 Sep 2003 22:37:14 +0100 (BST)
Message-Id: <200309242137.h8OLbFn14166@djwhome.demon.co.uk>
To: www-html@w3.org

> 
> 
> The problem is the overhead necessary to add the LANG tag in all

That is, the LANG attribute; LANG is not a tag.

> This is because English is an official language in Israel, many people
> do not like to translate expressions from English to Hebrew because
> everyone knows the English words and not the Hebrew translation.

In that case, they will probably acquire the status of loanwords, and so
be Hebrew words that are very similar to the English words.  They should
then be correctly language coded as lang="he", although you might need
lang="en-il" if you had to hint heavily to a text-to-speech system.  If
people then choose to write loanwords in a loan script, your automatic
language detection breaks down.

In English, loanwords in common use diverge in pronunciation from
their originals in the source language (some people argue that
there are really no non-loanwords in English!), so specifying 
lang=fr for "nom de plume" will produce the wrong pronunciation,
and no one would understand if you specified Bengali as the language
for bungalow (a one-storey house; the word comes from Bengali, which in
turn comes from bangla, each of which is both a word and a language name!).

> 
> Another reason is that most sites are trying to be bi-lingual so that
> both English and Hebrew readers will find their way.

I'd expect sites that are bilingual to have substantial blocks of
text in one language before switching.
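
To illustrate why block-level switching keeps the overhead small, here is
a minimal sketch in Python.  It assumes that the language set by a LANG
attribute is inherited by everything inside the element, as HTML 4
describes, and that the markup is well nested with explicit end tags.
The names LangTracker, language_runs and SAMPLE are made up for this
example, and the sample text is placeholder English rather than real
Hebrew.

```python
from html.parser import HTMLParser

# Elements with no end tag; they never hold text, so they are skipped.
VOID = {"br", "hr", "img", "input", "meta", "link", "area", "base", "col", "param"}


class LangTracker(HTMLParser):
    """Collect (effective_lang, text) pairs, inheriting lang from ancestors.

    Assumption: the input is well nested with explicit end tags; real-world
    tag soup would need a more forgiving parser.
    """

    def __init__(self):
        super().__init__()
        self.stack = [None]  # effective language per open element; None = undeclared
        self.runs = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        # An element without its own lang inherits the enclosing language.
        lang = dict(attrs).get("lang") or self.stack[-1]
        self.stack.append(lang)

    def handle_endtag(self, tag):
        if tag not in VOID and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.runs.append((self.stack[-1], text))


def language_runs(html):
    """Return the list of (effective_lang, text) runs found in html."""
    parser = LangTracker()
    parser.feed(html)
    parser.close()
    return parser.runs


# Hypothetical bilingual page: one lang on the root, one on the English block.
SAMPLE = """
<html lang="he">
<body>
<div><p>Hebrew paragraph with a loanword left unmarked.</p></div>
<div lang="en"><p>First English paragraph.</p><p>Second English paragraph.</p></div>
</body>
</html>
"""

if __name__ == "__main__":
    for lang, text in language_runs(SAMPLE):
        print(lang, "|", text)
```

Two LANG attributes cover the whole sample: the one on the root element
and the one on the English block.  The loanword stays unmarked and simply
inherits lang="he".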

> 
> So, if W3C recommendations will include the need to add LANG attribute
> whenever you use English instead of Hebrew, we will get a very strong
> negative response. People will say (and they are right) that the Latin

There is already a very strong negative response to the five-year-old
standards on this.  The only language declaration likely to appear in most
documents is the one dumped by the authoring tool into a meta element
(possibly carried over from a Word document), and it is probably wrong if
the language is not English.
Received on Wednesday, 24 September 2003 18:12:35 GMT
