Re: Text that's not in any language

From: Martin J. Duerst <mduerst@ifi.unizh.ch>
Date: Thu, 9 Jan 1997 15:07:57 +0100 (MET)
To: Bert Bos <bert@www10.w3.org>
cc: www-international@www10.w3.org
Message-ID: <Pine.SUN.3.95.970109145654.245D-100000@enoshima>
On Wed, 8 Jan 1997, Bert Bos wrote:

> RFC 2070 (html-i18n) says that the LANG attribute is only for natural
> languages, not for computer languages, but recently I've started
> wondering why.

It's taken from RFC 1766. Loosening that restriction might
open a can of worms. There is not even that much experience
with the current language tags yet.

> It may happen in a text that there is a word or phrase that is not in
> any human language, such as the name of somebody, or some code.

Names are in some language. For high-quality rendering of Han
names, it's good to know whether it's Chinese or Japanese or
Korean or Vietnamese. For text-to-speech rendering, it's
also important. Take your full first name as an example :-).
If you don't tag that as Dutch, the pronunciation will be
very far from the real one.

> HTML has some mark-up for the computer code: it can be put inside
> <CODE>, but there is no element for the name of a person.
> Maybe LANG should be extended to cover
>   - computer languages (Pascal, C, HTML, CSS,...)
>   - proper names (language "none"?)
>   - "unknown" and "any" languages
> The last two would be useful, resp., for a text that is in some
> language, but the author doesn't know which, and for a text that is the
> same in every language. An example would be the SI units mm, s, etc.

The last one might be useful, but I am not really convinced. For
units, there may be language-dependent rendering conventions.
The example of Hebrew or Arabic is not very relevant, because in
these cases, the script is decisive. The question would be whether
"mm" in Hebrew letters can be thought of, without bad consequences,
as Hebrew or Yiddish or so, or whether it is necessary to tag
it as something neutral. Same for Arabic and the many languages
it is used with. In that case, I guess at least for Urdu, it would
be important to tag it as Urdu so that a "falling" font style is

Regards,	Martin.
Received on Thursday, 9 January 1997 09:07:52 UTC
