
RE: Problem with LANG keyword

From: Reuven Nisser <rnisser@ofek-liyladenu.org.il>
Date: Tue, 23 Sep 2003 14:35:43 +0200
To: "Masayasu Ishikawa" <mimasa@w3.org>
Cc: <www-html-editor@w3.org>, "Gertel Hasson" <gilagh@netvision.net.il>, "Shaula Haitner" <shaula@shaula.co.il>, "Yuval Rabinovich" <yuval@faz.co.il>
Message-ID: <EOEHIKCGOKGNIEEKJHEKEEGDDCAA.rnisser@ofek-liyladenu.org.il>

Thank you for the explanation.
However, there are times when the change of language is "known" from the
character set used in the HTML. For example, English uses 7-bit ASCII
characters, whereas Hebrew and Arabic occupy the upper range (128-255) of
their 8-bit encodings. Another example is text in Unicode or UTF-8. In
these cases, the language of the text can be derived from the text itself.

However, distinguishing between English and German for example is not
possible this way because both languages occupy the same character space
(Latin letters).
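
To illustrate the point about character ranges, here is a minimal sketch
(not part of the original message) that guesses the script of a text from
the Unicode names of its letters. The function name guess_script is
hypothetical. It separates Hebrew from Latin text, but it cannot tell
English from German, since both come out as "Latin".

```python
import unicodedata


def guess_script(text):
    """Return the set of scripts seen among the letters of `text`.

    This identifies the script only, not the language: English and
    German letters both report as "Latin".
    """
    scripts = set()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, spaces and punctuation
        name = unicodedata.name(ch, "")
        if name.startswith("HEBREW"):
            scripts.add("Hebrew")
        elif name.startswith("ARABIC"):
            scripts.add("Arabic")
        elif name.startswith("LATIN"):
            scripts.add("Latin")
        else:
            scripts.add("Other")
    return scripts


# Mixed Hebrew and English text (Hebrew written with escapes for clarity).
mixed = "shalom \u05e9\u05dc\u05d5\u05dd"
print(guess_script(mixed))
print(guess_script("hello") == guess_script("Gr\u00fc\u00dfe"))
```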

In the first example adding <HTML LANG="he,en-US"> is enough to allow
correct language identification even with mixed Hebrew & English text. So,
there is no need for language switching in the middle of the text.

However, if I would like to switch to German inside such a text, I will need
to use LANG="de".

Thank you,
Reuven Nisser
Ofek Liyladenu

-----Original Message-----
From: Masayasu Ishikawa [mailto:mimasa@w3.org]
Sent: Tuesday, September 23, 2003 12:22 PM
To: rnisser@ofek-liyladenu.org.il
Cc: www-html-editor@w3.org
Subject: Re: Problem with LANG keyword

"Reuven Nisser" <rnisser@ofek-liyladenu.org.il> wrote:

> So, I would expect both to be possible:
> 	<META http-equiv="Content-Language" CONTENT="he,en-US">
> 	<HTML LANG="he,en-US">
> However, according to http://www.w3.org/TR/NOTE-html-lan the first is
> possible. But according to
> http://www.w3.org/TR/REC-html40/struct/dirlang.html the second is not.

In HTTP, the Content-Language header field describes the natural
language(s) of the intended audience for the enclosed entity as a whole.
You MAY specify multiple languages, but that doesn't necessarily have
to be equivalent to all the languages used within the entity-body, and
you cannot specify which part is Hebrew and which part is English.
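
As a rough sketch of what this means in practice, a Content-Language
value can be split into a list of language tags, and that list is all the
header conveys. The helper name parse_content_language is an assumption
made for illustration.

```python
def parse_content_language(value):
    """Split a Content-Language value such as "he,en-US" into its tags.

    The result describes the intended audience of the whole document;
    it carries no information about which part is in which language.
    """
    return [tag.strip() for tag in value.split(",") if tag.strip()]


print(parse_content_language("he,en-US"))
```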

In HTML, the lang attribute specifies the base language of an element's
attribute values and text content.  You can only specify one language
code, but this can be used multiple times in a document, so if English
text is enclosed within Hebrew text, you may specify that change via
the lang attribute on another element (e.g. span).
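
The nesting described above can be sketched with Python's standard
html.parser module. This is only an illustration under simple assumptions
(well-nested markup, lang given as a plain attribute); the class
LangTracker, the function lang_runs and the sample markup are all
hypothetical. Each text run reports the lang of its nearest enclosing
element that sets one.

```python
from html.parser import HTMLParser


class LangTracker(HTMLParser):
    """Collect (effective_lang, text) pairs from HTML markup."""

    def __init__(self):
        super().__init__()
        self.stack = []  # one (tag, effective lang) entry per open element
        self.runs = []   # collected (lang, text) pairs

    def current_lang(self):
        return self.stack[-1][1] if self.stack else None

    def handle_starttag(self, tag, attrs):
        # An element without its own lang inherits the enclosing one.
        lang = dict(attrs).get("lang") or self.current_lang()
        self.stack.append((tag, lang))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, dropping unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.runs.append((self.current_lang(), data.strip()))


def lang_runs(markup):
    tracker = LangTracker()
    tracker.feed(markup)
    tracker.close()
    return tracker.runs


# Hebrew paragraph with an English span inside it (Hebrew as escapes).
SAMPLE = (
    '<p lang="he">\u05e9\u05dc\u05d5\u05dd '
    '<span lang="en-US">hello</span> \u05e2\u05d5\u05dc\u05dd</p>'
)
for lang, text in lang_runs(SAMPLE):
    print(lang, text)
```

Each element carries a single language code, and the switch back to
Hebrew after the span needs no extra markup because the paragraph's lang
applies again once the span is closed.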

"Content-Language" and "lang" serve different purposes.

Masayasu Ishikawa / mimasa@w3.org
W3C - World Wide Web Consortium
Received on Tuesday, 23 September 2003 07:35:32 UTC
