- From: Reuven Nisser <rnisser@ofek-liyladenu.org.il>
- Date: Wed, 24 Sep 2003 10:13:43 +0200
- To: "David Woolley" <david@djwhome.demon.co.uk>, <www-html@w3.org>
Hello David,

I am not saying not to use language information. What I am saying is to allow more than one language to be active at the same time. If it were possible to write:

<HTML LANG="HE,AR,EN">

then when you get to Hebrew characters you know exactly that we are speaking about Hebrew and not Yiddish, Ladino or Aramaic. If you get to Arabic characters you know exactly that we are speaking about Arabic and not Turkish, and if you get to Latin characters you know you are using English and not Dutch.

The problem is that the LANG attribute (according to the W3C standard) is not allowed to receive more than one value. This is why I need to use:

<META http-equiv="Content-Language" CONTENT="HE,EN">

Thank you,
Reuven Nisser
Ofek Liyladenu

-----Original Message-----
From: www-html-request@w3.org [mailto:www-html-request@w3.org]On Behalf Of David Woolley
Sent: Tuesday, September 23, 2003 10:52 PM
To: www-html@w3.org
Subject: Re: Problem with LANG keyword

[ Can't find the original...]

> Reuven Nisser <rnisser@ofek-liyladenu.org.il>:
> >
> > However, there are times where the change of language is "known" by the
> > character set used in the HTML. For example, English is using Ansi 7 bit

Leaving aside the obvious confusion between the HTML character set and the ones that might be used to transfer pages to the browser (the former is ISO 10646, slightly subsetted), and the bogus "Ansi" set, except to note that a page may legitimately be converted between transfer character sets, using numeric entities to fill any gaps....

> > characters but Hebrew & Arabic occupy the upper 128-255. [...]

They are actually well above 255. More importantly, however, Hebrew characters could be Yiddish or Ladino, and, as the script is derived from the Aramaic script, might be used for that as well. Arabic script is used for many languages, including Farsi (Persian), Urdu, Bengali, Pushtu, Malay, and others. (On the other hand, en-gb is likely to contain ISO 10646 code point 163.)
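David's point that the script does not identify the language can be checked with a short sketch. It uses Python's standard `unicodedata` module and, as a simplifying assumption, treats the first word of each character's Unicode name as its script (the standard library does not expose the Unicode Script property directly). The helper name `scripts_in` and the sample words are illustrative only.

```python
import unicodedata

def scripts_in(text: str) -> set[str]:
    """Approximate the scripts in `text` by the first word of each character's
    Unicode name (e.g. "HEBREW LETTER ALEF" -> "HEBREW"). Heuristic only,
    not the official Unicode Script property."""
    return {unicodedata.name(ch).split()[0] for ch in text if not ch.isspace()}

hebrew = "\u05e9\u05dc\u05d5\u05dd"               # Hebrew "shalom"
yiddish = "\u05d9\u05d9\u05b4\u05d3\u05d9\u05e9"  # Yiddish "yidish"
arabic = "\u0639\u0631\u0628\u064a"               # Arabic "arabi"
persian = "\u0641\u0627\u0631\u0633\u06cc"        # Persian "farsi"

# Different languages, same script: the characters alone cannot tell them apart.
print(scripts_in(hebrew), scripts_in(yiddish))  # {'HEBREW'} {'HEBREW'}
print(scripts_in(arabic), scripts_in(persian))  # {'ARABIC'} {'ARABIC'}

# Even the lowest code point in each word is far above 255.
print(hex(min(map(ord, hebrew))), hex(min(map(ord, arabic))))  # 0x5d5 0x628
```

A language tag is therefore still needed to distinguish, say, Yiddish from Hebrew; the characters only narrow things down to a script.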
Where people are using fixed-length, 8-bit character sets which are supersets of ISO 646 to transfer documents (true of most current 8-bit sets except EBCDIC, and basically the same rules as those under which meta...charset works), using language codes in the document also avoids the need to know the details of lots of possible character sets, which will help search engines to index by language without any deep understanding.
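The indexing benefit can be illustrated with a minimal sketch built on Python's standard `html.parser`. The `LangCollector` class is hypothetical and not taken from any real search engine; it pairs each text run with the nearest enclosing `lang` value (one value per element, as HTML requires), so the language label comes from the markup rather than from the transfer character set. Because `HTMLParser` converts character references by default, a Hebrew word written entirely as numeric entities is still labelled `he`.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

class LangCollector(HTMLParser):
    """Hypothetical indexer helper: pairs each text run with its in-scope lang."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default: &#1513; arrives as U+05E9
        self.stack: List[Tuple[str, Optional[str]]] = []  # (tag, effective lang)
        self.runs: List[Tuple[Optional[str], str]] = []   # (lang, text)

    def handle_starttag(self, tag, attrs):
        own = dict(attrs).get("lang")
        inherited = self.stack[-1][1] if self.stack else None
        # An element's own lang overrides the one inherited from its parent.
        self.stack.append((tag, own.lower() if own else inherited))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag (tolerates unclosed void elements).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            lang = self.stack[-1][1] if self.stack else None
            self.runs.append((lang, text))

doc = ('<html lang="en"><body><p>Hello</p>'
       '<p lang="he">&#1513;&#1500;&#1493;&#1501;</p></body></html>')
parser = LangCollector()
parser.feed(doc)
parser.close()
print(parser.runs)
```

The same output would result whether the page was transferred as ISO-8859-8, UTF-8, or pure ASCII with numeric entities, which is the point being made: the index key is the language tag, not the byte encoding.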
Received on Wednesday, 24 September 2003 03:11:59 UTC