- From: <lee@sq.com>
- Date: Sat, 8 Mar 97 04:47:08 EST
- To: unicode@unicode.org, www-international@w3.org
M.T. Carrasco Benitez <carrasco@innet.lu> wrote:

> There is a need to indicate monolingual docs. <HTML LANG=...> look like
> the right place as the meaning is "if I do not indicate otherwise, the
> text in this document is in language xx". So, it should expect that the
> bulk of the language be the one indicate in <HTML LANG...>.

This seems reasonable to me...

> For the document you mentioned, it would probably be better not to
> indicate the language in the <HTML LANG...> and to mark the English like
> the other languages as the doc is clearly multilingual.

Is this a document with parallel translations? If so, footnotes may be in
one language (say), or one language may be Old Church Slavonic and the
other Old English, in which case you are probably right.

But I would expect that HTML editing software would always by default put
the language of the author's editing locale in the HTML LANG attribute
unless it was specifically overridden. It's hard for software to detect
an author's intent.

I think explicit rules are needed on what counts as the majority language.
Example rules might include

[1] you can't understand enough of this to make sense unless you're
    fluent in Japanese and Old Frisian
[2] 51% or more of the text characters in this document correspond to
    Hindi, so that's the majority language
[3] 51% or more of the glyphs ....
[4] 51% or more of the pixels set at 100 dpi... :-)

If such rules are already in place, we can stop this discussion.
If not, it seems they're needed.

Number [2] seems easiest to compute automatically, and number [1] seems
the most useful but can't be set automatically.

Lee
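
To illustrate why rule [2] is the easy one to automate, here is a rough
sketch in Python. It rests on assumptions that are not part of the
proposal above: it counts letters only, it takes the first word of each
character's Unicode name as a stand-in for its script, and the names
script_of and majority_script are made up for this example. Note that a
script is not a language (Devanagari is used for Hindi, but also for
Marathi and Nepali), so this only approximates "characters that
correspond to Hindi".

```python
import unicodedata
from collections import Counter


def script_of(ch):
    """Rough script label: the first word of the character's Unicode name.

    This is an approximation, not the real Unicode Script property.
    """
    try:
        return unicodedata.name(ch).split()[0]
    except ValueError:
        # The character has no name in the Unicode database.
        return None


def majority_script(text, threshold=0.51):
    """Return the script covering at least `threshold` of the letters, else None."""
    counts = Counter()
    for ch in text:
        # Only letters are counted; digits, spaces, punctuation and
        # combining marks (including Devanagari vowel signs) are skipped.
        if not ch.isalpha():
            continue
        label = script_of(ch)
        if label is not None:
            counts[label] += 1

    total = sum(counts.values())
    if total == 0:
        return None

    label, n = counts.most_common(1)[0]
    return label if n / total >= threshold else None
```

With this sketch, a text whose letters are mostly Devanagari yields
"DEVANAGARI", and an even split between two scripts yields None, which is
the case where an explicit rule such as [1] would still be needed.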
Received on Saturday, 8 March 1997 04:47:08 UTC