
RE: Feedback on Authoring Techniques for XHTML & HTML Internationalization

From: M.T. Carrasco Benitez <mtcarrascob@yahoo.com>
Date: Fri, 12 Nov 2004 10:56:54 +0000 (GMT)
Message-ID: <20041112105654.50750.qmail@web42004.mail.yahoo.com>
To: Richard Ishida <ishida@w3.org>, www-i18n-comments@w3.org
Cc: GEO <public-i18n-geo@w3.org>

Richard, Martin,

>Firstly, let me note that our document does not attempt to redefine
>how the technology is used or applied outside the parameters of what is
>possible given the current situation.  GEO's remit is to give advice to
>content authors about how to apply what's currently available.  If you
>want to make substantive changes to the way language information is
>declared and used in W3C technology, you should raise those requests
>separately, and they will be dealt with by the I18N Core Task Force
>(send a note to Martin, who  chairs that TF).

The intention is to stay within the current situation. The proposal is
to specify recommendations for the grey (or undefined) aspects.

But an efficient method for specifying the language(s) has to
combine both aspects:

 - Formal
    - Informatics/mathematical coherence
    - Long term evolution of all the standards
    - Language as metadata
    - Language as text processing

 - Practical
    - "To specify the language just put this string here"

A misplaced desire for unnecessary formalism carries a very large
cost.

For the European Institutions, this is very important due to the large
number of documents in many languages.

>I do not believe it is appropriate to advise declaration of 'primary
>language' (by which I mean the metadata about the document as a whole)

I also prefer the term "metadata language(s) of the document". I used
the term "primary language(s)" to respect the current terminology.

>by allowing the declaration in either of the three places you note
>for the following reasons:

>[1] The meta statement is not defined by the HTML specification, so

The meta statement is defined in the HTML specification.

>relationship to <html lang=, is unclear. So implementing this approach

>would require changes to the specification.

Agreed. The relationship is unclear and should be made clear.

>[2] Allowing two possible locations for in-document declarations will
>decrease clarity about the best way to declare this information, make 
>it harder for applications to resolve such information, and increase
>the risk of errors.  (In some ways this approaches the lang vs
>situation you are so unhappy with.)

Agreed. The feedback document states:

  "- The primary language(s) must be specified only once.
   - There must be one preferred recommendation for each case.
   - There could be other secondary recommendations."

>[3] Imposing possible double-duty on the <html lang= declaration is
>good practise.  It may not be generally recommended, but it is
>that an author wants to declare a different primary language and
>text processing language,

It is more than reasonable to assume that the primary language is the
default processing language. If the author wants to override the
processing language, this can be done lower in the tree, for part of
the tree.

> eg. the document is targeted at readers of
>Hindi, but most of the navigational text is in English. In such a
>the precedence of the <html lang= declaration would obviate the
>declarations elsewhere.

Please could you elaborate.

>[4] An <html lang= declaration cannot be made to represent more than 
>one language, but has higher precedence than the other declarations.
>This makes it impossible to declare a document with multiple primary
>languages and declare a language in the html tag (which may sometimes
>be reasonable).

Agreed. This is the position in the feedback document referred to above
and in the additional clarification post:


The proposed precedence is:

  "protocol (including the protocol "file"; i.e., the filename)
    <meta http-equiv ... /> (several languages possible)
     <html lang= ... >"
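
As an illustration only, the precedence above could be sketched in
Python as follows. The sketch assumes that the list is read from the
highest priority downwards, so that the first source declaring any
language wins. The function name and its parameters are hypothetical;
nothing here is defined by the HTML or HTTP specifications.

```python
from typing import List, Optional


def resolve_document_languages(
    protocol_langs: Optional[List[str]],
    meta_langs: Optional[List[str]],
    html_lang: Optional[str],
) -> List[str]:
    """Return the metadata language(s) of a document.

    Assumption: sources are consulted in the proposed order
    (protocol, then <meta http-equiv>, then <html lang>), and the
    first one that declares anything is used.
    """
    if protocol_langs:
        return list(protocol_langs)
    if meta_langs:
        return list(meta_langs)
    if html_lang:
        # <html lang=...> can carry only a single language.
        return [html_lang]
    return []


# Protocol-level declaration present: it is used as-is.
print(resolve_document_languages(["en"], ["es", "fr"], "de"))
# No protocol-level declaration: fall back to the meta statement.
print(resolve_document_languages(None, ["es", "fr"], "de"))
# Only <html lang=...> is available.
print(resolve_document_languages(None, None, "de"))
```

Under this reading, only the protocol and the meta statement can
carry several languages; <html lang= ... > contributes at most one.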

>Note that empty titles are strongly discouraged by the W3C.  Titles
>are used for more than just display in the browser - eg. bookmarks,
>indices, search info, etc.

I do not like empty titles either. I just identify the problem and
make proposals within the current situation. As stated in the feedback
document, in n-lingual documents it is not possible (in the current
situation) to identify the text processing languages within "title".
Hence the options are:

 - Language neutral string
 - Unidentified languages
 - Empty

>I think the 'opposite' (ie. assuming that the text processing
>language is the same as the primary language metadata) can make

One has to agree on the precedence: at the top of the tree is
the "metadata language(s) of the document". I changed the term
on purpose because using "primary language(s)" seems to be
creating problems.

The "metadata language of the document" (singular) is the text
processing language. Lower in the tree it can be changed. 

To do this, one does not need a double declaration such as

 <html lang="en" xml:lang="en">

>The HTML specification says that the
>HTTP information can be used for text processing lang decl.  We,
>however, advise strong caution against this because we feel it is
>better to express information about text-processing in-document 
>due to the possible need to read the file away from a server,
>potential risks in managing data on the server, and the
>difficulties for many people to actually make changes to their
>server setup.

As you state above, my proposal is well within the "current

> All in all, I think if you want a truly simple
>approach, you should keep primary language and text processing
>language declarations separate.

Though some parties might need the conceptual differentiation between
"primary language" and "text processing", many do not need this
differentiation and, indeed, it creates confusion.

In practice, most documents are monolingual and the main need
is to have a single specification of "the" language of the
document. One just needs to agree where. For many this is



 - Formally: the declaration is at "protocol" level; i.e., the
             highest point in the tree.

 - Practically: it is very easy.

 - Real situation on the server:
    - it works with Apache (Content-Language: en)

 - Real situation off the server: the information is there.
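
To make the protocol-level declaration concrete, here is a minimal
Python sketch that reads the value of a Content-Language header, which
HTTP defines as a comma-separated list of language tags. The function
name is hypothetical and the sketch does not validate the tags.

```python
from typing import List


def parse_content_language(header_value: str) -> List[str]:
    """Split a Content-Language header value into language tags."""
    tags = [tag.strip() for tag in header_value.split(",")]
    # Drop empty entries left by stray commas or whitespace.
    return [tag for tag in tags if tag]


# Monolingual document, as in the Apache example above.
print(parse_content_language("en"))
# Document declared with two languages.
print(parse_content_language("en, es"))
```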

>xml:lang is not needed when an XHTML 1.0 document is read by a
>browser as text/html, but is needed if you process that file as
>the XML it really is.

So for HTML and XHTML (as HTML) lang is sufficient and xml:lang
is unnecessary.

Now for XHTML as XML, to be consistent with 
 <html lang="en" xml:lang="en">

one would also have to have, lower in the tree,
 <p lang="en" xml:lang="en">

and it would be a never-ending story with the other attributes such as

Hence, the proposal should be to use only lang for XHTML (whether
served as HTML or as XML).

>Actually a lot breaks if you use lang when treating the text as XML.
>xml:lang is provided as a standard way to declare language in XML 
>documents, and its use as such should be promoted wherever possible.
>It carries with it some additional benefits that other attributes do
>not have, such as an expectation that its value is inherited by
>lower-level elements in the infoset, and the ability to express
>'no language' in a standard way using xml:lang="". As an example
>of standardisation, applications such as XSLT are set up to
>automatically look for information in xml:lang for use of the
>lang() function.

As stated in the feedback document, in XML it should be xml:lang.
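
The inheritance behaviour of xml:lang mentioned in the quoted text can
be sketched with Python's standard xml.etree.ElementTree module. This
is only an illustration: the helper name and the sample document are
assumptions, and the sketch treats xml:lang="" as "no language", as
described above.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_languages(element, inherited=None, out=None):
    """Return (tag, language) pairs with xml:lang inherited down the tree."""
    if out is None:
        out = []
    # Use the element's own xml:lang if present, else the inherited value.
    lang = element.get(XML_LANG, inherited)
    if lang == "":
        # xml:lang="" explicitly resets to "no language".
        lang = None
    out.append((element.tag, lang))
    for child in element:
        effective_languages(child, lang, out)
    return out


doc = ET.fromstring(
    '<doc xml:lang="en">'
    "<p>Hello</p>"
    '<p xml:lang="es">Hola</p>'
    '<code xml:lang="">x = 1</code>'
    "</doc>"
)
for tag, lang in effective_languages(doc):
    print(tag, lang)
```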

>This may need clarification, but the intent is definitely
>"The possible alternative values of the attribute...".
>Otherwise, the attribute would allow declarations that would
>be inappropriate for its use in declaring the text processing

Hence, in XML we have to clarify cases such as
 <doc xml:lang="en,es">

though I do not see any problem.


Received on Friday, 12 November 2004 10:57:25 UTC
