
RE: Feedback on Authoring Techniques for XHTML & HTML Internationalization

From: Richard Ishida <ishida@w3.org>
Date: Thu, 11 Nov 2004 14:28:37 -0000
To: "'Manuel Carrasco'" <mtcarrascob@yahoo.com>, <www-i18n-comments@w3.org>
Cc: "GEO" <public-i18n-geo@w3.org>
Message-Id: <20041111142837.202924F04E@homer.w3.org>

Some of your comments were dealt with in my previous mail. See additional
comments inline below.

> -----Original Message-----
> From: www-i18n-comments-request@w3.org 
> [mailto:www-i18n-comments-request@w3.org] On Behalf Of Manuel Carrasco
> Sent: 28 October 2004 21:55
> To: www-i18n-comments@w3.org
> Subject: RE: Feedback on Authoring Techniques for XHTML & 
> HTML Internationalization
> >[1] ...
> I agree with the two functions:
>  - Metadata
>  - Text
> The primary language should serve as both the metadata and the 
> default text language. One should try to keep things simple for 
> people who work with these documents every day.
> >> [2] "The text in the title must be language neutral."
> > I'm not sure why, if there's only a single language.
> I agree. My statement in the document:
>  - One primary language:
>     + Title in this language.
>  - Several primary languages:
>     + Language neutral or empty.

Note that empty titles are strongly discouraged by the W3C.  Titles are used
for more than just display in the browser, e.g. bookmarks, indices, search
information, etc.

>     + In all the primary languages.
> > [3] "meta element with the attribute http-equiv is
> proposed because it is the only mechanism".  Although one 
> could say that theoretically declaring in the meta element is 
> equivalent to declaring in the http header Content-Language, 
> that is not the case in practice.
> They are not the same. One has to separate:
>  - Declaration of the primary language(s).
>  - What the processors (e.g., servers, text
> processing)  do with the declaration.
> A document could have a declaration of a primary language in 
> http-equiv and the server could ignore it.
> Indeed, this is the most common case today. 
> > I find this statement, coupled with the following
> that "servers should include the primary language(s) in the 
> Content-Language field" confusing.  Those are two mechanisms. 
>  The meta is not created automatically.
> Recommendations should indicate what the different types of 
> processors should do with the primary language.
> >Note also that in practice none of the user agents we
> tested actually used the information in the meta element to 
> establish language - all of them used the declaration in the 
> html element, though.  A rule like this requires all user 
> agents to change their behaviour if it is to be successful.
> This is the reason why one should accept the declaration in the 
> html element.
> >[4] Why should text processors consider the primary
> language the default text processing language?
> Because one is declaring a document to be "en", it is 
> reasonable to assume that the default language is "en".
> > If it becomes undefined when several are declared,
> this seems a poor strategy.
> Your proposal has the same issue: one has to identify in the 
> text stream what the language is.
> >[5] Your example of multiple language text marked up
> in <title> cannot be done currently because HTML will not 
> allow markup in that element.  I do not see that happening 
> until we get to XHTML 2.0.  So this is not workable for 
> existing HTML/XHTML documents.  That's a really big problem. 
> (Note, by the way, that the candidate for 'foo' is 'span'. 
> That's standard practice.)
> I agree: I am identifying the problem, but I could not 
> suggest a solution. I assumed that one could have span in 
> title, but I checked (a few years back) and noticed that it 
> was not permitted.
> >Secondary proposal:
> >[6] Again, this seems to operate on the premise that
> there should be only one language declaration. I do not see 
> any justifications for this in your proposal.
> I agree with the two functions, metadata and text, but 
> syntactically one should make it as simple as possible. And 
> this is the justification for one language declaration. 
> Indeed, there is no need for two language declarations. As 
> I commented above, it is reasonable to assume that the 
> metadata language declaration is the default language. 
> Indeed, the opposite does not make sense.

I think the 'opposite' (i.e. assuming that the text-processing language is
the same as the primary language metadata) can make sense. In fact, that is
how things currently work, though it is not consistently implemented.  The
HTML specification says that the HTTP information can be used as the
text-processing language declaration.  We, however, strongly advise caution
here, because we feel it is better to express text-processing information
in the document itself: the file may need to be read away from a server,
there are potential risks in managing data on the server, and many people
find it difficult to make changes to their server setup.

All in all, I think if you want a truly simple approach, you should keep
primary language and text processing language declarations separate.
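
To make the distinction concrete, here is a minimal Python sketch that reads
the two declarations out of a page separately: the Content-Language metadata
in a meta element and the text-processing language in the lang attribute of
the html element. The class and function names and the sample page are
hypothetical, chosen only for illustration; this does not describe how any
particular user agent behaves.

```python
from html.parser import HTMLParser


class LanguageDeclarations(HTMLParser):
    """Collect the metadata language(s) and the text-processing language."""

    def __init__(self):
        super().__init__()
        self.metadata_languages = []  # from <meta http-equiv="Content-Language">
        self.text_language = None     # from <html lang="...">

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "html" and self.text_language is None:
            self.text_language = attributes.get("lang")
        elif tag == "meta":
            http_equiv = (attributes.get("http-equiv") or "").lower()
            if http_equiv == "content-language":
                content = attributes.get("content") or ""
                # Content-Language may list several comma-separated languages.
                self.metadata_languages = [
                    item.strip() for item in content.split(",") if item.strip()
                ]


def language_declarations(markup):
    """Return (metadata languages, text-processing language) for a page."""
    parser = LanguageDeclarations()
    parser.feed(markup)
    parser.close()
    return parser.metadata_languages, parser.text_language


# Hypothetical sample page: two primary languages, one text-processing language.
SAMPLE = """<html lang="en">
<head>
<meta http-equiv="Content-Language" content="en, fr">
<title>Example</title>
</head>
<body><p>Hello</p></body>
</html>"""

print(language_declarations(SAMPLE))
```

Keeping the two results apart mirrors the point above: the metadata can name
several languages, while the text-processing declaration names one.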

> >[6] "It is not proposed to use the xml:lang
> attribute."  There are good reasons for using both in hybrid 
> XHTML 1.0 documents - so you can read in user agents as HTML, 
> but process as XML. I do not want to debate the merits and 
> demerits of using XHTML served as text/html, but it is widely 
> done, and I do not see this as a practical requirement.  It 
> is irrelevant for HTML and for XHTML 1.1+ and XML.
> This is the worst offender: there is no reason to use a double 
> declaration. Having the attribute lang is sufficient. By the 
> way, nothing breaks if one has an attribute lang in XML.

Actually a lot breaks if you use lang when treating the text as XML.
xml:lang is provided as a standard way to declare language in XML documents,
and its use as such should be promoted wherever possible. It carries with it
some additional benefits that other attributes do not have, such as an
expectation that its value is inherited by lower-level elements in the
infoset, and the ability to express 'no language' in a standard way using
xml:lang="".  As an example of standardisation, XSLT processors
automatically consult xml:lang when evaluating the XPath lang() function.
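
As a rough illustration of the inheritance and 'no language' behaviour
described above, the following Python sketch walks an XML tree and records
the xml:lang value in scope for each element. The function name and the
sample document are assumptions made for this example, not part of any
specification.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_languages(root):
    """Map each element to the xml:lang value in scope for it.

    A value of "" means 'no language'; None means nothing was declared.
    """
    result = {}

    def walk(element, inherited):
        # An element's own xml:lang overrides the inherited value.
        language = element.get(XML_LANG, inherited)
        result[element] = language
        for child in element:
            walk(child, language)

    walk(root, None)
    return result


# Hypothetical sample document.
SAMPLE = """<doc xml:lang="en">
  <p id="a">English text</p>
  <p id="b" xml:lang="fr">Texte <em id="c">en italique</em></p>
  <code id="d" xml:lang="">x = 1</code>
</doc>"""

root = ET.fromstring(SAMPLE)
by_id = {
    element.get("id"): language
    for element, language in effective_languages(root).items()
    if element.get("id")
}
print(by_id)
```

The em element picks up "fr" from its parent, and the code element carries
the empty value that stands for 'no language'.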

xml:lang is not needed when an XHTML 1.0 document is read by a browser as
text/html, but is needed if you process that file as the XML it really is.
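
A small Python sketch of what this means in practice: when an XHTML 1.0 file
is parsed as XML, lang and xml:lang are two unrelated attributes, so an XML
tool that looks only at xml:lang will not see a language declared with lang
alone. The helper below, whose name and sample markup are hypothetical,
lists elements where the two attributes disagree or where only one is
present.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def mismatched_language_attributes(xhtml_source):
    """Return (tag, lang, xml:lang) for elements where the two disagree."""
    problems = []
    for element in ET.fromstring(xhtml_source).iter():
        html_lang = element.get("lang")
        xml_lang = element.get(XML_LANG)
        declared = html_lang is not None or xml_lang is not None
        if declared and html_lang != xml_lang:
            # Strip the "{namespace}" prefix ElementTree adds to tag names.
            local_name = element.tag.rsplit("}", 1)[-1]
            problems.append((local_name, html_lang, xml_lang))
    return problems


# Hypothetical hybrid XHTML 1.0 page; the first paragraph lacks xml:lang.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
  <head><title>Example</title></head>
  <body>
    <p lang="fr">Bonjour</p>
    <p lang="de" xml:lang="de">Guten Tag</p>
  </body>
</html>"""

print(mismatched_language_attributes(SAMPLE))
```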

> >[7] Note that your proposal for multiple values for
> the xml:lang attribute is currently not supported by XML, and 
> is unlikely to be supported in the near future.  It is 
> therefore ruled out for a large amount of existing data.  
> (It's not clear from your proposal whether you are proposing 
> usage or changes to the XML standard with this document.  If 
> the latter, I don't see any convincing arguments to change in your
> document.)
> It is under a section of "more work is needed"; i.e., an 
> illustration of how things could develop.
> I am not proposing a change to XML {it would be easier to 
> change the Bible -:) }. It seems that with the existing 
> standard one could have several values in the attribute 
> xml:lang. From section "2.12 Language Identification"
>  "The values of the attribute are language identifiers ..."
> "values" in plural. Nothing in the production rules.

This may need clarification, but the intent is definitely "The possible
alternative values of the attribute...".  Otherwise, the attribute would
allow declarations that would be inappropriate for its use in declaring the
text processing language.

> Neither well-formed nor valid documents would break:
> the attribute xml:lang has to be declared in valid documents.
> This would have to be double-checked. But if one considers 
> XML a syntactic layer, nothing has to change. 
> Regards
> Tomas
Received on Thursday, 11 November 2004 14:28:39 UTC
