Re: 2 many language tags for Norwegian

From: Leif Halvard Silli <lhs@malform.no>
Date: Wed, 30 Apr 2008 03:47:54 +0200
Message-ID: <4817CFCA.5000802@malform.no>
To: Frank Ellermann <hmdmhdfmhdjmzdtjmzdtzktdkztdjz@gmail.com>
CC: www-international@w3.org

Frank Ellermann 2008-04-28 12.04:
> Christophe Strobbe wrote:
>  
> > Language information is also used by speech synthesis software
> > (e.g. screen readers), so whatever mechanism you use must be
> > unambiguous for this type of software.
>
> Yes, but maybe "must make sense" can replace "unambiguous" for
> Leif's idea of allowing "no" and "nn" (or "no" and "nb") in a
> document.  I'm not convinced that this idea is good, but it is
> an interesting problem.
>   

This idea of mine was more a side effect of the HTTP tagging issue. ;-) 
But I agree that being able to tag an element with several language tags 
instead of just tagging it with a tag that says "mul"(tiple) would be an 
improvement.

> >> They already allow "less than one tag", killing NMTOKEN
> >> for the desired xml:lang="" effect also in (X)HTML.
>  
> > I'm not aware of any speech synthesis software used by
> > persons with disabilities that supports xml:lang.
>
> That's not what I meant.  HTML 4 was limited to NMTOKEN, that
> is precisely one non-empty language tag.  XML later allowed
> xml:lang="" to indicate any "undefined language" situations
> within a document.  
>
> But XHTML 1 was forced to stick to NMTOKEN, because it tries
> to mirror HTML 4 as good as possible.  When HTML 4 drops this
> limitation, then lang="" will be allowed in its HTML flavour
> (in the XML flavour of HTML 5 it would be again xml:lang="").
>   

I think there could be two approaches if we wanted to be able to specify 
multiple languages.

    EITHER, one could use the "mul" tag, and basically add the 
specific languages after the "mul" keyword - "mul-en-fr".  OR, one could 
define a new syntax where a space character or a comma is allowed 
between each language. This partly works already, as long as the first 
language includes the "-" character.

     Using "mul", it would have to be forbidden to use e.g. geographical 
subtags. Thus lang="mul-en-US-fr-FR" would be forbidden. One would have 
to write lang="mul-en-fr". I think this is a logical restriction and 
only a small disadvantage. If "en-US" was important, then this could be 
inherited from the parent element.
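
A rough sketch of why the geographical subtags would have to go, 
assuming a purely hypothetical reading of the "mul-..." syntax in which 
every subtag after "mul" is taken as a language. The function name 
split_mul_tag is invented for illustration and is not part of any 
specification:

```python
def split_mul_tag(tag):
    """Split a hypothetical "mul-xx-yy" tag into its language subtags.

    Assumption: every subtag after "mul" is a 2- or 3-letter language
    code. Language tags are case-insensitive, so "US" cannot be told
    apart from a language code by syntax alone.
    """
    subtags = tag.lower().split("-")
    if subtags[0] != "mul" or len(subtags) < 2:
        raise ValueError("expected 'mul' followed by at least one language")
    for subtag in subtags[1:]:
        if not (subtag.isascii() and subtag.isalpha() and 2 <= len(subtag) <= 3):
            raise ValueError("not a plausible language subtag: %r" % subtag)
    return subtags[1:]


print(split_mul_tag("mul-en-fr"))        # two languages
print(split_mul_tag("mul-en-US-fr-FR"))  # "US" and "FR" are read as languages too
```

Under this assumption the second call reports four "languages", which 
is the ambiguity that forbidding geographical subtags would avoid.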

    As for a new syntax, I found that e.g. <div lang="fr- en"> and <div 
lang="fr-FR en"> are selectable via the CSS selector div:lang(fr){} in 
at least Firefox, Opera, Safari and Internet Explorer for Mac. (The 
key is that the first language must contain a "-" in order to be 
compatible with current browsers.) But something like div:lang(fr en){} 
does not work anywhere.

    With "mul", by comparison, there would be no problem with the 
selector :lang(mul-fr-en). And if screen readers have to learn to cope 
with the mul tag anyhow, then that is another argument for using it. 
Thus, using "mul" would be the simplest option. It also won't break 
current CSS selectors already using :lang(mul){}.
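
The selector behaviour described above fits the matching rule in 
CSS 2.1, where :lang(C) matches when the element's language value 
equals C or begins with C immediately followed by "-". Below is a 
minimal sketch of that rule, assuming browsers treat the whole lang 
attribute value as one string; the function name lang_matches is made 
up for illustration:

```python
def lang_matches(selector_lang, element_lang):
    """Sketch of the CSS 2.1 :lang(C) test, compared case-insensitively.

    Assumption: the raw attribute value is compared as a single string,
    with no splitting on spaces or commas.
    """
    c = selector_lang.lower()
    value = element_lang.lower()
    return value == c or value.startswith(c + "-")


# The space-separated experiments from above:
print(lang_matches("fr", "fr- en"))    # first language followed by "-"
print(lang_matches("fr", "fr-FR en"))  # first language with a region
print(lang_matches("fr", "fr en"))     # no "-" after the first language

# The "mul" variant:
print(lang_matches("mul", "mul-fr-en"))
print(lang_matches("mul-fr-en", "mul-fr-en"))
print(lang_matches("fr", "mul-fr-en"))
```

Under this rule a "mul-fr-en" value stays selectable via :lang(mul), 
but not via :lang(fr), since "fr" is no longer the prefix.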

Comments? Are there any big issues with the special language tagging 
method that "mul" requires?
-- 
leif halvard silli
Received on Wednesday, 30 April 2008 01:48:33 GMT