Re: CSS2.1 :lang

From: Tex Texin <tex@i18nguy.com>
Date: Fri, 17 Oct 2003 06:51:06 -0400
Message-ID: <3F8FC99A.12AEC995@i18nguy.com>
To: Bert Bos <bert@w3.org>
Cc: www-style@w3.org, 'W3c I18n Group' <w3c-i18n-ig@w3.org>

On rereading Richard's mails and yours, I think we are all saying very similar
things.

1) The language attribute should be used with language codes that follow RFC
3066 or its successors.
2) The CSS spec should encourage and support that.
3) The behavior of :lang should be well defined with respect to (valid and)
invalid codes.

I was just noting that the definition of valid values for language attributes
was given earlier in the spec (5.8.1) and didn't need repeating in the context
of selection, esp. where selection is independent of most of the details of the
spec.

I also noted that the current definition is wrong, since it doesn't cite RFC 3066.

So I think that CSS should:

a) state in one place its expectations for values assigned to language
attributes (e.g. consistent with 3066 and successors), and

b) state that they are intended only to represent language.

This addresses Richard's requirement. From my perspective having it in one
place means it is more likely to be correct.
So we could make the text stronger:

It is recommended that documents and protocols indicate language using codes
from RFC 3066 [RFC3066] or its successor, and by means of "xml:lang" attributes
in the case of XML-based documents [XML]. It is recommended that the HTML lang
attribute and the xml:lang attribute only be used for language identification
and for no other purpose.


c) define :lang matching separately.

I would modify yours as:

The pseudo-class ':lang(C)' matches if the element is in language C. That is,
whether there is a match is based solely on the string C being either equal to,
or a hyphen-delimited prefix of, the lang attribute's value, in the same way
as if performed by the '|=' operator.
The string C doesn't have to be a valid language name.

d) I would provide one example where the language being tested is more than a
simple two-letter code, for example indicating that :lang(fr-FR) matches fr-FR
and not fr-CA. With all the current examples using just two-letter language
codes, one wonders whether the intent is to limit :lang to the first portion of
the language code.
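
To make c) and d) concrete, here is a rough sketch of the matching as I read
the '|=' definition: a match occurs when the attribute value equals C or
begins with C immediately followed by "-". The function name lang_matches is
made up for illustration, and comparing the strings exactly as written (with
no case folding and no validation against RFC 3066) is an assumption of this
sketch, not something the spec text above settles.

```python
def lang_matches(c: str, value: str) -> bool:
    """Return True if value equals c, or starts with c followed by "-".

    Hypothetical helper: purely syntactic, no check that c is a valid code.
    """
    return value == c or value.startswith(c + "-")


# Cases from point d), plus the invalid code k9 raised earlier in the thread.
print(lang_matches("fr-FR", "fr-FR"))  # True: exact match
print(lang_matches("fr-FR", "fr-CA"))  # False: different second subtag
print(lang_matches("fr", "fr-CA"))     # True: "fr" followed by "-"
print(lang_matches("fr", "fra"))       # False: no hyphen boundary after "fr"
print(lang_matches("k9", "k9-x"))      # True: validity of k9 is never checked
```

Under this reading, :lang is not limited to the first portion of the language
code, and an invalid code such as k9 is neither an error nor ignored; it is
just another string to compare.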

hth

Bert Bos wrote:
> 
> Richard Ishida writes:
> > Bert Bos writes:
> > > Tex Texin writes:
> > > > For the purposes of matching, I wonder if it makes sense to
> > > > reference the RFCs at all. Isn't it really string matching based
> > > > on strings formatted with hyphen separators? Does any software
> > > > verify that the language tag contains appropriately registered
> > > > codes or uses ISO codes? Should it be an error, or perhaps the
> > > > rule ignored, if a CSS document specifies :lang(k9), since k9 is
> > > > not an official language code or a properly formatted private
> > > > code?
> > >
> > > I like that suggestion: it removes a dependency.
> > >
> > > The definition of the "|=" operator is already generic. It
> > > only requires a UA to split a string value at every "-" and
> > > doesn't require the string to be a valid language. The
> > > ':lang()' refers to that definition and could be made generic
> > > as well, e.g.:
> > >
> > > Current text in 5.11.4:
> > >
> > >     The pseudo-class ':lang(C)' matches if the element is in language
> > >     C. Here C is a language code as specified in HTML 4.0 [HTML40] and
> > >     RFC 1766 [RFC1766]. It is matched the same way as for the '|='
> > >     operator.
> > >
> > > Proposed:
> > >
> > >     The pseudo-class ':lang(C)' matches if the element is in language
> > >     C. CSS doesn't define what are valid language names and the string
> > >     C doesn't have to be a valid language name in the source document.
> > >     It is matched the same way as for the '|=' operator.
> >
> >
> > I disagree with this proposed para.  I think you are throwing out the
> > baby with the bath water.
> >
> > I see the value of referring to RFC3066 is to ensure maximum
> > standardisation/interoperability in the way language codes are used.
> > For example, 3066 requires the use of 2-letter codes rather than
> > 3-letter codes wherever they exist.  This is important advice for
> > interoperability. 3066 also says that you should use ISO codes rather
> > than some arbitrary label where it exists. Etc.
> >
> > I think the original text was defining how one should label languages in
> > CSS, not just how the matching should work.  And I think it is important
> > to retain the former, though the text could certainly be reworded so as
> > to separate the two ideas, remove the HTML reference and refer to
> > RFC3066.
> 
> If I understand Richard correctly, he is suggesting that the CSS
> ':lang()' selector be treated semantically rather than syntactically.
> In other words, ':lang(en)' means "English," not "a string starting
> with 'en'". That's interesting, but I think it will be too complex.
> Consider this XML-based language, that allows text either in French
> (0) or English (1):
> 
>     <MYLITTLELANGUAGE>
>       <WORD LANG="0">arbre</WORD>
>       <WORD LANG="1">tree</WORD>
>     </MYLITTLELANGUAGE>
> 
> Then this style rule would turn the word "tree" green:
> 
>     WORD:lang(en) { color: green }
> 
> Wouldn't it be better to simply *recommend* that developers use codes
> as per RFC 3066, even if they only need two languages?
> 
> How about the text I proposed earlier, but with an additional note
> (i.e., not normative):
> 
>     The pseudo-class ':lang(C)' matches if the element is in language
>     C. CSS doesn't define what are valid language names and the string
>     C doesn't have to be a valid language name in the source document.
>     It is matched the same way as for the '|=' operator.
> 
>     Note: It is recommended, however, that documents and protocols
>     indicate language using codes from RFC 3066 [RFC3066] or its
>     successor, and by means of "xml:lang" attributes in the case of
>     XML-based documents [XML]. See "FAQ: Two-letter or three-letter
>     language codes."[1]
> 
>     [1] http://www.w3.org/International/questions/qa-lang-2or3.html
> 
> (replaces the 2nd para in http://www.w3.org/TR/CSS21/selector.html#lang)
> 
> Bert
> --
>   Bert Bos                                ( W 3 C ) http://www.w3.org/
>   http://www.w3.org/people/bos/                              W3C/ERCIM
>   bert@w3.org                             2004 Rt des Lucioles / BP 93
>   +33 (0)4 92 38 76 92            06902 Sophia Antipolis Cedex, France

-- 
-------------------------------------------------------------
Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
Xen Master                          http://www.i18nGuy.com
                         
XenCraft		            http://www.XenCraft.com
Making e-Business Work Around the World
-------------------------------------------------------------
Received on Friday, 17 October 2003 06:51:10 GMT
