- From: Bert Bos <bert@w3.org>
- Date: Fri, 17 Oct 2003 11:40:35 +0200
- To: <www-style@w3.org>, "'W3c I18n Group'" <w3c-i18n-ig@w3.org>
Richard Ishida writes:
> Bert Bos writes:
> > Tex Texin writes:
> > > For the purposes of matching, I wonder if it makes sense to reference
> > > the RFCs at all. Isn't it really string matching based on strings
> > > formatted with hyphen separators? Does any software verify that the
> > > language tag contains appropriately registered codes or uses ISO
> > > codes? Should it be an error, or perhaps the rule ignored, if a CSS
> > > document specifies :lang(k9), since k9 is not an official language code
> > > or a properly formatted private code?
> >
> > I like that suggestion: it removes a dependency.
> >
> > The definition of the "|=" operator is already generic. It
> > only requires a UA to split a string value at every "-" and
> > doesn't require the string to be a valid language. The
> > ':lang()' refers to that definition and could be made generic
> > as well, e.g.:
> >
> > Current text in 5.11.4:
> >
> > The pseudo-class ':lang(C)' matches if the element is in language
> > C. Here C is a language code as specified in HTML 4.0 [HTML40] and
> > RFC 1766 [RFC1766]. It is matched the same way as for the '|='
> > operator.
> >
> > Proposed:
> >
> > The pseudo-class ':lang(C)' matches if the element is in language
> > C. CSS doesn't define what are valid language names and the string
> > C doesn't have to be a valid language name in the source document.
> > It is matched the same way as for the '|=' operator.
>
>
> I disagree with this proposed para. I think you are throwing out the
> baby with the bath water.
>
> I see the value of referring to RFC3066 is to ensure maximum
> standardisation/interoperability in the way language codes are used.
> For example, 3066 requires the use of 2-letter codes rather than
> 3-letter codes wherever they exist. This is important advice for
> interoperability. 3066 also says that you should use ISO codes rather
> than some arbitrary label where it exists. Etc.
>
> I think the original text was defining how one should label languages in
> CSS, not just how the matching should work. And I think it is important
> to retain the former, though the text could certainly be reworded so as
> to separate the two ideas, remove the HTML reference and refer to
> RFC3066.

If I understand Richard correctly, he is suggesting that the CSS
':lang()' selector be treated semantically rather than syntactically.
In other words, ':lang(en)' means "English," not "a string starting
with 'en'". That's interesting, but I think it will be too complex.

Consider this XML-based language, which allows text in either French
(0) or English (1):

  <MYLITTLELANGUAGE>
    <WORD LANG="0">arbre</WORD>
    <WORD LANG="1">tree</WORD>
  </MYLITTLELANGUAGE>

Under the semantic reading, this style rule would have to turn the
word "tree" green, which requires the UA to know that LANG="1" means
English:

  WORD:lang(en) { color: green }

Wouldn't it be better to simply *recommend* that developers use codes
as per RFC 3066, even if they only need two languages?
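
To make the syntactic reading concrete, here is a minimal sketch in
Python of matching in the style of the '|=' operator. The function name
`lang_matches` is hypothetical, and the case folding is an assumption
of this sketch, not something the proposed text specifies:

```python
def lang_matches(element_lang: str, c: str) -> bool:
    """Syntactic match in the style of '|=': the element's language
    value either equals C or starts with C followed by a hyphen.
    No check is made that either string is a registered language code."""
    value = element_lang.lower()   # assumption: compare case-insensitively
    prefix = c.lower()
    return value == prefix or value.startswith(prefix + "-")


print(lang_matches("en", "en"))       # exact match
print(lang_matches("en-GB", "en"))    # "en" followed by "-"
print(lang_matches("english", "en"))  # no hyphen boundary after "en"
print(lang_matches("1", "en"))        # LANG="1" from the example above
```

With this reading, the value "1" in the example above never matches
':lang(en)' unless the UA has extra knowledge about what the document
language's codes mean.
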
How about the text I proposed earlier, but with an additional note
(i.e., not normative):

  The pseudo-class ':lang(C)' matches if the element is in language
  C. CSS doesn't define what are valid language names and the string
  C doesn't have to be a valid language name in the source document.
  It is matched the same way as for the '|=' operator.

  Note: It is recommended, however, that documents and protocols
  indicate language using codes from RFC 3066 [RFC3066] or its
  successor, and by means of "xml:lang" attributes in the case of
  XML-based documents [XML]. See "FAQ: Two-letter or three-letter
  language codes."[1]

[1] http://www.w3.org/International/questions/qa-lang-2or3.html

(replaces the 2nd para in http://www.w3.org/TR/CSS21/selector.html#lang)

Bert
--
Bert Bos ( W 3 C ) http://www.w3.org/
http://www.w3.org/people/bos/ W3C/ERCIM
bert@w3.org 2004 Rt des Lucioles / BP 93
+33 (0)4 92 38 76 92 06902 Sophia Antipolis Cedex, France
Received on Friday, 17 October 2003 05:40:37 UTC