Re: [selectors] Improving the definition of language range in :lang()

On Mon, Dec 8, 2014 at 4:43 PM, Benjamin Poulain <benjamin@webkit.org> wrote:
> Dhi is making progress toward the implementation of :lang(), and it is
> becoming clear that the definition of language range in the CSS spec makes
> things complicated in practice.
>
> There was already the problem of using an asterisk in the range, as previously
> mentioned (http://lists.w3.org/Archives/Public/www-style/2014Dec/0046.html).
>
> A new problem we are running into is the filtering with subtags composed of
> numbers. For example, the language tag "de-CH-1996" cannot be filtered by
> the valid language range "*-1996" because the definition in the spec defines
> the range as an identifier preceded by an asterisk, and -1996 is not a valid
> CSS identifier.
>
> I believe it might be better to use the definition from RFC 4647 directly
> instead of creating a new definition for CSS. This would be simpler than
> trying to retrofit CSS identifiers into language ranges.
>
> We will continue the implementation with the current CSS definition but
> feedback is welcome.

We can't use the RFC 4647 definition directly; CSS grammars operate on
CSS tokens, not on characters.

As it stands, you can still write *-1996 in :lang() by escaping the
dash, like `:lang(*\-1996)` - that turns it into an asterisk followed
by an identifier.  This, of course, isn't great.  I think we assumed
that language subtags were never composed of just digits.  We should
probably allow a string in :lang() as well, for when tokenization
doesn't work well for the given language tag.
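
To make the tokenization point concrete, here is a rough sketch of the
forms discussed above.  The declarations are placeholders chosen only
for illustration, and the string form is the proposal from this message,
so it should be read as hypothetical rather than as something any given
implementation accepts.

```css
/* Written bare, the range *-1996 tokenizes as "*" followed by the
   number -1996, not as an asterisk followed by an identifier, so it
   does not fit the current :lang() grammar. */

/* Escaping the dash makes \-1996 an identifier, so this form parses
   under the current definition: */
:lang(*\-1996) { font-style: italic; }

/* Proposed alternative (hypothetical here): pass the range as a
   string, so CSS tokenization of its contents no longer matters. */
:lang("*-1996") { font-style: italic; }
```

Both rules are meant to select content tagged "de-CH-1996"; the second
would simply avoid the escape.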

(Using strings is our general policy for handling values that come
from outside CSS, precisely because we can't predict what form they'll
take and how they'll play with CSS tokenization, as in this case.  The
fact that we didn't write this with a string originally is what makes
me think we simply didn't believe such numeric subtags would exist.)

~TJ

Received on Wednesday, 10 December 2014 16:43:39 UTC