Re: The attribute selectors [x|=a] and [x~=a]

Ian Hickson wrote:

> Jonathan Lang wrote:

> >However, it was felt that "==" was too geeky, so they were changed to
> >"=" for equality (fair enough I think you'll agree),
> Makes sense.

Yes. Most folks have no problem with "=" meaning equals, and testing for
literal equality is what you want to do most of the time with most
attributes. Not all, though.

> >and "~=" for space separated equality ("sort of equal").
> This is where it starts to get dodgy.

Not really. SGML (and hence XML) has this notion of a space-separated
list. HTML uses this for the class attribute, for example. So given this
HTML:

<p class="foo bar baz">stuff</p>

this selector will match:

.bar {style stuff }

Documents written in XML are unlikely to have a class attribute but they
are likely to have some attributes which are space separated lists. So,
we need a selector for this.
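
To make the "~=" rule concrete, here is a rough sketch in Python of how
such a match could be evaluated. The function name matches_word is made up
for illustration, and splitting on any run of whitespace is my assumption,
not wording taken from the spec.

```python
def matches_word(attr_value, word):
    """Sketch of [x~=word]: true if word is one of the space-separated tokens."""
    # Hypothetical helper. str.split() with no argument splits on runs of
    # whitespace, so "foo  bar" and "foo bar" yield the same tokens.
    return word in attr_value.split()


# The class example from above: "bar" is a whole token, "ba" is not.
print(matches_word("foo bar baz", "bar"))
print(matches_word("foo bar baz", "ba"))
```

The point is that the comparison is per token, so a substring of a token
never counts as a match.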

> But it's the next bit which scares me:
> >Later there was a requirement to be able to match on language codes
> >for the LANG attribute, which have the form "en" "en-uk" "en-us"
> >"en-uk-cockney", and you want to be able to match on all LANGs that
> >start "en", or "en-uk". So "|=" was introduced for this. Note that
> >"|=" has an extra semantic property that the other two don't, namely
> >that it matches case insensitively, so they aren't really a
> >cooperating set of operators.

Why is that scary? One of the things that CSS2 aims to do is have better
internationalization than CSS1. Sometimes, you need to have a
language-specific selector in the stylesheet. Now, we didn't invent the
language codes; they are defined in RFC 1766. But HTML uses them and XML
also uses them. We want people to use language codes and we want using
them to be simple.

The syntax of language codes is that they are a hyphen-separated
hierarchical set of tokens. If the first token is two letters, then it
is an ISO language code (it can be one letter, or three or more ...). If
the second token is two letters, it is an ISO country code.

We don't want literal matching in this case. It puts too much burden on
the stylesheet writer. They remember to add a rule for French (fr) but
forget about Canadian French (fr-CA), so their rule doesn't always work.
Or maybe they remember Canadian French but forget Martinique and Vietnam
and so on. Or someone tries to be right on and explicitly lists en-GB
and en-us but hey, they forgot about en-nz and en-au and en-za and
en-GB-manchester-east-1970s. And they don't want a match on ent-qw.

The "scary" selector deals with the easy cases where you want all of a
particular language to match - you just give the first token of the
language. Or more, if you want. No false positives, no false negatives.
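
As a rough sketch of the "|=" rule in Python: the value matches if it is
exactly the given prefix, or starts with the prefix followed by a hyphen.
The name matches_lang_prefix is hypothetical, and folding case before
comparing is an assumption that follows the description quoted above
rather than any spec text.

```python
def matches_lang_prefix(attr_value, prefix):
    """Sketch of [x|=prefix]: exact match, or prefix followed by a hyphen."""
    # Assumption: compare case-insensitively, per the quoted description.
    value = attr_value.lower()
    prefix = prefix.lower()
    # Requiring the hyphen is what keeps "ent-qw" from matching "en".
    return value == prefix or value.startswith(prefix + "-")


for code in ["en", "en-GB", "en-us", "en-GB-manchester-east-1970s", "ent-qw"]:
    print(code, matches_lang_prefix(code, "en"))
```

Giving more tokens narrows the match: a prefix of "en-GB" would accept
en-GB-manchester-east-1970s but not en-us.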

I don't think users will find this scary. They just copy examples out of
the spec.


> I still think there is a deep WRONGness about this almost adhoc design. It
> doesn't "feel" right. 

It would be ad hoc if we had made up the unordered space-separated list
attribute and the hierarchical hyphen-separated attribute ourselves, and
if they were just as likely or unlikely as any other separator. They
aren't, and we didn't. These things already existed. We just invented
short, declarative ways to get at these pre-existing things.


> If you want it simplified, simply state that in CSS2 the only possible
> combinations are
>  =           equivalent to current ~=
>  =(\20,0,0)= equivalent to current ~=
>  =(,1)=      equivalent to current =
>  =(-,0,1)=   equivalent to current |=

You still really want regexps, don't you ;-)

--
Chris

Received on Friday, 24 April 1998 15:46:20 UTC