Re: Suggestion for Attribute Selectors (and answer to Q[lang] problem) from Ian Hickson on 1998-03-07 (www-style@w3.org from March 1998)

From: Ian Hickson <exxieh@bath.ac.uk>
Date: Sat, 7 Mar 1998 23:05:29 -0000
To: Bert Bos <Bert.Bos@sophia.inria.fr>
Cc: www-style@w3.org
Message-ID: <004901bd4a1d$a7f37820$c820268a@hpxu>
>On Sat, 7 Mar 1998, I wrote:
>
>> We currently have
>>      [attr="fullvalue"]
>>      [attr~="wordinlist"]
>> in the CSS2 Working Draft. What about
>>      [attr?="partvalue"]
>> for a case (in)sensitive search of an attribute? [SNIP]


Bert Bos soon replied:
>We've certainly thought about this, but we decided that you would soon
>need full regular expressions, and those would complicate both the syntax
>and the implementations rather a lot.

[note - this is *not* a flame response, although it may sound like it!!! I
mean what I'm about to say in the best possible way!!!]

I find it interesting that this is 'too complicated' to be added, while the
language/quotes stuff below is 'simple' enough. The "too complicated" line
was also used with the idea of removing the restrictions on pseudo-elements
in selectors ("not more than one", "only at the end"). However, *tables* are
'straight-forward' enough to be specified?!

Anyway, returning to this particular argument, you answer:

>We've certainly thought about this, but we decided that you would soon
>need full regular expressions, and those would complicate both the syntax
>and the implementations rather a lot.

Regular expressions would be much better, and would prevent the problem of
multiple syntaxes. At the moment we have several ways of matching on
attributes: =, ~=, and |= (see below). It would be much better to only have
*one* syntax, which would be more thorough and, indeed, would be based on
regular expressions as already implemented in many different applications.
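
To make the "one syntax" point concrete, here is a minimal sketch (in Python, purely as illustration) of how the three existing operators could each be written as a single regular expression. The function names `attr_regex` and `matches` are hypothetical, and the reading of "|=" follows Bert's description further down (first hyphen-separated segment); none of this is taken from the draft itself.

```python
import re

def attr_regex(op: str, value: str) -> re.Pattern:
    """Translate one attribute-selector operator into a regular expression.

    Hypothetical helper: the operator semantics are an assumption based on
    the descriptions quoted in this thread.
    """
    v = re.escape(value)  # treat the selector value as a literal string
    if op == "=":
        # [attr="fullvalue"]: the whole attribute value must be equal
        return re.compile(rf"\A{v}\Z")
    if op == "~=":
        # [attr~="wordinlist"]: one word in a whitespace-separated list
        return re.compile(rf"(?:\A|\s){v}(?:\s|\Z)")
    if op == "|=":
        # [attr|=value]: the value, optionally followed by "-" and more
        return re.compile(rf"\A{v}(?:-|\Z)")
    raise ValueError(f"unknown operator: {op!r}")

def matches(op: str, value: str, attr: str) -> bool:
    """Return True if the attribute string satisfies the selector."""
    return attr_regex(op, value).search(attr) is not None

if __name__ == "__main__":
    print(matches("=", "fr", "fr-ca"))                 # exact match fails
    print(matches("~=", "warning", "note warning"))    # word in list
    print(matches("|=", "fr", "fr-fr-argot"))          # first segment
```

Each operator turns out to be a fixed pattern wrapped around the value, which is the sense in which a regular expression syntax would subsume all three.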

>Case-insensitive matching is a real problem once you go beyond ASCII.
The letters "i" and "n" in my "insensitive" were bracketed out for just this
reason. In any case, for the use I want (matching parts of URIs) case
insensitivity is very bad. So, as far as my suggestion goes, I'm all for
*not* having insensitivity. (why? http://www.w3.org/MarkUp/ is not the same
as http://www.w3.org/markup/)

>And in fact, once you start matching attributes, why not match content as
>well?
Well, the CSS working group reached that bridge ages before this suggestion,
and decided to go ahead anyway. It is a big leap from matching attributes to
matching content, and (in my opinion) matching content is not very
structure-friendly anyway (the whole point of things like "DFN" and "VAR" in
HTML is that the author *tells* the parser what each bit is). I believe
there is a much stronger argument against content matching than against
attribute matching.

>The selector syntax is already getting complex,
A more comprehensive, self-consistent, and extensive regular expression
system would actually do more for simplicity and clarity than adding more
and more attribute selectors.

>  1. what do we need to match on?
Good question. Depends on the attribute. Since CSS is for use with XML too
though, we shouldn't make *any* assumptions about which attribute does what
(I was surprised that @class was removed from the draft). It makes me worried
that the suggestion given below is specific to one attribute. In my first
XML DTD, I was going to use "language" as my attribute for setting language
context.

>  2. how complex can we make the selectors while keeping them readable?
See above. And yes, I know it's easier to say than to implement :-) but then
again, you are doing tables, are you not?

>  3. can we come up with intuitive syntax?
Regular expressions have been in use for *years*. They are commonly used in
*many* applications (hmm. I'm repeating myself!). One idea would be to
examine the different ways it is implemented and find the most intuitive
one. It has to be said that so far, the attr selector system hasn't been the
most intuitive...

>  4. what can be implemented without a performance hit?
>It's also our policy not to go too far ahead of implementation experience.
>Most of the new selector features have been implemented in some form or
>another in various programs, and about the rest we have implementers'
>assurance that they can do it.
Regular expressions have been implemented all over the place, and
(sometimes) with great efficiency.

>We want many interoperable implementations of CSS, and that means we have
>to proceed step by step. It also means designers and authors still have to
>work together to insert appropriate CLASS attributes.
Absolutely.

>We have, however, decided to add the operator "|=", specifically for
>matching language codes:
>
>    [lang |= fr]
>
>will match all language codes whose first segment is "fr": "fr", "fr-ca",
>"fr-fr-argot", etc.

Sorry?! This is surprisingly similar to my original suggestion, which you've
just rejected! The only example I gave which could *not* be done this way
was the one which tested the end of the domain name (see my original post
for details).
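
A rough sketch of the difference, in Python for illustration only. The names below are hypothetical, "?=" is taken to mean a plain case-sensitive substring test, and since the original post is not reproduced here, the "end of the domain name" case is assumed to be something like "host ends in .org".

```python
from urllib.parse import urlsplit

def dash_prefix_match(value: str, attr: str) -> bool:
    """[attr|=value] as described above: value is the first
    hyphen-separated segment (or segments) of the attribute."""
    return attr == value or attr.startswith(value + "-")

def substring_match(value: str, attr: str) -> bool:
    """Hypothetical [attr?=value]: case-sensitive substring test."""
    return value in attr

def host_ends_with(suffix: str, uri: str) -> bool:
    """Assumed form of the domain-name test: does the host part of the
    URI end with the given suffix?"""
    host = urlsplit(uri).hostname or ""
    return host.endswith(suffix)

if __name__ == "__main__":
    uri = "http://www.w3.org/MarkUp/"
    print(dash_prefix_match("fr", "fr-ca"))     # language prefix
    print(substring_match("/MarkUp/", uri))     # part of a URI
    print(substring_match("/markup/", uri))     # case matters
    print(host_ends_with(".org", uri))          # end of the domain
```

The language-code case works with either test; the URI cases need a match that is not tied to the start of the value, which a prefix-only operator cannot express.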

I went on to say something very patronising...
>> Note - in the page about generated content, there is a comment about "how
>> do we do language dependent quotes?".
>> The answer is
>>    [lang="fr"] Q:before, Q[lang="fr"]:before { content: '\AB' }
>> which copes with quotes within a language section, and quotes of a
>> particular language in another language's section.
>> (unless I've missed something important!)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...I did, as I should have guessed...

>It's a bit more complex than that:
...yes... [I snipped a huge bit of Bert's answer here]


>To solve [this], we want to add a pseudo-class ":lang(fr)", which matches
>if the element is in language fr according to the language inheritance rules
>of the source document. Thus
>
>    HTML:lang(en) {quotes: '"' '"'  "'" "'"}
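
As I read that description, the difference from an attribute selector is that ":lang()" looks at the inherited language rather than at the element's own attribute. A minimal sketch in Python, assuming the simplest possible inheritance rule (the nearest ancestor with a lang attribute wins); the `Element` class and function names are hypothetical, and real documents may set the language by other means.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Element:
    """Toy document node: a name, an optional lang attribute, a parent."""
    name: str
    lang: Optional[str] = None
    parent: Optional["Element"] = None

def effective_lang(el: Element) -> Optional[str]:
    """Walk up the tree to the nearest element that declares a language."""
    node: Optional[Element] = el
    while node is not None:
        if node.lang is not None:
            return node.lang
        node = node.parent
    return None

def _code_matches(lang: Optional[str], code: str) -> bool:
    # Same first-segment rule as [lang|=code]
    return lang is not None and (lang == code or lang.startswith(code + "-"))

def lang_pseudo_matches(el: Element, code: str) -> bool:
    """:lang(code) uses the inherited (effective) language."""
    return _code_matches(effective_lang(el), code)

def lang_attr_matches(el: Element, code: str) -> bool:
    """[lang|=code] uses only the element's own attribute."""
    return _code_matches(el.lang, code)

if __name__ == "__main__":
    html = Element("HTML", lang="fr")
    q = Element("Q", parent=html)       # no lang attribute of its own
    print(lang_pseudo_matches(q, "fr"))
    print(lang_attr_matches(q, "fr"))
```

A Q with no lang attribute inside a French document is matched by the pseudo-class but not by the attribute selector, which is why the attribute-only rule needed two selectors.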


Well, actually, this can solve all of the quote problems mentioned. The only
problem is that, to allow it to solve them, you need to derestrict the
pseudo-element rules....

 Q:lang(en):before { content: '"'}
 Q:lang(en):after { content: '"'}
 Q Q:lang(en):before { content: "'" }
 Q Q:lang(en):after { content: "'" }
 Q:lang(fr):before { content: '\AB'}
 Q:lang(fr):after { content: '\BB'}
 Q Q:lang(fr):before { content: "'"}
 Q Q:lang(fr):after { content: "'"}
 Q Q Q:before { content: "" }
 Q Q Q:after { content: "" }
 Q Q Q { font-style: italic; }

BTW, you only rarely get more than two levels of quoting; with any more, the
quote would be so long that it would no longer warrant being in-line. A
block level quote doesn't have to be quoted if it is indented or italicised
or otherwise set apart from the rest.

--
Ian Hickson
-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GIT/M/S d- s+: a--- C++(+++)>$ U P L+ !E W++ N++ o? K? w++>+++ O- !M V- PS+
PE- Y+ PGP t 5+++>++++ X- R+++ tv b++(+++) DI D++ G e-(*)>+++++ h!()(--) y?
------END GEEK CODE BLOCK------
Received on Saturday, 7 March 1998 18:05:57 UTC