Re: Text selector [was Re: breaking overflow] from Tab Atkins Jr. on 2010-01-07 (www-style@w3.org from January 2010)

From: Tab Atkins Jr. <jackalmage@gmail.com>
Date: Thu, 7 Jan 2010 08:00:23 -0600
To: Brad Kemper <brad.kemper@gmail.com>
Cc: "robert@ocallahan.org O'Callahan" <robert@ocallahan.org>, Boris Zbarsky <bzbarsky@mit.edu>, www-style list <www-style@w3.org>
Message-ID: <dd0fbad1001070600g4e11de64i71dd49e102e12c69@mail.gmail.com>
On Thu, Jan 7, 2010 at 1:06 AM, Brad Kemper <brad.kemper@gmail.com> wrote:
> On Jan 6, 2010, at 9:49 PM, Tab Atkins Jr. wrote:
>> Right, so one of the major problems is the misnested boxes that can
>> occur.  We want to allow nested ::text matches, but misnested matches
>> are a problem.  Dealing with them naively results in the unintuitive
>> and undesirable behavior I pointed out before.  How does my suggestion
>> for most-powerful-matched-first sound for fixing this, and making
>> previously matched ::text pseudos count as element boundaries just
>> like a real element when matching later/less powerful ::text pseudos?
>
> I have to read it again in the morning with fresher eyes. My initial reaction to the first part was good, but then I started getting lost, mostly due to my attention span at this time of day. But, how about this for a simple way of saying what I think we both intuit to be right in the example:
>
> Follow normal cascading rules for each matched character, as though you were creating individual pseudo-boxes for each character, but adjacent character boxes that have the same pseudo-boxes because of the same rule get merged together into one box after all the text of the element has been otherwise resolved.
>
> I'll re-read your details again in the morning to see if this made more sense of it or less, or if I am still missing something, but I wanted to throw it out there this way while I was still awake.

I can't tell if that's the same or not, but it confuses me anyway.
Stated hopefully more clearly, my attempted resolution is this: apply
::text rules in cascading order (strongest first).  Previously-applied
::text rules act like normal elements to later ::text rules,
preventing matching across their boundaries.

So in the old example of
<p>ABCDEF</p>
::text("ABCD") { color: red; }
::text("CDEF") { color: blue; }

The second rule would apply first, as it's stronger according to the
cascade.  This would produce the pseudostructure
<p>AB<text>CDEF</text></p>.  Then the first rule would try to apply,
but since that would require matching across an element boundary (the
text pseudoelement), it fails.
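
To make the ordering concrete, here is a rough Python sketch of that
resolution. It is only a model, not how any engine works: text is a
plain string, a rule is just its search string, and the names
apply_text_rules and crosses are made up for illustration. It assumes
the rules arrive already sorted strongest-first, and it uses the strict
reading where a candidate may sit entirely inside or entirely outside
an earlier match but nothing else.

```python
def crosses(start, end, s, e):
    """True if candidate [start, end) crosses the boundary of span [s, e).

    Strict reading (an assumption of this sketch): the candidate is fine
    only if it is disjoint from the span or lies entirely inside it.
    """
    disjoint = end <= s or start >= e
    inside = s <= start and end <= e
    return not (disjoint or inside)


def apply_text_rules(text, rules):
    """Apply ::text()-style rules, strongest first.

    `rules` is a list of search strings already sorted by cascade
    strength (a hypothetical input format). Returns the accepted
    (start, end, needle) spans in the order they were applied.
    """
    spans = []
    for needle in rules:
        start = text.find(needle)
        while start != -1:
            end = start + len(needle)
            # Earlier matches behave like element boundaries.
            if not any(crosses(start, end, s, e) for s, e, _ in spans):
                spans.append((start, end, needle))
            start = text.find(needle, start + 1)
    return spans


# "CDEF" is the stronger rule here, so it is listed first.
print(apply_text_rules("ABCDEF", ["CDEF", "ABCD"]))
```

With that input only the "CDEF" span survives, since "ABCD" would have
to straddle the boundary the first match created.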

As well, matching is defined to happen greedily and in the logical
direction of text.  This means that in the following:

<p>ABABABAB</p>
::text("ABAB") { border: 1px solid black; }

You wind up with exactly two things wrapped in a border, producing the
pseudostructure <p><text>ABAB</text><text>ABAB</text></p>.  The middle
ABAB (characters 3-6) in the text attempted to match, but was blocked
because the first ABAB (characters 1-4) had already matched, creating
an element boundary.  But then the last ABAB (characters 5-8) was free
to match normally.
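
A small Python sketch of the greedy, left-to-right behavior for a
single rule, in case it helps. The function name greedy_matches is
hypothetical, offsets are 0-based and half-open (so the email's
"characters 3-6" is the range 2..6 here), and the text is assumed to be
a simple left-to-right string.

```python
def greedy_matches(text, needle):
    """Scan left to right; a candidate is blocked if it starts inside
    the most recently accepted match of the same rule."""
    matches, blocked = [], []
    start = text.find(needle)
    while start != -1:
        end = start + len(needle)
        if matches and start < matches[-1][1]:
            # Would cross the boundary created by the previous match.
            blocked.append((start, end))
        else:
            matches.append((start, end))
        start = text.find(needle, start + 1)
    return matches, blocked


matches, blocked = greedy_matches("ABABABAB", "ABAB")
print(matches)  # [(0, 4), (4, 8)]
print(blocked)  # [(2, 6)]
```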

This allows nested matches.  For example:

<p>ABCDEF</p>
::text("ABCD") { color: red; }
::text("CD") { font-weight: bold; }

Would result in ABCD being red, and CD being bold, with this
pseudostructure: <p><text>AB<text>CD</text></text>EF</p>.

Note, though, that this last one might require a slight change in what
we were saying about "element boundaries".  Right now, the following:

<p>AB<i>CD</i>EF</p>
::text("ABCD") { font-weight: bold; }

will fail to match.  But that's precisely the situation that the
previous example (about nested ::text()s) creates, as the CD matches
first, followed by the ABCD.  We either have to allow things to match
across element boundaries as long as they don't *misnest*, or have to
accept that order is important in some silly ways.
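
The two options can be compared with a pair of predicates. This is a
Python sketch under my own assumptions: spans are (start, end) pairs
with 0-based half-open offsets, and the names misnests and
blocked_strict are invented for the example. In "ABCDEF", ABCD is
(0, 4) and CD (or the <i> element) is (2, 4).

```python
def misnests(a, b):
    """True only when the two spans partially overlap (neither contains
    the other and they are not disjoint)."""
    (a0, a1), (b0, b1) = a, b
    return a0 < b0 < a1 < b1 or b0 < a0 < b1 < a1


def blocked_strict(candidate, existing):
    """Current 'element boundary' reading: the candidate must be
    disjoint from the existing box or entirely inside it."""
    (c0, c1), (e0, e1) = candidate, existing
    disjoint = c1 <= e0 or c0 >= e1
    inside = e0 <= c0 and c1 <= e1
    return not (disjoint or inside)


abcd, cd, cdef = (0, 4), (2, 4), (2, 6)

# Strict reading: the outcome depends on which match comes first.
print(blocked_strict(cd, abcd))   # False: CD nests inside an earlier ABCD
print(blocked_strict(abcd, cd))   # True: ABCD cannot wrap an earlier CD

# Misnest-only reading: order no longer matters for nested matches.
print(misnests(abcd, cd), misnests(cd, abcd))  # False False
print(misnests(abcd, cdef))                    # True: still rejected
```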

Since I think misnesting was the whole problem with crossing element
boundaries in the first place, I assume this is okay?  Or does it run
into the same performance ratholes that :contains() did?

Now, this rule still produces some possible implementation problems.
For example, take this:
<p>ABCDEF</p>
::text("ABC") { color: red; }
::text("CDF") { color: blue; }

As the CSS engine eats the text, it notes an A and figures the "ABC"
might be matching.  It then sees a B and C.  This would finish the
"ABC" match, but it would also start the "CDF" match, which is more
powerful and should win.  The parser has to wait until it consumes a D
(still matches) and then an E (no match) to realize that "CDF" won't
match here and "ABC" should be able to apply, at which point it can
run back and apply the pseudoelement and then emit the D and E.
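
Here is a rough Python model of that lookahead, just to put a number on
it. Everything in it is an assumption for illustration: a
character-at-a-time scanner, exactly one weaker and one stronger
pattern, only the first occurrence of the weaker pattern, and the
hypothetical name lookahead_needed. It reports whether the weaker match
may apply and how many characters past its end had to be read before
that was known.

```python
def lookahead_needed(text, weak, strong):
    """Return (weak_applies, extra_chars_read) for the first `weak` match,
    or None if `weak` does not occur in `text`."""
    w = text.find(weak)
    if w == -1:
        return None
    w_last = w + len(weak) - 1      # index of the last char of the weak match
    decided_at = w_last             # last index read before deciding
    # Stronger candidates that start inside the weak match could misnest.
    for s in range(w + 1, w_last + 1):
        for i, ch in enumerate(strong):
            pos = s + i
            if pos >= len(text) or text[pos] != ch:
                # Candidate died here (or the text ran out).
                decided_at = max(decided_at, min(pos, len(text) - 1))
                break
        else:
            strong_last = s + len(strong) - 1
            if strong_last > w_last:
                # Stronger rule matched and sticks out past the weak match:
                # the weak match would misnest, so it is dropped.
                return (False, strong_last - w_last)
            # Otherwise the stronger match nests inside; weak still applies.
    return (True, decided_at - w_last)


print(lookahead_needed("ABCDEF", "ABC", "CDF"))  # (True, 2): read D, then E
print(lookahead_needed("ABCDFX", "ABC", "CDF"))  # (False, 2): CDF wins
```

In this model the extra reading is bounded by the length of the
stronger pattern, though a real engine may buffer text quite
differently.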

I'm assuming a particular kind of processing here, which may not apply
in actual browser engines.  Let me know if I'm way off-base.  If I'm
on the right track, though, is this a relatively large problem?  Is
there another way to arrange matching to make it easier while
retaining the ability to nest but not misnest?  Perhaps actual
first-come-first-served matching, rather than matching the most
powerful first?  If this isn't really a significant issue, or at least
not one that will be significantly changed with a different matching
algorithm, then we don't need to change anything.

~TJ
Received on Thursday, 7 January 2010 14:00:55 UTC