Re: Text selector [was Re: breaking overflow]

From: James Hopkins <james@idreamincode.co.uk>
Date: Mon, 4 Jan 2010 12:15:36 +0000
Cc: Brad Kemper <brad.kemper@gmail.com>, www-style list <www-style@w3.org>
Message-Id: <32265CCA-C32B-4E64-A0A2-60547ADE4635@idreamincode.co.uk>
To: Boris Zbarsky <bzbarsky@MIT.EDU>
# I personally can't envisage a use case where crossing
# textnodes (or element boundaries, for that matter) in
# order to match a single word, would be beneficial.

Looks as though I was being far too narrow-minded when making such a  
sweeping statement :)

Having said that, it's easy to see how earlier discussions of such a
text selector (if indeed there were any) would have been blighted by
treating textnode and element boundaries differently in different
scenarios, and I'm not sure why we're considering doing the same
here. I think we should adopt a 'one size fits all' rule that treats
all textnodes in the same way, regardless of their context, whether
or not that means being able to cross textnode boundaries.

My thoughts on the following scenarios:

> Simple example #1:
> <!DOCTYPE html>
> <body>
> <script>
>  document.body.appendChild(document.createTextNode("ba"));
>  document.body.appendChild(document.createTextNode("r"));
> </script>

The author's intention was to create two separate textnodes, so they
should be treated separately.

> Simple example #2:
> <!DOCTYPE html>
> <body>
>  ba<!--comment, so the textnodes aren't even quite
>  adjacent in the DOM-->r

An HTML comment should be treated like an element node, in that it
acts as a delimiter that splits the surrounding text in two.

> Simple example #3:
> <!DOCTYPE html>
> <body>
>  ba<script></script>r
> (again, not even adjacent in the DOM; user-perceived as one word.

See above.

> Simple example #4, equivalent to the above:
> <!DOCTYPE html>
> <body>
>  ba<script>document.write("r")</script>

See the conclusion near the end of this email for details.

> Example #5, that might depend on the exact parser algorithm used and  
> might not ever lead to multiple textnodes in an HTML5 parser but I  
> think does in some cases in existing parsers:
> <!DOCTYPE html>
> <body>bar
> with an HTTP packet boundary between the 'a' and the 'r'.

I'm not sure what this refers to :)

> Example #6, which depends on exact behavior still being hammered out  
> in the HTML5 spec:
> <!DOCTYPE html>
> <body><script>
>  document.write("ba"); document.write("r");
> </script>

They're written as two separate entities, so they should be treated as
such.

> Example #7: editable content (designMode/contentEditable) can  
> probably lead to random textnode boundaries as text is inserted,  
> then removed, then edited, wrapped in tags, unwrapped from the tags,  
> etc.  I don't think there's anything that specifies what the  
> resulting DOM should look like on the individual textnode level yet.

In conclusion, it makes sense to me that all adjacent textnodes be
treated as separate entities; that is, ::text() should not cross the
boundary between adjacent textnodes in order to match a single word.
In contrast, the selector _should_ be able to cross one side of an
element's boundary (its start tag or its end tag), so that it can
match an element's sibling textnode together with any of that
element's descendant textnodes that appear in the document source,
with no whitespace in between. Restricting matches to text in the
source is what excludes example #4.
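To make the rule above a bit more concrete, here is a rough JavaScript sketch of how it might be modelled. It is purely illustrative: ::text() is only a proposal, so the function names (`matchesWord`, `straddles`), the token format and the partial void-element list are my own assumptions, and it assumes plain substring matching. It works on a token stream standing in for the document source rather than the DOM, so script-inserted text (examples #1, #4 and #6) never shows up. Comments act as delimiters, directly adjacent text tokens are never joined, and a word may straddle at most one start or end tag.

```javascript
// Hypothetical model of the proposed ::text() rule; not a spec algorithm.
// Tokens stand in for the document *source*, not the DOM, so text that a
// script inserts (examples #1, #4 and #6) never appears in the stream.
// Assumed token shapes:
//   { kind: "text", data: "ba" }
//   { kind: "start", name: "b" }, { kind: "end", name: "b" }
//   { kind: "comment" }

// Partial list, for illustration only: void elements have no descendant
// text, so crossing their start tag must not join two halves of a word.
const VOID_ELEMENTS = new Set(["br", "hr", "img", "input", "wbr"]);

// True if some split of `word` into a non-empty prefix and suffix puts the
// prefix at the end of `left` and the suffix at the start of `right`.
function straddles(left, right, word) {
  for (let k = 1; k < word.length; k++) {
    if (left.endsWith(word.slice(0, k)) && right.startsWith(word.slice(k))) {
      return true;
    }
  }
  return false;
}

function matchesWord(tokens, word) {
  for (let i = 0; i < tokens.length; i++) {
    const text = tokens[i];
    if (text.kind !== "text") continue;

    // Case 1: the word lies entirely inside one source text run
    // (plain substring matching is an assumption of this sketch).
    if (text.data.includes(word)) return true;

    // Case 2: the word straddles exactly one start tag or end tag.
    // Only tokens[i + 2] is considered, so two directly adjacent text
    // tokens are never joined, and neither is text across two tags.
    const tag = tokens[i + 1];
    const next = tokens[i + 2];
    if (!tag || !next || next.kind !== "text") continue;
    if (tag.kind !== "start" && tag.kind !== "end") continue; // comments delimit
    if (tag.kind === "start" && VOID_ELEMENTS.has(tag.name)) continue;
    if (straddles(text.data, next.data, word)) return true;
  }
  return false;
}
```

With this model, ba<b>r</b> and <b>ba</b>r both match "bar", while ba<!--c-->r (example #2) and ba<script></script>r (example #3) do not.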

As an aside, I think it would also be useful to be able to include
multiple strings in the same selector, something like
::text("foo bar", "test text").
