Re: Foster-parenting and taint from Henri Sivonen on 2009-06-11 (public-html@w3.org from June 2009)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Thu, 11 Jun 2009 14:29:38 +0300
To: Ian Hickson <ian@hixie.ch>
Cc: HTMLWG WG <public-html@w3.org>
Message-Id: <A263AC9D-7962-4DDD-9DC4-226B624E9249@iki.fi>

On Apr 28, 2009, at 21:43, Ian Hickson wrote:

> On Tue, 28 Apr 2009, Henri Sivonen wrote:
>> On Apr 25, 2009, at 21:15, Ian Hickson wrote:
>>> The spec is compatible with IE (and Opera) on this test:
>>>
>>>  http://software.hixie.ch/utilities/js/live-dom-viewer/saved/92
>>>
>>> WebKit's behaviour (as you describe above) is not compatible with
>>> this.
>>>
>>> (Ironically, Gecko's behaviour isn't compatible with IE. Not sure
>>> what's going on there.)
>>
>> Is there evidence that the Web needs the IE/Opera behavior, which may
>> be an unintentional artifact of achieving the foster parenting effect
>> in the CSS formatter rather than the HTML parser? On the face of it,
>> it seems undesirable to complicate parsing in order to cater for
>> space-only document.writes in tables.
>
> Compatibility with IE is a goal here. While in the example above the
> write is contrived, it's easy to imagine cases where the
> document.write is trying to write something else (e.g. an element
> node) and happens to include the space as well, separate from the
> text nodes.

Well, a page could conceivably say
<table><b>foo</b> <i>bar</i></table>, but it seems like a bad idea to
bake this complication into the spec if there isn't evidence that it's
actually needed for existing content to such a degree that users would
react badly to not having taint.

Considering that WebKit has gotten away for this long with not having  
taint, I have to doubt the necessity of taint for Web compat.
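
To make the <b>/<i> case concrete, here is a toy model in Python. It
is only a sketch under an assumption: that "taint" means that once any
content has been foster-parented out of a table, later space-only text
seen directly in that table is foster-parented too. The function name
build and the string "tokens" are hypothetical, not taken from the
spec or from any parser.

```python
SPACE = " \t\n\f\r"  # HTML space characters

def build(tokens, use_taint):
    """Return (fostered, in_table) for tokens seen directly inside <table>.

    Each token is a string standing in for a whole element or text run.
    """
    fostered, in_table = [], []
    tainted = False
    for tok in tokens:
        is_space = tok.strip(SPACE) == ""
        if not is_space:
            # Non-space content is not allowed here: foster-parent it.
            fostered.append(tok)
            tainted = True
        elif use_taint and tainted:
            # Assumed taint rule: the space follows the fostered content.
            fostered.append(tok)
        else:
            # Without taint, space-only text stays inside the table.
            in_table.append(tok)
    return fostered, in_table

tokens = ["<b>foo</b>", " ", "<i>bar</i>"]
with_taint = build(tokens, use_taint=True)
without_taint = build(tokens, use_taint=False)
```

In this model the space between "foo" and "bar" survives in the
foster-parented content only when taint is on; without taint it ends
up as a text node inside the table.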

> I'm also really not convinced that this complicates parsing. You can
> still buffer, either straight into a text node or into a separate
> buffer. If you make your tokeniser distinguish between space
> characters and non-space characters when it matters, you can easily
> handle that difference without an additional check in the tree
> constructor.

Making the tokenizer report non-space and space to the tree builder
differently limits the design of a parser in other ways. Then you
can't coalesce runs of characters, which means the per-character
reporting needs to inline into something really fast.
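
A rough Python sketch of the reporting difference, with hypothetical
function names that do not come from any real tokenizer: one variant
hands the tree builder a whole run of character data in a single
call, the other has to break the run at every space/non-space
boundary.

```python
import re

# HTML space characters: space, tab, LF, FF, CR.
_RUNS = re.compile(r"[ \t\n\f\r]+|[^ \t\n\f\r]+")

def coalesced_runs(text):
    """Report a run of character data as one chunk."""
    return [text] if text else []

def split_runs(text):
    """Report space and non-space runs as separate chunks."""
    return _RUNS.findall(text)

sample = "foo bar  baz"
one_call = coalesced_runs(sample)
many_calls = split_runs(sample)
```

For the sample string the coalesced variant makes one report while
the split variant makes five, so the cost of each report matters much
more in the second design.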

> An additional bit per table checked only when inside tables is not a
> big deal either.

It's not just an additional bit for tables. It's a *word-aligned* one
bit of *mutable* bloat for *all* stack nodes. Having mutable data on
the stack nodes constrains sharing of stack nodes across threads when
the stack is cloned for speculative parsing, for example. (One might
argue that sharing across threads costs more than it saves, but my
point is that the single mutable bit constrains the possible designs.)
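
A minimal Python sketch of the sharing problem, using hypothetical
class and function names rather than any real parser's types: if a
cloned stack shares node objects with the original, a taint bit
flipped through the speculative copy is visible through the main
stack as well, so a node with a mutable bit has to be copied instead
of shared.

```python
from dataclasses import dataclass

@dataclass
class TaintableNode:
    """Stack node carrying a mutable per-node taint flag."""
    name: str
    tainted: bool = False

def clone_shared(stack):
    """Cheap clone: new list, same node objects."""
    return list(stack)

def clone_copied(stack):
    """Safe clone: every node is duplicated."""
    return [TaintableNode(n.name, n.tainted) for n in stack]

main = [TaintableNode("html"), TaintableNode("body"), TaintableNode("table")]

# Sharing nodes: the speculative write leaks into the main stack.
speculative = clone_shared(main)
speculative[-1].tainted = True
leaked = main[-1].tainted

# Copying nodes: no leak, at the cost of duplicating every node.
main[-1].tainted = False
speculative = clone_copied(main)
speculative[-1].tainted = True
isolated = not main[-1].tainted
```

With immutable stack nodes the cheap clone would be safe; the mutable
bit is what forces either the copy or some form of synchronization.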

> Generally speaking I think it's very weird to have text nodes act
> differently based on where they are split by comments or elements.

I agree, but we aren't dealing with the kind of stuff here that  
authors are supposed to write. Foster parenting is very weird to begin  
with. We are dealing with a hack that applies to legacy content.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Thursday, 11 June 2009 11:30:18 UTC