- From: Ian Hickson <ian@hixie.ch>
- Date: Wed, 13 Jun 2007 22:34:27 +0000 (UTC)
On Wed, 12 Jul 2006, Stewart Brodie wrote:
> > > In the main phase, section 'If the insertion mode is "in row"', the
> > > last option for 'anything else' says "process ... as if ... in
> > > table". I think that should say "as if ... in table body" instead.
> > > That case will re-throw the token out to "in table" in any case if
> > > it doesn't handle it.
> >
> > There'd be no difference. Any token that isn't handled by the "row"
> > mode will not be handled by the "table body" mode.
>
> Yes, I noticed that and agree, except that it just seemed to me that it
> would be more natural to expect unhandled things to be thrown to the
> next level of scope (table body) rather than bypassing it and going
> directly to the table.

I don't really see why. Seems like an additional level of indirection.

> > > I've come to the conclusion that you need pictures to accompany the
> > > "adoption agency algorithm". However, I'm not an artist. Indeed,
> > > I'm so bad at drawing pictures, that in the past, users often sent
> > > me replacement bitmap graphics for my programs because they found my
> > > attempts so distressing :-)
> >
> > Yeah, I completely agree. Diagrams and examples. If someone wants to
> > do a diagram here I'd be most happy. Failing that, I'll probably get
> > around to it in due course (e.g. once I'm convinced it actually
> > works).
>
> It is the most complex part of the tree construction. Perhaps in lieu
> of pictures in the short term, a short non-normative summary could be
> added describing what the algorithm is doing, because reverse
> engineering it from the 14-step plan is hard.

As I said, if someone wants to contribute examples, introduction
materials, diagrams, or other helping material, I'm certainly open to
adding it to the spec. I just don't want to do it myself until I'm
confident the spec is stable.

> > > The "parsing quirks" box lists several issues that I think are
> > > important. The <script> one in particular is so very common.
> > > Unfortunately, I had to cave in eventually and support that because
> > > it broke some customers' own sites.
> >
> > Can you describe what exactly the quirk is? I have yet to see an
> > algorithmic description of how to parse <script> blocks in quirks
> > mode. In my research and the research that other people have done, it
> > was found that every UA does it slightly differently. This is why I'd
> > really rather not do this. If you can tell me exactly what it is, I
> > might be more convinced to do it.
>
> Yes, it's hard to pin down. In effect, it's a new value for the content
> model flag which is like some sort of combination of RCDATA and
> PLAINTEXT. I'm not sure it's just a quirk, to be honest. I've tried the
> following snippet in Firefox, Opera & IE6 and they behave the same way
> regardless of the presence of a strict HTML4 doctype declaration before
> the <html>:
>
> <html><title>The <!-- comment with a </title> in the --> title</title><body
> onload="document.body.appendChild(document.createTextNode(document.title))">
>
> In all cases, the window title and the text shown in the document body was:
>
> The <!-- comment with a </title> in the --> title
>
> The same behaviour appears to apply to TEXTAREA, SCRIPT, NOSCRIPT,
> NOFRAMES, NOEMBED. STYLE works differently in Firefox (it thinks that
> the content property's value terminates the style tag):
>
> <style> <!-- h1:after { content: '</style>'; color: red } --> </style>
>
> The rule seems to be that whilst you are lexing the contents of one of
> these magical elements, you have an additional flag, initialised to
> false, that indicates that you are inside a pseudo-comment. You
> continue to accumulate character tokens, but if you see the sequence
> <!-- and the flag is false, you set the flag to true. If the flag is
> true and you see the sequence -->, you set the flag to false. Whilst
> the flag is true, finding the < does not switch to the open tag state.
> The character tokens are all accumulated into the content of the
> element, regardless of whether they match the <!-- or --> markers.

It does indeed seem that CDATA and RCDATA have this behaviour in the
tokeniser in IE. Fixed.

> > > Finally (for now ;-), right at the beginning of the tree
> > > construction section, it says that DOM Mutation events must not fire
> > > for changes caused by the UA parsing the document. I cannot decide
> > > whether or not I agree with that statement. My experimentation
> > > appears to show that this is indeed what happens in Firefox, at
> > > least. I put a script in the head of my document that attaches a
> > > listener for DOMNodeInserted on the document.documentElement node
> > > (i.e. the HTML element) and it never gets called due to nodes being
> > > added by the parser. Internally, for me, it's a PITA though,
> > > because my node tree construction code and DOM implementation code
> > > use the same internal APIs - and these automatically trigger the DOM
> > > events, which, in turn, get dispatched to the various internal
> > > default event handlers to deal with the special types of node that
> > > require additional behaviour (like IMG, LINK, META etc.).
> >
> > In Web browsers it's simply not an option. Having to fire mutation
> > events for every mutation according to the complete DOM3 Events model
> > is prohibitively expensive.
>
> To be honest, I've not found it a burden even on the sorts of low-end
> devices that our software runs on (typically 300MHz CPUs, 8MB RAM, that
> sort of thing). Then again, I have a highly optimised event dispatcher
> that takes steps to minimise the work, particularly when there are no
> DOM listeners for the event being raised, which will almost always be
> the case for the events concerned (DOMNodeInserted and
> DOMNodeInsertedIntoDocument and the Removed counterparts). The internal
> default event handlers have similar filtering to eliminate any
> unnecessary processing quickly.
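The pseudo-comment rule Stewart describes earlier in this message (an extra flag that `<!--` sets and `-->` clears, with end tags ignored while it is set) can be sketched as a small scanner. This is only an illustration under stated assumptions: the function name `scan_raw_text` is hypothetical, it applies the rule uniformly to every element it is called for (it does not model the different STYLE behaviour Stewart reports for Firefox), and its end-tag test is a simplified case-insensitive match on `</name` followed by whitespace, `/`, `>` or end of input.

```python
def scan_raw_text(source: str, start: int, tag_name: str) -> tuple[str, int]:
    """Collect the content of a raw-text element such as <title>.

    Hypothetical helper. `start` is the index just past the start tag.
    Returns (content, index where the matching end tag begins), or
    (rest of input, len(source)) if no end tag is found.
    """
    end_tag = "</" + tag_name.lower()
    in_pseudo_comment = False  # the extra flag, initialised to false
    i = start
    while i < len(source):
        if not in_pseudo_comment and source.startswith("<!--", i):
            in_pseudo_comment = True  # "<!--" seen while flag is false
            i += 4
            continue
        if in_pseudo_comment and source.startswith("-->", i):
            in_pseudo_comment = False  # "-->" seen while flag is true
            i += 3
            continue
        # Only outside a pseudo-comment may "<" begin the element's end tag.
        if not in_pseudo_comment and source[i:i + len(end_tag)].lower() == end_tag:
            after = source[i + len(end_tag):i + len(end_tag) + 1]
            # Assumed check: reject longer names such as </titlex>.
            if after == "" or after in " \t\n\f\r/>":
                return source[start:i], i
        i += 1
    # The markers themselves are never stripped: content is a plain slice.
    return source[start:], len(source)
```

Run on the title from Stewart's snippet, `scan_raw_text('The <!-- comment with a </title> in the --> title</title><body>', 0, "title")` returns the whole string `The <!-- comment with a </title> in the --> title` as content, matching the window title he observed, because the inner `</title>` falls inside the pseudo-comment.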
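The two positions in the mutation-events exchange can be contrasted in a few lines: Stewart's dispatcher does almost no work when nobody listens, while the draft says parser-driven changes fire nothing at all. The sketch below is hypothetical throughout (`MutationEventDispatcher`, `fire` and its `from_parser` flag are illustrative names, not any real DOM API) and models only the listener check, not DOM capture/bubble propagation.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class MutationEventDispatcher:
    """Hypothetical dispatcher with a 'no listeners' fast path."""

    def __init__(self) -> None:
        self._listeners: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)
        self.dispatch_count = 0  # events that reached at least one listener

    def add_listener(self, event_type: str, callback: Callable[[str], None]) -> None:
        self._listeners[event_type].append(callback)

    def fire(self, event_type: str, node_name: str, from_parser: bool = False) -> None:
        # Draft rule: mutations caused by the parser fire no mutation events.
        if from_parser:
            return
        # .get() does not create an entry in the defaultdict.
        listeners = self._listeners.get(event_type)
        # Fast path: with no registered listeners, skip dispatch entirely.
        if not listeners:
            return
        self.dispatch_count += 1
        for callback in list(listeners):
            callback(node_name)
```

Even with the fast path, each `fire` call still costs a function call and a lookup per inserted node, which is the residual per-element work at issue in the reply that follows.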
Even minimal work is more than no work, and when you're dealing with
thousands of elements, that's a big difference (in the order of
milliseconds).

> In the "in body" section, WBR doesn't really belong with a,b,big,em...
> because it never had content. It probably ought to go in with
> area,basefont,bgsound... a bit further down, or in its own section.
> There's no real point bothering with putting it in the list of active
> formatting elements so it's coming off the stack again straight away.

Fixed.

Thanks,
-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Wednesday, 13 June 2007 15:34:27 UTC