- From: Simon Pieters <zcorpan@hotmail.com>
- Date: Mon, 17 Jul 2006 14:03:20 +0000
Hi,

From: Ian Hickson <ian@hixie.ch>
>On Sun, 18 Jun 2006, Simon Pieters wrote:
> >
> > The spec asks whether quirks mode parsing should be adopted[1]. I think
> > it would be good if parsing worked more or less the same in quirks and
> > standards mode. If we want to adopt quirks mode parsing, then here are
> > some remarks:
> >
> > > Comment parsing is different.
> >
> > I think the current parsing algorithm for comments should remain. I
> > don't think we should adopt IE's "overlapping" comments (<!--> being one
> > comment), because that isn't logical and isn't how they work in XML and
> > comments in other languages (such as /*/ in CSS isn't one comment).
>
>I agree. However, in quirks mode this is a requirement. So if we make the
>parsing quirks-compatible (as in, if we remove DOCTYPE-switching for
>parsing), we have no choice.

Ok. I could live with that.

> > > The following is considered one script block (!):
> > >
> > > <script><!-- document.write('</script>'); --></script>
> >
> > This one is common, I think, and applies to IE6, Safari and Opera even
> > in Standards Mode. Script parsing seems to work like this in Mozilla in
> > Quirks Mode:
> >
> > 1. If the parser hits the string "<!--" then set a flag to ignore
> >    </script> tags.
> > 2. If the parser then hits the string "-->" then reset the flag.
> > 3. The flag can only be set once.
> > 4. If the parser hits EOF, then reset the flag (if it is set) and
> >    reparse the script.
> >
> > Opera seems to do the same as Mozilla.
>
>Anything that depends on EOF is a bad idea for security reasons, so I
>would be reluctant to do that...
>
> > We would have to drop reparsing though.
>
>...which you seem to agree with. :-)
>
> > I've tried to figure out exactly what IE does, but I have failed. It
> > seems to do reparsing sometimes, and others not, and --> after the
> > </script> tag makes a difference, and also whether there are characters
> > after the --> (before EOF).
> > The flag can also be set more than once.
> >
> > Safari seems to do pretty much what IE does.
>
>Can't spec what I can't describe! :-)

If we ignore reparsing, I think I know what Opera, Firefox, IE and Safari
do. See these test cases:

   http://simon.html5.org/test/html/parsing/pseudo-comments/

How to interpret results: If there's nothing outside the tested element,
then the parser allows multiple pseudo-comments. If "a-->" is outside the
element in question, then the parser doesn't allow any pseudo-comments; for
"b-->" the parser allows one pseudo-comment.

Below are the results:

opera
   standards mode, quirks mode
      title, textarea, style, script, noscript,
      noembed (with plugins enabled), noframes
         one pseudo-comment

firefox
   standards mode
      title, textarea
         multiple pseudo-comments
      style, script, noscript, noembed, noframes
         no pseudo-comments
   quirks mode
      title, textarea
         multiple pseudo-comments
      style, noscript, noembed, noframes
         no pseudo-comments
      script
         one pseudo-comment

ie
   standards mode, quirks mode
      title, textarea, script, noscript, noembed, noframes
         multiple pseudo-comments
      style
         one pseudo-comment

safari
   standards mode, quirks mode
      title, textarea
         no pseudo-comments
      style, script, noscript, noembed, noframes
         multiple pseudo-comments

I'm not sure what's most sensible to do. I think this is needed for at
least <script> parsing. My proposal is to allow multiple pseudo-comments
for all RCDATA and CDATA elements.

As for an algorithm for how to do that, I think that an extra flag would be
sufficient. If the parser hits <!-- while in RCDATA or CDATA, the flag is
set to true. Then, if the parser hits --> the flag is set to false.
Initially the flag is false. While the flag is true the element can't be
closed.
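
The flag idea above can be sketched roughly as follows. This is only an
illustration in Python, not spec text or any browser's actual code. The
function name find_element_end is hypothetical, and it makes assumptions the
proposal does not state: end tags are matched case-insensitively and by
prefix only, the "-->" must come after the complete "<!--" (so "<!-->" does
not close the pseudo-comment), and at EOF the element simply runs to the end
of the input with no reparsing.

```python
def find_element_end(text, tag_name, start=0):
    """Return the index of the end tag that closes an RCDATA/CDATA element.

    `text` is the input, `start` is the index just after the start tag.
    Returns len(text) if the element is never closed (no reparsing at EOF).
    """
    end_tag = "</" + tag_name.lower()
    in_pseudo_comment = False  # the extra flag; initially false
    i = start
    while i < len(text):
        if not in_pseudo_comment:
            # Assumption: prefix match only, so "</scriptx" would also match.
            if text[i:i + len(end_tag)].lower() == end_tag:
                return i  # the element can be closed only while the flag is false
            if text.startswith("<!--", i):
                in_pseudo_comment = True
                i += 4
                continue
        elif text.startswith("-->", i):
            in_pseudo_comment = False  # may be set again later: multiple pseudo-comments
            i += 3
            continue
        i += 1
    return len(text)


# The script example quoted earlier: the first </script> sits inside a
# pseudo-comment, so only the last one closes the element.
content = "<!-- document.write('</script>'); --></script>"
print(find_element_end(content, "script") == content.rindex("</script>"))  # prints True
```

Because the flag can be set again after it is reset, this version allows
multiple pseudo-comments, which is what the proposal asks for; allowing only
one would need a second piece of state.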

What's also interesting is that Firefox and IE don't replace entities
inside pseudo-comments for RCDATA elements (title and textarea), but Opera
and Safari do:

   http://simon.html5.org/test/html/parsing/pseudo-comments/rcdata/

Results:

firefox, ie
   standards mode, quirks mode
      title, textarea
         entities are not replaced

opera, safari
   standards mode, quirks mode
      title, textarea
         entities are replaced

I guess we could follow IE on this one.

> > > p can contain table
> >
> > I think this might be a good thing. I would also like p to be able to
> > contain other struct-inline elements, but perhaps that isn't really
> > possible.
>
>Indeed.

It might be desirable also that a valid HTML4 document gets a conforming
HTML4 DOM. If it is, then <p>s shouldn't contain <table>.

> > > Safari and IE have special parsing rules for <% ... %> (even in
> > > standards mode, though clearly this should be quirks-only).
> >
> > This wouldn't be a bogus comment, as bogus comments end with > (while
> > these end with %>), but I think it would be possible to add this if we
> > want to be more compatible with IE.
>
>Oh we could add anything to be compatible with IE... the questions are do
>we want to be, and do we need to be.

True.

>Like you, I don't know. :-) I want to do some research on this in due
>course, but I haven't been able to do it yet.

Would be interesting to see such a research. :-)

Regards,
Simon Pieters
Received on Monday, 17 July 2006 07:03:20 UTC