[whatwg] comment parsing

From: Ian Hickson <ian@hixie.ch>
Date: Mon, 23 Jan 2006 02:33:05 +0000 (UTC)
Message-ID: <Pine.LNX.4.62.0601230215400.9516@dhalsim.dreamhost.com>
On Sat, 21 Jan 2006, Anne van Kesteren wrote:
>
> Quoting Anne van Kesteren <fora at annevankesteren.nl>:
> > However, from the specification it is not entirely clear what should happen
> > with <!--></p>.
> 
> The specification also does not match what is widely implemented for cases
> like:
> 
> # <p><!-- --FAIL></p>
> 
> Here is how they are parsed more or less (without EOF and error handling):
> 
> zcorpan says:
> ok, so it is parsed like this...
> <! marked section open state
> -- comment open state
> anything except --: stay in comment open state
> -- comment end state
> anything except >: stay in comment end state
> > close comment
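A minimal Python sketch of the state machine zcorpan describes above. It assumes the input starts at "<!--" and, like the description, ignores EOF and error handling: it simply returns None if no closing ">" is reached. The name skip_comment is hypothetical; it returns whatever input follows the comment.

```python
from typing import Optional


def skip_comment(src: str) -> Optional[str]:
    # Hypothetical helper modelling zcorpan's description; src must begin with "<!--".
    if not src.startswith("<!--"):
        raise ValueError("expected '<!--'")
    # "<!" = marked section open state, then "--" = comment open state.
    state = "comment open"
    i = 4
    while i < len(src):
        if state == "comment open":
            # "--" moves to the comment end state.
            if src.startswith("--", i):
                state = "comment end"
                i += 2
                continue
        elif src[i] == ">":
            # In the comment end state, ">" closes the comment.
            return src[i + 1:]
        # Otherwise consume one character and stay in the current state.
        i += 1
    return None  # EOF is out of scope for this simplified model
```

For example, skip_comment("<!-- --FAIL></p>") returns "</p>": the "--" before FAIL moves to the comment end state, so FAIL is swallowed and the next ">" closes the comment.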

In my testing, I found that browsers were less than consistent about this. 
For example, this:

   <!-- a > -- b > c --> EOF

...in Mozilla in quirks mode, is treated as one long comment, but this:

   <!-- a > -- b > c EOF

...is treated as if the comment ended after the "a". Given the security 
concerns raised by reparsing (see my last e-mail), we don't want to do 
this. Safari quirks mode looked like it might be implementing your 
described behaviour. I couldn't test Opera; it raises exceptions on my 
test script when I use it to test unexpected EOF situations.

IE6 (in both standards mode and quirks mode) has this interesting 
behaviour:

   SOURCE                      DOM
   <!-- a > EOF                Empty comment.
   <!-- a > - EOF              Text node "<!-- a > -".
   <!-- a > -- EOF             Text node "<!-- a > --".
   <!-- a > --> EOF            Comment " a > ".
   <!-- a > -- > EOF           Empty comment, text node " -- >".
   <!-- a > -- b > EOF         Empty comment, text node " -- b >".
   <!-- a > -- b > c - EOF     Text node " a > -- b > c -".
   <!-- a > -- b > c -- EOF    Text node " a > -- b > c --".
   <!-- a > -- b > c --> EOF   Comment " a > -- b > c".


Per the HTML5 spec now, it should be:

   SOURCE                      DOM
   <!-- a > EOF                Comment " a >".
   <!-- a > - EOF              Comment " a > -".
   <!-- a > -- EOF             Comment " a > --".
   <!-- a > --> EOF            Comment " a > ".
   <!-- a > -- > EOF           Comment " a > -- >".
   <!-- a > -- b > EOF         Comment " a > -- b >".
   <!-- a > -- b > c - EOF     Comment " a > -- b > c -".
   <!-- a > -- b > c -- EOF    Comment " a > -- b > c --".
   <!-- a > -- b > c --> EOF   Comment " a > -- b > c ".

This seems like the most logical lowest-common-denominator way of 
describing this.
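A minimal Python sketch that reproduces the HTML5 column above: the comment ends at the first "-->" after the opening "<!--", and if there is none, everything up to EOF is comment data. Starting the search after "<!--" is an assumption, since the table does not cover inputs like "<!-->". The name parse_comment is hypothetical.

```python
def parse_comment(src: str) -> tuple[str, str]:
    # Hypothetical helper: split src (starting with "<!--") into
    # (comment data, remaining input).
    if not src.startswith("<!--"):
        raise ValueError("expected '<!--'")
    # Look for "-->" only after the opening "<!--" (assumption, see above).
    end = src.find("-->", 4)
    if end == -1:
        # Unexpected EOF: everything after "<!--" becomes the comment.
        return src[4:], ""
    return src[4:end], src[end + 3:]
```

For example, parse_comment("<!-- a > -- >") returns (" a > -- >", ""), matching the "<!-- a > -- > EOF" row, while parse_comment("<!-- a > -- b > c -->") returns (" a > -- b > c ", "").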

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Sunday, 22 January 2006 18:33:05 UTC
