[whatwg] Parsing: comment tokenization from Nicholas Shanks on 2007-04-07 (public-whatwg-archive@w3.org from April 2007)

From: Nicholas Shanks <contact@nickshanks.com>
Date: Sat, 7 Apr 2007 13:27:14 +0100
Message-ID: <F95CF215-7FDC-41BA-B215-E20C1F0E7898@nickshanks.com>

On 7 Apr 2007, at 02:56, Anne van Kesteren wrote:

> The tokenization section should also handle:
>
>   <!-->
>   <!--->
>
> as "correct" comments for compat with the web. This means that
>
>   <!-->-->
>
> shows "-->" and that
>
>   <!--->-->
>
> shows "-->".

Why on earth is this a good idea?
AFAIK browsers and other HTML clients don't currently treat these as  
comments, and compelling them to do so will cause several problems:

1) Web developers currently expect things like  to result  
in the comment "greater than five?". Changing such expectations on a  
whim is harmful.

2) A double HYPHEN-MINUS delimits comments within tags; this provides
compatibility with XML and SGML, and changing it needlessly in HTML5
will just complicate conversion.

3) You claim "compat with the web" but don't provide any evidence to  
support that. Are there huge numbers of sites expecting <!--> to  
represent a comment without content? Can such sites not be fixed  
instead of polluting HTML with additional rules? I'd rather have a
handful of broken sites that their authors will fix than say to
the other 99% of authors "hey, you can now do this" and end up
with millions of broken sites. (I say broken because they will not
be backwards compatible with current or previous UAs.)
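
To make the two readings under discussion concrete, here is a minimal
sketch in Python. It is not taken from any specification or browser: the
function name split_comment and the short_forms flag are hypothetical,
and the "plain scan" branch is a simplifying assumption (look for the
next "-->" after "<!--"), not a faithful SGML or XML comment parser.

```python
def split_comment(markup: str, short_forms: bool) -> tuple[str, str]:
    """Return (comment_data, remaining_text) for markup starting with '<!--'.

    short_forms=True models the quoted proposal: '<!-->' and '<!--->'
    each end the comment immediately, with empty comment data.
    short_forms=False models a plain scan for the next '-->' (an
    assumption made for illustration only).
    """
    if not markup.startswith("<!--"):
        raise ValueError("markup must start with '<!--'")
    body = markup[4:]  # everything after the opening '<!--'

    if short_forms:
        # '>' or '->' directly after '<!--' closes an empty comment.
        for closer in (">", "->"):
            if body.startswith(closer):
                return "", body[len(closer):]

    end = body.find("-->")
    if end == -1:
        # Unterminated comment: treat the rest of the input as comment data.
        return body, ""
    return body[:end], body[end + 3:]


if __name__ == "__main__":
    for sample in ("<!-->-->", "<!--->-->"):
        print(sample, split_comment(sample, True), split_comment(sample, False))
```

With short_forms=True both samples from the quoted message give empty
comment data and leave "-->" as visible text; with short_forms=False
nothing is left over, and the comment data is ">" and "->" respectively.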

- Nicholas.

Received on Saturday, 7 April 2007 05:27:14 UTC