
Re: Issues arising from not reparsing

From: Henri Sivonen <hsivonen@iki.fi>
Date: Thu, 13 Aug 2009 10:26:39 +0300
Cc: HTMLWG WG <public-html@w3.org>
Message-Id: <6A443357-B3C1-409D-A8DE-11623D3848BF@iki.fi>
To: Ian Hickson <ian@hixie.ch>
On Aug 12, 2009, at 22:55, Ian Hickson wrote:

> On Wed, 12 Aug 2009, Henri Sivonen wrote:
>> On Aug 12, 2009, at 12:10, Henri Sivonen wrote:
>>
>>> I think I'll create a wiki page with requirements and a proposed  
>>> delta
>>> spec first, though, because others on #whatwg were interested in
>>> pondering alternative solutions given a set of requirements.
>>
>> Wiki page created: http://wiki.whatwg.org/wiki/CDATA_Escapes
>
> Wow. Please can we stick to just the current magic escapes and not add
> even more magic?

The current magic, without all the magic that current browsers
implement, leads to some incompatibilities with existing content. I
don't know how often users would hit these issues, but when the
problems do occur, they wreck the whole page. Therefore, I think we
should seriously try to improve the magic so that it substitutes for
the current browser magic better in practice while still not doing
reparsing.

Here are points that need research, in my opinion:

  1) Would removing the escape flag from xmp, title and textarea  
improve or degrade Web compat given no reparsing? To research this, I  
suggest parsing a substantial body of Web content with the current  
parsing algorithm and then grepping the text content of every xmp  
element for |<!--.*</xmp| (ignoring case and letting . match over line  
breaks). (Likewise for textarea and title, except rejecting hits where  
any part of "<!--" or "</title" has been entity-escaped.) Basically,  
if there are almost no hits, it would be safer to zap the escape flag
from these elements, because accidentally having <!-- eat up the rest
of the page is worse than very rarely terminating one of these
elements prematurely.

  2) Would making comments and escape runs close on --\s+!> improve or  
degrade Web compat given no reparsing? To research this, I suggest
grepping a substantial body of Web content for |--\s+!>| and analyzing
the hits.

  3) Would making --!> and --\s+> close escapes improve or degrade Web  
compat given no reparsing? To research this, I suggest parsing a  
substantial body of Web content with the current parsing algorithm and  
then grepping the text content of every script and style element for  
|--!>| and |--\s+>| and analyzing the hits.

  4) Would making <!-- not open an escape when there's non-whitespace
on the line before it improve or degrade Web compat given no  
reparsing? To research this, I suggest parsing a substantial body of  
Web content with the current parsing algorithm and then grepping the  
text content of every script and style element for |^.*\S.*<!--| and  
analyzing the hits.
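For concreteness, the four searches above could be sketched in Python
roughly as follows. This is an illustration under assumptions, not
existing tooling: it assumes some harness (not shown) has already parsed
each page with the current algorithm and hands over, per element, the
raw source between the start tag and the end tag the parser matched.
The names check_element and check_document and the q1 to q4 labels are
hypothetical. Searching raw source rather than decoded text is one way
to get the "reject entity-escaped hits" behavior for title and textarea.
Note that Python's \s also matches line breaks, unlike line-oriented grep.

```python
import re
from typing import List

# Assumption: the caller passes, for each element, the raw source between
# its start tag and the end tag the current parsing algorithm matched.
# For xmp, script and style (raw text) this equals the text content. For
# title and textarea (RCDATA), searching the raw source skips hits where
# "<!--" or the end tag was written with character references (&lt;!--).

# Question 1: "<!--" followed later by the element's own end tag,
# ignoring case and letting "." match across line breaks.
ESCAPE_FLAG_PATTERNS = {
    name: re.compile(r"<!--.*</" + name, re.IGNORECASE | re.DOTALL)
    for name in ("xmp", "textarea", "title")
}

# Question 2: "--", whitespace, then "!>" anywhere in a page's source.
SPACED_BANG_CLOSER = re.compile(r"--\s+!>")

# Question 3: candidate escape closers inside script and style.
ESCAPE_CLOSERS = [re.compile(r"--!>"), re.compile(r"--\s+>")]

# Question 4: "<!--" preceded by non-whitespace on the same line.
# MULTILINE makes "^" match at every line start; "." never crosses a line.
MIDLINE_OPEN = re.compile(r"^.*\S.*<!--", re.MULTILINE)


def check_element(name: str, raw: str) -> List[str]:
    """Return which questions (q1, q3, q4) this element's content hits."""
    name = name.lower()
    hits = []
    pattern = ESCAPE_FLAG_PATTERNS.get(name)
    if pattern is not None and pattern.search(raw):
        hits.append("q1")
    if name in ("script", "style"):
        if any(p.search(raw) for p in ESCAPE_CLOSERS):
            hits.append("q3")
        if MIDLINE_OPEN.search(raw):
            hits.append("q4")
    return hits


def check_document(source: str) -> bool:
    """Question 2: does the page source contain "--", whitespace, "!>"?"""
    return SPACED_BANG_CLOSER.search(source) is not None


if __name__ == "__main__":
    print(check_element("xmp", "a <!-- b\n</XMP c"))      # ['q1']
    print(check_element("title", "&lt;!-- x </title"))    # []
    print(check_element("script", "x = 1; <!--\nfoo(); --!>"))  # ['q3', 'q4']
    print(check_element("script", "  <!--\nfoo();"))      # []
    print(check_document("<!-- x -- \n!>"))               # True
```

Every flagged element or page would still need the manual "analyze the
hits" step; a sketch like this only narrows down the candidates.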

Hixie, have you already run these analyses? If not, it would be
awesome if someone who already maintains the capability to run these
searches could run them. (I volunteer to perform the "analyze the
hits" parts, but I'm not currently set up to run the searches myself.)

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Thursday, 13 August 2009 07:27:25 GMT