Re: unescaped ampersands was: several messages from Kornel Lesinski on 2009-06-01 (public-html@w3.org from June 2009)

From: Kornel Lesinski <kornel@geekhood.net>
Date: Mon, 01 Jun 2009 21:33:44 +0100
To: "Julian Reschke" <julian.reschke@gmx.de>
Cc: "public-html@w3.org" <public-html@w3.org>
Message-ID: <op.uuu4qjt0ptj49s@aimac.local>

On Mon, 01 Jun 2009 20:39:56 +0100, Julian Reschke <julian.reschke@gmx.de>  
wrote:

> As far as I can tell, it increases the complexity of recipients that  
> choose only to support conforming documents.

The increase in complexity is minimal.

If you only support conforming documents, then you can simplify the rule to:
"&" followed by something that's not an entity should be treated as text. To
do this you need to buffer at most 32 alphanumeric characters (the length of
the longest entity name), so this shouldn't burden even streaming parsers
with tight memory constraints.
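A minimal sketch of that simplified rule, assuming a tiny hypothetical entity table (`ENTITIES`) in place of the full list of named character references. After a "&" the scanner looks ahead over at most 32 ASCII alphanumerics. If they spell a known name followed by ";", it decodes the reference. Otherwise it emits the "&" as literal text. The lookahead never exceeds that bound, which is the point about streaming parsers.

```python
# Hypothetical subset of the named-entity table; a real parser would
# use the complete list of HTML named character references.
ENTITIES = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "copy": "\u00a9"}

# Lookahead bound taken from the message (longest entity name).
MAX_ENTITY_LEN = 32


def decode_ampersands(text: str) -> str:
    """Decode known entities; treat any other '&' as plain text."""
    out = []
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if ch != "&":
            out.append(ch)
            i += 1
            continue
        # Scan at most MAX_ENTITY_LEN ASCII alphanumerics after the '&'.
        j = i + 1
        while (j < n and j - (i + 1) < MAX_ENTITY_LEN
               and text[j].isascii() and text[j].isalnum()):
            j += 1
        name = text[i + 1:j]
        if name in ENTITIES and j < n and text[j] == ";":
            out.append(ENTITIES[name])
            i = j + 1  # skip past the terminating ';'
        else:
            # Not an entity: the '&' is literal text; rescan after it.
            out.append("&")
            i += 1
    return "".join(out)
```

For example, `decode_ampersands("a &lt; b")` yields `"a < b"`, while `"?x=1&y=2"` passes through unchanged because `y` is not an entity name.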

Even if you happen to have a parser written against HTML5 before that
change, one that only supports conforming documents, you can "fix" such
documents with:

perl -pe 's/&(?=[a-zA-Z0-9]+=)/&amp;/g'
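The same substitution, sketched in Python for readers who want to embed it rather than shell out to Perl (the function name `escape_query_ampersands` is hypothetical). The lookahead `(?=[a-zA-Z0-9]+=)` only matches an "&" that is followed by alphanumerics and then "=", the typical `&name=` pattern in URL query strings. Existing references such as `&amp;` are left alone, and so is an ampersand like the one in `AT&T`.

```python
import re

# Same pattern as the Perl one-liner: '&' followed (lookahead only, not
# consumed) by one or more ASCII alphanumerics and then '='.
_QUERY_AMP = re.compile(r"&(?=[a-zA-Z0-9]+=)")


def escape_query_ampersands(html: str) -> str:
    """Replace '&' with '&amp;' where it introduces a 'name=' pair."""
    return _QUERY_AMP.sub("&amp;", html)
```

Because the lookahead cannot span a newline, applying this to a whole document gives the same result as Perl's line-by-line `-p` loop.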

-- 
regards, Kornel Lesinski

Received on Monday, 1 June 2009 20:34:26 UTC