
Re: unescaped ampersands was: several messages

From: Kornel Lesinski <kornel@geekhood.net>
Date: Mon, 01 Jun 2009 21:33:44 +0100
To: "Julian Reschke" <julian.reschke@gmx.de>
Cc: "public-html@w3.org" <public-html@w3.org>
Message-ID: <op.uuu4qjt0ptj49s@aimac.local>
On Mon, 01 Jun 2009 20:39:56 +0100, Julian Reschke <julian.reschke@gmx.de> wrote:

> As far as I can tell, it increases the complexity of recipients that  
> choose only to support conforming documents.

The increase in complexity is minimal.

If you only support conforming documents, you can simplify the rule to: an
"&" that is not followed by an entity name should be treated as text. To do
this you need to buffer at most 32 alphanumeric characters (the length of
the longest entity name), so this shouldn't burden even streaming parsers
with hard memory constraints.
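
A minimal sketch of that buffering rule, in Python for illustration only.
The names tokenize, KNOWN_ENTITIES and MAX_NAME_LEN are hypothetical, the
entity table is deliberately tiny, and numeric references such as "&#38;"
are left out to keep it short. It is not a conforming HTML5 tokenizer, just
the "buffer a bounded name, then decide" idea:

```python
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical, deliberately tiny entity table (assumption for this sketch);
# a real parser would load the full list of named character references.
KNOWN_ENTITIES = {"amp", "lt", "gt", "quot", "nbsp"}

# Upper bound on buffered name characters, taken from the message above.
MAX_NAME_LEN = 32


def tokenize(chars: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield ("text", s) and ("entity", name) tokens from a character stream."""
    # None: not inside a candidate reference.
    # str:  name characters collected so far after an "&".
    buf: Optional[str] = None
    for ch in chars:
        if buf is None:
            if ch == "&":
                buf = ""
            else:
                yield ("text", ch)
            continue

        if ch == ";" and buf in KNOWN_ENTITIES:
            # "&name;" with a known name: a real entity reference.
            yield ("entity", buf)
            buf = None
        elif ch.isascii() and ch.isalnum() and len(buf) < MAX_NAME_LEN:
            # Still plausibly an entity name; keep buffering (bounded).
            buf += ch
        else:
            # Not an entity after all: the "&" and the buffered name are text.
            yield ("text", "&" + buf)
            buf = None
            # Re-examine the current character outside the candidate.
            if ch == "&":
                buf = ""
            else:
                yield ("text", ch)

    if buf is not None:
        # Input ended in the middle of a candidate: flush it as text.
        yield ("text", "&" + buf)
```

The buffer never holds more than MAX_NAME_LEN characters, so memory use stays
constant regardless of document size, which is the point made above about
streaming parsers.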

Even if you happen to have a parser written for HTML5 as it was before that
change, one that only supports conforming documents, you can "fix" the
documents by preprocessing them with:

perl -pe 's/&(?=[a-zA-Z0-9]+=)/&amp;/g'
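
For readers who prefer to see the substitution on sample input, here is a
rough Python equivalent using the same regular expression. The function name
escape_query_ampersands is made up for this sketch. Like the one-liner, it
only touches an "&" that is followed by alphanumerics and then "=", which is
the query-string case:

```python
import re

# Same pattern as the Perl one-liner: an "&" followed (lookahead only)
# by one or more ASCII alphanumerics and then "=".
_BARE_AMP = re.compile(r"&(?=[a-zA-Z0-9]+=)")


def escape_query_ampersands(text: str) -> str:
    """Replace query-string style bare ampersands with "&amp;"."""
    return _BARE_AMP.sub("&amp;", text)


if __name__ == "__main__":
    sample = '<a href="page.php?x=1&y=2">link</a> &amp; &lt;'
    print(escape_query_ampersands(sample))
```

Existing references such as "&amp;" or "&lt;" are left alone because the
character after the name is ";" rather than "=".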

regards, Kornel Lesinski
Received on Monday, 1 June 2009 20:34:26 UTC
