Re: <iframe doc="">

Lachlan Hunt wrote:
> Ian Hickson wrote:
>> Markup in attributes has its disadvantages, but it's not necessarily a
>> problem.
> 
> One big disadvantage with putting markup in attributes, especially for 
> the doc proposal, is that ampersands will often have to be double 
> escaped as &amp;amp;, due to the content of doc effectively being parsed 
> twice - once as the content of the attribute, and then again to parse 
> the string as a document.

Why "especially for the doc proposal"? The ampersand problem seems the 
same for any markup-in-attribute proposal, and doc has far fewer 
escaping problems than the data: alternative.

Presumably almost nobody is ever going to write the markup by hand, 
since the point is to embed untrusted content in a sandbox, and if 
you're embedding it by hand you can verify the content visually and 
don't need to sandbox it. So the important thing is how server-side code 
will do the escaping.

If you have a (Perl) script which does something like

   print "<iframe sandbox doc=\"$doc\">";

you'll have to escape with something like

   s/"/&quot;/g;

in order to avoid security vulnerabilities, and also with

   s/&/&amp;/g;

in order to get correct processing. If you instead had

   print "<iframe sandbox src=\"data:text/html;charset=utf-8,$doc\">";

you'd still just have to escape " for safety; but for correct processing 
in current browsers you'd have to at least escape & and do

   s/%/%25/g;
   s/#/%23/g;

(are there any others you need?) and for validity I think you'd have to 
instead do

   s/([^;\/?:@&=+\$,a-zA-Z0-9_.!~*'()-])/
     join "", map { sprintf "%%%02x", $_ } unpack "C*", encode("utf-8", $1)/eg;

(if I interpret RFC2397's reference to RFC2396's "urlchar" as actually 
meaning "uric", and if I haven't made stupid mistakes).
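
As a rough sketch, the data: steps above could be combined into one Perl 
function, assuming the "uric" reading of RFC 2397. The name data_uri_attr 
is made up for illustration, and encode() is taken from the Encode module. 
Every character outside uric (including ", <, % and #) gets percent-encoded, 
so & should be the only character left that still needs HTML escaping:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper, not from any library: build the value of a
# src="data:..." attribute from a string of HTML.
sub data_uri_attr {
    my ($html) = @_;

    # Percent-encode each character outside RFC 2396 "uric"
    # (reserved + unreserved), one UTF-8 byte at a time.
    $html =~ s/([^;\/?:\@&=+\$,a-zA-Z0-9_.!~*'()-])/
        join "", map { sprintf "%%%02x", $_ } unpack "C*", encode("utf-8", $1)/eg;

    # "&" is in uric, so it survives the step above and still has to be
    # escaped for the HTML attribute.
    $html =~ s/&/&amp;/g;

    return "data:text/html;charset=utf-8,$html";
}

print '<iframe sandbox src="', data_uri_attr('<b>"50% & #1"</b>'),
      '"></iframe>', "\n";
```

The $ inside the character class is written as \$ so that Perl does not 
try to interpolate it as a variable.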

Your server-side script probably already has access to an HTML escape 
function that will do what's needed for <iframe doc>, and if you have a 
decent templating system it'll do it automatically. It's no different to 
any other form of embedding content from the user, so it doesn't seem an 
unreasonable burden. (Escaping data: correctly is a lot more complex and 
a lot less likely to be provided as a function in your server environment.)
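
For comparison, here is a minimal sketch of the <iframe doc> case with the 
escaping written out by hand. attr_escape is a hypothetical name; in 
practice you would probably use whatever HTML escape function your 
environment already provides. The order matters: & has to be escaped 
before ", otherwise the & in &quot; would be escaped a second time.

```perl
use strict;
use warnings;

# Hypothetical helper: escape text for use inside a double-quoted
# HTML attribute value.
sub attr_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;     # first, so later entities are not re-escaped
    $text =~ s/"/&quot;/g;    # keeps the content from closing the attribute
    return $text;
}

# Untrusted content that already contains an escaped ampersand.
my $doc = '<p title="x">Fish &amp; chips</p>';

print '<iframe sandbox doc="', attr_escape($doc), '"></iframe>', "\n";
```

The &amp; in the embedded document ends up as &amp;amp; in the attribute, 
which is the double escaping mentioned earlier in the thread, but the 
script produces it without the author having to think about it.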

-- 
Philip Taylor
pjt47@cam.ac.uk

Received on Sunday, 17 January 2010 11:57:10 UTC