- From: Philip Taylor <pjt47@cam.ac.uk>
- Date: Sun, 17 Jan 2010 11:56:36 +0000
- To: Lachlan Hunt <lachlan.hunt@lachy.id.au>
- CC: Ian Hickson <ian@hixie.ch>, public-html@w3.org
Lachlan Hunt wrote:
> Ian Hickson wrote:
>> Markup in attributes has its disadvantages, but it's not necessarily a
>> problem.
>
> One big disadvantage with putting markup in attributes, especially for
> the doc proposal, is that ampersands will often have to be double
> escaped as &amp;amp;, due to the content of doc effectively being parsed
> twice - once as the content of the attribute, and then again to parse
> the string as a document.

Why "especially for the doc proposal"? The ampersand problem seems the same for any markup-in-attribute proposal, and doc has far fewer escaping problems than the data: alternative.

Presumably almost nobody is ever going to write the markup by hand, since the point is to embed untrusted content in a sandbox, and if you're embedding it by hand you can verify the content visually and don't need to sandbox it. So the important thing is how server-side code will do the escaping.

If you have a (Perl) script which does something like

    print "<iframe sandbox doc=\"$doc\">";

you'll have to escape with something like

    s/"/&quot;/g;

in order to avoid security vulnerabilities, and also with

    s/&/&amp;/g;

in order to get correct processing.

If you instead had

    print "<iframe sandbox src=\"data:text/html;charset=utf-8,$doc\">";

you'd still just have to escape " for safety; but for correct processing in current browsers you'd have to at least escape & and do

    s/%/%25/g; s/#/%23/g;

(are there any others you need?) and for validity I think you'd have to instead do

    s/([^;\/?:@&=+$,a-zA-Z0-9-_.!~*'()])/join "", map { sprintf "%%%02x", $_ } unpack "C*", encode("utf-8", $1)/eg;

(if I interpret RFC2397's reference to RFC2396's "urlchar" as actually meaning "uric", and if I haven't made stupid mistakes).

Your server-side script probably already has access to an HTML escape function that will do what's needed for <iframe doc>, and if you have a decent templating system it'll do it automatically.
It's no different to any other form of embedding content from the user, so it doesn't seem an unreasonable burden. (Escaping data: correctly is a lot more complex and a lot less likely to be provided as a function in your server environment.)

-- 
Philip Taylor
pjt47@cam.ac.uk
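
As an illustration of the comparison made in the message, here is a minimal sketch of both escaping paths. It is written in Python rather than the Perl used above, and is not the author's code. The function names `iframe_doc` and `iframe_data_uri` are hypothetical, and `doc` is the attribute proposed in this thread. The data: variant percent-encodes everything outside the unreserved characters, which is stricter than the character set in the Perl expression above; that choice is an assumption made to keep the sketch short.

```python
import html
from urllib.parse import quote


def iframe_doc(untrusted: str) -> str:
    """Embed untrusted markup via the proposed doc attribute (hypothetical helper).

    A stock HTML escape function is sufficient here. html.escape replaces
    '&' first, then '<', '>', and (with quote=True) both quote characters,
    so an '&' that is already part of an entity becomes '&amp;' once more.
    """
    return '<iframe sandbox doc="{}">'.format(html.escape(untrusted, quote=True))


def iframe_data_uri(untrusted: str) -> str:
    """Embed untrusted markup via a data: URI (hypothetical helper).

    Two layers are needed: percent-encode the UTF-8 bytes for the URI,
    then HTML-escape the whole URI for the attribute value.
    safe="" is an assumption: it encodes every reserved character,
    including '%', '#', '&' and '"'.
    """
    encoded = quote(untrusted, safe="", encoding="utf-8")
    uri = "data:text/html;charset=utf-8," + encoded
    return '<iframe sandbox src="{}">'.format(html.escape(uri, quote=True))


if __name__ == "__main__":
    sample = 'a & "b" <c>'
    print(iframe_doc(sample))
    print(iframe_data_uri(sample))
```

Note that the order of replacements matters when escaping by hand: the ampersand has to be replaced before the quote character, otherwise the `&` in `&quot;` gets escaped a second time. A library escape function handles that ordering for you.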
Received on Sunday, 17 January 2010 11:57:10 UTC