- From: Martin Janecke <whatwg.org@kaor.in>
- Date: Thu, 26 Aug 2010 10:38:44 +0200
On 26.08.10 01:41, Adam Barth wrote:
> On Wed, Aug 25, 2010 at 1:55 PM, Ian Hickson <ian at hixie.ch> wrote:
>> On Wed, 25 Aug 2010, Adam Barth wrote:
>>> HTML should support Base64-encoded entities to make it easier for
>>> authors to include untrusted content in their documents without
>>> risking XSS.
>>
>> Seems like a fine idea. Get browsers to implement it and I'll spec it.
>
> I've posted a patch for WebKit:
>
> https://bugs.webkit.org/show_bug.cgi?id=44641
>
> Some subtleties:
>
> 1) Some base64 decoders tolerate newlines. We don't want to decode
> entities with newlines.
> 2) Decoding base64 results in binary data. We'll need to convert that
> data to characters in order to deal with it in the DOM. We always
> use UTF8 for that transformation, regardless of the document's
> encoding.
> 3) Null characters are replaced with U+FFFD.
> 4) The empty base64 entity &%; is consumed and is replaced with the
> empty string.
> 5) Invalid base64 is rejected and the entity is not decoded.
>
> Adam
>

Is it necessary to consider compatibility issues here? In HTML4 this seems to have been valid code (-> http://validator.w3.org/check):

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-type" content="text/html; charset=US-ASCII">
<title>base64 entity test</title>
</head>
<body>
<p>Look at these fine ASCII characters: &%4oCT;</p>
</body>
</html>

Under the proposal it would be interpreted differently: the text that used to be displayed literally would be decoded as a base64 entity. Could this lead to old documents changing in meaning? Do we have to consider old documents that were not completely valid (e.g. lacked a doctype declaration)?
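
To make the compatibility concern concrete, here is a rough Python sketch of the five rules quoted above, applied to the paragraph from my test document. It is not the WebKit patch. The function names, the regular expression, the requirement that payloads be padded to a multiple of four characters, and the handling of malformed UTF-8 are all my own assumptions, since the thread does not specify them.

```python
import base64
import binascii
import re

# Hypothetical matcher for the proposed "&%<payload>;" syntax.
ENTITY = re.compile(r"&%([^;]*);")
# Strict base64 alphabet; newlines and other whitespace do not match (rule 1).
STRICT_BASE64 = re.compile(r"[A-Za-z0-9+/]*={0,2}")


def decode_base64_entity(payload):
    """Return the decoded text, or None if the entity must be left alone."""
    if payload == "":
        return ""  # rule 4: "&%;" is consumed and yields the empty string
    if len(payload) % 4 != 0 or not STRICT_BASE64.fullmatch(payload):
        return None  # rules 1 and 5: newlines and invalid base64 are rejected
    try:
        raw = base64.b64decode(payload, validate=True)
    except (binascii.Error, ValueError):
        return None  # rule 5
    # Rule 2: always UTF-8, whatever the document's charset is.
    # Assumption: malformed UTF-8 becomes U+FFFD (not stated in the thread).
    text = raw.decode("utf-8", errors="replace")
    return text.replace("\x00", "\ufffd")  # rule 3


def expand_base64_entities(markup):
    """Replace every decodable entity; leave rejected ones untouched."""
    def substitute(match):
        decoded = decode_base64_entity(match.group(1))
        return match.group(0) if decoded is None else decoded
    return ENTITY.sub(substitute, markup)


old = "<p>Look at these fine ASCII characters: &%4oCT;</p>"
new = expand_base64_entities(old)
```

With these assumptions, `4oCT` decodes to the bytes E2 80 93, which is U+2013 (EN DASH) in UTF-8. So the paragraph that an HTML4 browser shows with the literal characters `&%4oCT;` would end in a non-ASCII dash instead, even though the document declares charset=US-ASCII.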
Received on Thursday, 26 August 2010 01:38:44 UTC