- From: David Woolley <forums@david-woolley.me.uk>
- Date: Thu, 27 Sep 2007 22:28:05 +0100
- CC: www-html@w3.org
Lincoln Yeoh wrote:
>
> Any suggestions for doing a similar thing without violating the existing
> language model that will actually work and be fairly simple to implement?

Firstly, the situation is worse than I originally thought. If you cannot trust the browser to get the basic parsing right in safe mode, you will have to recognize the end marker at the lexical level, both to prevent its being spoofed within a parameter string or comment and to prevent an attack on the matrix content by leaving such a structure open around your marker. You will also have to do so only at lexical level zero, to prevent its being written by scripting.

I would say the real solution to your requirement is to use object elements to contain the unsafe HTML; they can then provide a very clear security boundary, and the browser's content origin rules can provide the protection. I suspect support for HTML objects is currently poor, but support for your proposal is currently non-existent.

Beyond that, I'm not sure that the problem really is amenable to technical solutions. The browser vendors will always be trying to introduce cool new features, and it will often be the case that the security implications are not realised at the time they are introduced. (If restrictions are placed on third party inclusions, you can expect the vendors to invent cool new mechanisms to get back a lot of the capability lost by the lock down; note, for instance, how programs like Skype effectively subvert firewalls.) Many current exploits are actually the result of things that are supposed to be locked down (e.g. marked safe for scripting) but are not actually safe. Vendors will still be operating under incentives to lock down as little as possible.

If you want to try, I think there is a responsibility on the content management system to canonicalise the HTML and remove all unknown attributes. That way, the user should not be able to trick the browser's parser.
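
As a rough illustration of the canonicalisation step described above, here is a minimal sketch using Python's standard html.parser module. The allowlist (ALLOWED), the permitted URL schemes (SAFE_SCHEMES), and the Canonicaliser class are hypothetical names chosen for the example, not taken from any particular content management system. The idea is to re-emit only known elements and attributes from the parsed token stream, so that unknown attributes, script elements, comments, and javascript: style URLs never reach the browser. html.parser is a lenient tokenizer rather than a full HTML parser, so a real system would probably want something stricter.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist: element name -> attribute names that may be kept.
ALLOWED = {
    "p": set(), "em": set(), "strong": set(),
    "ul": set(), "li": set(), "a": {"href"},
}
# Assumption: only these URL schemes are considered harmless in href.
SAFE_SCHEMES = ("http://", "https://", "mailto:")


class Canonicaliser(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # pieces of the re-serialised document
        self.open = []   # stack of allowed elements currently open
        self.skip = 0    # non-zero while inside script/style content

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1
            return
        if self.skip or tag not in ALLOWED:
            return  # unknown element: drop the tag, keep its text
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue  # unknown attribute (e.g. onclick): drop it
            url = value.strip()
            if name == "href" and not url.lower().startswith(SAFE_SCHEMES):
                continue  # drops javascript: and other pseudo schemes
            kept.append(f' {name}="{escape(url, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")
        self.open.append(tag)

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip = max(0, self.skip - 1)
            return
        if self.skip or tag not in self.open:
            return
        # Close anything left open inside this element, then the element.
        while self.open:
            top = self.open.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        if not self.skip:
            self.out.append(escape(data, quote=False))

    def result(self):
        self.close()
        while self.open:  # balance elements the author never closed
            self.out.append(f"</{self.open.pop()}>")
        return "".join(self.out)


def canonicalise(html_text):
    parser = Canonicaliser()
    parser.feed(html_text)
    return parser.result()
```

Because the output is rebuilt from the allowlist rather than filtered from the input, anything the sanitiser does not recognise is dropped by default, which is the property that matters for features introduced by browsers later.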
There are HTML parsing libraries, even if many content management systems use very shallow, tag-soup-style parsing. Once you have done that, all you really need to do is insist that safe mode always disables scripting.

If you don't disable scripting, you will have to restrict document.write and document object model changes, as these frustrate static safety analysis. Scripting is, in any case, a cofactor in most current exploits, so it would have to be locked down hard anyway. Your unstructured approach requires special consideration for scripting because, by default, the script could read your random number and document.write the closing marker.

To a large extent, scripting can be stripped by simply not including the relevant attributes and elements in the canonicalisation stage, and that will work for all browsers, not just those that arrive 5 to 10 years from now. The only problem is if a vendor comes up with some hack like the javascript: pseudo URL scheme, which is a magic interpretation of some existing attribute value. Initially only the introducing browser can police that.

Another problem that you may have is that authors of the third party content may well convince users that browsers that implement your mechanism are broken, because they don't render their material as well as less restricted browsers. This has always been the problem in getting browsers to follow standards.

If you are sure that you want something that violates element structure, but are confident that you can protect the lexical structure, processing instructions may be more appropriate, but I'm not really sure that that is their true purpose.

--
David Woolley
Emails are not formal business letters, whatever businesses may want.
RFC1855 says there should be an address here, but, in a world of spam, that is no longer good advice, as archive address hiding may not work.
Received on Thursday, 27 September 2007 21:28:40 UTC