Re: Tag to disable unwanted features?

From: David Woolley <forums@david-woolley.me.uk>
Date: Thu, 27 Sep 2007 22:28:05 +0100
Message-ID: <46FC2065.8040306@david-woolley.me.uk>
CC: www-html@w3.org

Lincoln Yeoh wrote:
> 
> 
> Any suggestions for doing a similar thing without violating the existing 
> language model that will actually work and be fairly simple to implement?

Firstly, the situation is worse than I originally thought.  If you 
cannot trust the browser to get the basic parsing right in safe mode, 
you will have to recognize the end marker at the lexical level, to 
prevent its being spoofed within a parameter string or comment and to 
prevent an attack on the matrix content by leaving such a structure open 
around your marker.  You will also have to do so only at lexical level 
zero, to prevent it being written by scripting.

I would say the real solution to your requirement is to use object 
elements to contain the unsafe HTML, as they provide a very clear 
security boundary.  The browser content origin rules can then provide 
the protection.  I suspect support for HTML objects is currently poor, 
but support for your proposal is currently non-existent.

Beyond that, I'm not sure that the problem really is amenable to 
technical solutions.  The browser vendors will always be trying to 
introduce cool new features, and it will often be the case that the 
security implications are not realised at the time they are introduced 
(and, if there are restrictions placed on third party inclusions, you 
can expect the vendors to invent cool new mechanisms to get back a lot 
of the capability lost by the lock down - note for instance how programs 
like Skype effectively subvert firewalls).

Many current exploits are actually the result of things that are 
supposed to be locked down (e.g. marked safe for scripting) that are not 
actually safe.  Vendors will still be operating under incentives to lock 
down as little as possible.

If you want to try, I think there is a responsibility on the content 
management system to canonicalise the HTML, and remove all unknown 
attributes.  That way, the user should not be able to trick the 
browser's parser.  There are HTML parsing libraries, even if many 
content management systems use very shallow, tag-soup-style parsing.
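
As a rough illustration of that canonicalisation step, here is a minimal 
sketch in Python using the standard html.parser module.  The allow-list 
(ALLOWED, VOID, DROP_CONTENT) and the function name canonicalise are 
hypothetical choices made for this example, not anything from an 
existing content management system.  The sketch re-serialises only 
known elements and attributes, drops comments and script bodies, and 
closes anything left open.  It does not vet attribute values such as 
URL schemes.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allow-list: element name -> attribute names kept on it.
ALLOWED = {"p": set(), "em": set(), "strong": set(), "br": set(),
           "a": {"href", "title"}}
VOID = {"br"}                      # elements that never take an end tag
DROP_CONTENT = {"script", "style"}  # elements whose text is discarded too


class Canonicaliser(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []         # re-serialised output fragments
        self.stack = []       # allowed elements currently open
        self.skipping = None  # name of a script/style element being dropped

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skipping = tag
            return
        if self.skipping or tag not in ALLOWED:
            return
        kept = {}
        for name, value in attrs:
            # Keep known attributes only; the first duplicate wins.
            if name in ALLOWED[tag] and name not in kept:
                kept[name] = value or ""
        text = "".join(f' {k}="{escape(v, quote=True)}"'
                       for k, v in kept.items())
        self.out.append(f"<{tag}{text}>")
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.skipping:
            if tag == self.skipping:
                self.skipping = None
            return
        if tag in self.stack:
            # Close everything opened since the matching start tag.
            while self.stack:
                closed = self.stack.pop()
                self.out.append(f"</{closed}>")
                if closed == tag:
                    break

    def handle_data(self, data):
        if not self.skipping:
            self.out.append(escape(data, quote=False))


def canonicalise(markup):
    parser = Canonicaliser()
    parser.feed(markup)
    parser.close()  # flush any buffered text before closing elements
    while parser.stack:
        parser.out.append(f"</{parser.stack.pop()}>")
    return "".join(parser.out)
```

Because the output is rebuilt from parsed tokens rather than edited in 
place, whatever the user typed only reaches the browser in a form the 
canonicaliser itself wrote.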

Once you have done that, all you really need to do is insist that safe 
mode always disables scripting.  If you don't disable scripting, you 
will have to restrict document.write and document object model changes, 
as these frustrate static safety analysis.  Scripting is, in any case, a 
cofactor in most current exploits, so it would have to be locked down 
hard anyway.  Your unstructured approach requires special consideration 
for scripting, as, by default, the script could read your random number 
and document.write the closing marker.

To a large extent, scripting can be stripped by simply not including the 
relevant attributes and elements in the canonicalisation stage, and that 
will work for all browsers, not just those that arrive 5 to 10 years 
from now.  The only problem is if a vendor comes up with some hack like the 
javascript: pseudo URL scheme, which is a magic interpretation of some 
existing attribute value.  Initially only the introducing browser can 
police that.
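
For the pseudo URL schemes that are already known, the canonicaliser 
can at least allow-list schemes in URL-valued attributes.  The sketch 
below is in Python; ALLOWED_SCHEMES and is_safe_url are hypothetical 
names chosen for this example.  It assumes the attribute value has 
already had character references decoded by the HTML parser, and it 
strips whitespace and control characters first, since browsers are 
lenient about those inside a URL.  It cannot, of course, anticipate a 
magic scheme that some vendor has yet to invent, other than by refusing 
everything not on the list.

```python
import re

# Hypothetical allow-list of URL schemes for third-party content.
ALLOWED_SCHEMES = {"http", "https", "mailto"}

# A scheme per RFC 3986: a letter, then letters, digits, "+", "." or "-".
_SCHEME = re.compile(r"^([A-Za-z][A-Za-z0-9+.\-]*):")


def is_safe_url(value):
    """Return True if value is a relative reference or uses an allowed scheme."""
    # Remove spaces and control characters so "java\tscript:" cannot hide.
    cleaned = re.sub(r"[\x00-\x20]", "", value)
    match = _SCHEME.match(cleaned)
    if match is None:
        return True  # no scheme at all: a relative reference
    return match.group(1).lower() in ALLOWED_SCHEMES
```

An allow-list is used rather than a block-list of javascript: and its 
relatives, so that an unknown scheme is rejected by default.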

Another problem that you may have is that authors of the third party 
content may well convince users that browsers that implement your 
mechanism are broken, because they don't render their material as well 
as less restricted browsers.  This has always been the problem in 
getting browsers to follow standards.

If you are sure that you want something that violates element structure, 
but are confident that you can protect the lexical structure, processing 
instructions may be more appropriate, but I'm not really sure that that 
is their true purpose.
-- 
David Woolley
Emails are not formal business letters, whatever businesses may want.
RFC1855 says there should be an address here, but, in a world of spam,
that is no longer good advice, as archive address hiding may not work.
Received on Thursday, 27 September 2007 21:28:40 UTC
