
[whatwg] Content Restrictions

From: Alexey Feldgendler <alexey@feldgendler.ru>
Date: Thu, 02 Mar 2006 20:30:23 +0600
Message-ID: <op.s5sikxgn1h6og4@pancake.feldgendler.ru>
On Tue, 21 Feb 2006 10:31:51 +0600, Hallvord Reiar Michaelsen Steen  
<hallvord at hallvord.com> wrote:

>> What is or what isn't technically simple to implement in existing
>> implementations should perhaps not be what decides how specifications
>> are written. It is clear that it is possible to implement per-function
>> security tracking (though slightly unclear how such security tracking
>> should work: which of all currently executing functions determines the
>> security context?)

Only the innermost one does. I posted the exact rules a couple of weeks
ago.

>> It is also clear that it hasn't exactly been required by
>> implementations yet, so it is likely that an implementation doesn't
>> have it already. And since it involves storing more information,
>> implementing it is likely to cost some in terms of memory use.

In Gecko, as far as I can see from its source code, this doesn't add
memory overhead: Gecko already has origin tracking of some kind, used to
implement today's usual security restrictions.

>> why doesn't the author simply make
>> sure to serve the untrusted content from another server (with another
>> host name or port number, that is, not necessarily another machine)?

This is what LiveJournal does now. However:
1. For many small sites it's not an option,
2. It doesn't solve the problem of untrusted JS included by a page.

>> Seems that brings another (although simpler) set of problems though:
>> what if the untrusted content contains a "</SANDBOX>" tag, or if it
>> ends with "<!--", or possibly other syntax anomalies?

I never said that the website won't have to do HTML cleaning of
user-supplied content. But with the HTML 5 reference parsing algorithm,
such cleaning becomes much easier and more straightforward: parse the
text into a DOM (as if it were inside BODY, for example), remove or
modify forbidden elements, then serialize it back. That way, a stray
</SANDBOX> is treated as a simple parse error and ignored, because it
doesn't match any opening tag within the user-supplied text. An unclosed
comment will be ignored, too.
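As a rough illustration of that parse, filter, serialize cleanup (not the HTML 5 algorithm itself: Python's stdlib HTMLParser stands in for a real tree builder, and the FORBIDDEN set is made up for the example):

```python
# Illustrative sketch of the cleanup described above: parse the
# user-supplied fragment, drop forbidden elements and unmatched close
# tags, then serialize it back. Python's stdlib HTMLParser stands in
# for a real HTML 5 tree builder; the FORBIDDEN set is an assumption.
from html.parser import HTMLParser

FORBIDDEN = {"sandbox", "script"}  # elements user content may not supply

class Cleaner(HTMLParser):
    def __init__(self):
        super().__init__()
        self.out = []        # serialized output pieces
        self.open_tags = []  # tags opened inside the fragment itself

    def handle_starttag(self, tag, attrs):
        if tag in FORBIDDEN:            # forbidden elements are removed
            return
        self.open_tags.append(tag)
        self.out.append("<%s>" % tag)   # attributes dropped for brevity

    def handle_endtag(self, tag):
        # A close tag with no matching open tag inside the fragment
        # (e.g. a stray </SANDBOX>) is a parse error and is ignored.
        if tag in self.open_tags:
            while self.open_tags:
                t = self.open_tags.pop()
                self.out.append("</%s>" % t)
                if t == tag:
                    break

    def handle_data(self, data):
        self.out.append(data)

def clean(fragment):
    c = Cleaner()
    c.feed(fragment)
    c.close()
    while c.open_tags:  # close whatever the user left open
        c.out.append("</%s>" % c.open_tags.pop())
    return "".join(c.out)

print(clean("<b>hi</SANDBOX></b>"))  # → <b>hi</b>
print(clean("<i>unclosed"))          # → <i>unclosed</i>
```

A real sanitizer would also have to filter attributes and know about void elements; the point here is only that a stray close tag never survives the round trip.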

>> What if it doesn't contain exactly that, but something else that
>> triggers equivalent behaviour in the HTML parser in some implementation?
>> HTML parsers are traditionally quite complex, and quite "fuzzy". The
>> fuzziness hasn't been a security problem before; now, all of a sudden,
>> it might be.

HTML 5 will make HTML parsing in standards mode well-defined, with  
predictable error recovery.

> Did we discuss how the UA should handle a closing </sandbox> tag?
> Would it need to scan forward in the markup to find other closing
> tags and determine if the current one is a part of the enclosed
> markup or the end of the SANDBOX in that page? Perhaps only the first
> and the last SANDBOX open/close tags can be taken into account and
> others discarded?

No need to do that. SANDBOX elements can be nested like many others.  
Nevertheless, a </SANDBOX> tag without a matching opening tag inside the  
user-supplied content will be ignored during the HTML cleanup process  
described above.
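A toy sketch of why no forward scanning is needed: treat the tags like brackets, matching each close tag to the nearest unmatched opener with a stack (the simplified tag regex and function name are mine, purely for illustration):

```python
# Toy illustration of nesting: like brackets, nested <sandbox> tags
# pair with the nearest unmatched opener via a stack, so a parser never
# needs to scan forward. Tag syntax is simplified (no attributes).
import re

def match_sandboxes(html):
    """Return (open, close) index pairs of matched sandbox tags."""
    pairs, stack = [], []
    for m in re.finditer(r"</?sandbox>", html, re.IGNORECASE):
        if not m.group().startswith("</"):
            stack.append(m.start())
        elif stack:
            pairs.append((stack.pop(), m.start()))
        # a close tag with no opener on the stack is simply ignored
    return pairs

print(match_sandboxes("<sandbox>a<sandbox>b</sandbox>c</sandbox>"))
# → [(10, 20), (0, 31)]  (innermost pair matched first)
```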

There is one more such case: a </SANDBOX> injected using  
document.write("</SANDBOX>"); but that case is easy to guard against as  
well.


-- 
Opera M2 8.5 on Debian Linux 2.6.12-1-k7
* Origin: X-Man's Station [ICQ: 115226275] <alexey at feldgendler.ru>
Received on Thursday, 2 March 2006 06:30:23 UTC
