[whatwg] Sandboxing to accommodate user generated content.

From: Kristof Zelechovski <giecrilj@stegny.2a.pl>
Date: Tue, 17 Jun 2008 18:18:55 +0200
Message-ID: <C56783AC6D164ED38B40AB2992ABBB66@POCZTOWIEC>
1.  Please elaborate how an extension of CSS would require a sanitizer
update.
2.  Please explain why using a dedicated tag with double parsing is easier
for a Web developer than putting the code in an attribute.
3.  Your quoting solution would not cause legacy browsers to show plain
text; they would show HTML code, which is probably much worse than showing
plain text.  If you mean JavaScript can be used to extract plain text, I
doubt it will be simple; there are probably lots of junctions where this
procedure can derail.
4.  Please explain why you consider network efficiency for legacy user
agents essential.  I believe that the correlation between efficiency and
compatibility is negative in general.  If that causes server overload, the
server can discriminate content depending on the user agent; this is a
temporary solution to an edge case and it should probably be acceptable.
Besides, a blog page with 60 comments on it is rather hard to render and
read so you should probably consider other display options in this case.
5.  I am not sure why IFRAME content must be HTTP-secured if the containing
page is.  The specification does not impose such a restriction AFAIK.  And,
if you need to go secure, do not allow scribbling in the first place, right?
Please take these points as a challenge, not as an attempt to let you down.
I personally think your idea is worth considering.
Chris

-----Original Message-----
From: whatwg-bounces@lists.whatwg.org
[mailto:whatwg-bounces at lists.whatwg.org] On Behalf Of Frode Borli
Sent: Tuesday, June 17, 2008 3:05 PM
To: whatwg at lists.whatwg.org
Subject: Re: [whatwg] Sandboxing to accommodate user generated content.

I have been reading up on past discussions on sandboxing content, and
I feel that it is generally agreed that there should be some
mechanism for marking content as "user generated". The discussion
mainly appears to be focused on implementation. Please read my
implementation notes at the end of this message on how we can include
this function safely for both HTML 4 and HTML 5 browsers, and still
allow HTML 4 browsers to function properly.


My main arguments for having this feature (in one form or another) in
the browser are:

- It is future-proof. Changes to browsers (for example adding
expression support to CSS) will never again require old sanitizers to
be updated.
- It does not require much skill and effort from the web developer to
safely sanitize user content.
- Security bugs are fixed by browser vendors, and not by each web developer.


In the discussions I find that backward compatibility is absolutely
the most important issue. Second is that it must be easy for web
developers to use the features.

The suggested solution of using an attribute on an <iframe> element
for storing the user generated content has several problems:

1: The use of src= as a fallback means that style information will be
lost and stylesheets must be loaded again.

2: The use of src= yields problems with iframe heights (since the
src URL must be hosted on another server, JavaScript cannot fix this)
and HTML 4 browsers have no other method of adjusting the iframe
height according to the content.

3: If you have a page that lists 60 comments on a blog, then the user
agent would have to contact the server 60 times to fetch each comment.
This again means that Perl/PHP scripts have to be invoked 60 times for
one page view. Counting the page itself, that is 61 separate database
connections and session initializations.

4: For the fallback method of using src= for HTML 4 browsers to
actually work, the fallback documents must be hosted on a separate
domain name. This again means that a website using HTTPS must purchase
and maintain two certificates.

I do not believe this solution will ever be used.


My solution:

If we simply add a new element <htmlarea></htmlarea>, old browsers will
still run scripts inside it while new browsers will stop them, and this
is a major problem.

Instead, HTML 5 browsers should require everything between
<htmlarea></htmlarea> to be HTML entity escaped; that is, < and > must
be replaced with &lt; and &gt; respectively. If this is not done, HTML 5
browsers will issue a severe warning and refuse to display the page.
Developers will quickly learn.

HTML 4 browsers will never run scripts (since they will only see plain
text). HTML 5 browsers will display rich text. It would be completely
secure for both HTML 4 and HTML 5 browsers.
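
To make the escaping rule above concrete, here is a minimal server-side
sketch in Python. It is only an illustration of the idea, not a proposed
implementation: the function name wrap_user_content is hypothetical, and
escaping & in addition to < and > is my own assumption (it keeps existing
entities in the user's markup unambiguous when an HTML 5 browser decodes
the content again).

```python
import html


def wrap_user_content(user_html: str) -> str:
    """Entity-escape user markup and wrap it in the proposed <htmlarea>."""
    # quote=False replaces only &, < and >; quotes are left untouched
    # because the content sits between tags, not inside an attribute.
    escaped = html.escape(user_html, quote=False)
    return "<htmlarea>" + escaped + "</htmlarea>"


if __name__ == "__main__":
    comment = "<b>Hi</b><script>alert(1)</script>"
    # A legacy browser would show this as literal text; a browser that
    # knows <htmlarea> would decode it and render it without scripts.
    print(wrap_user_content(comment))
```

Because no literal < survives inside the element, a legacy browser has
no tag to parse and therefore nothing to execute.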

A simple JavaScript could clean up the HTML markup for HTML 4 browsers.


> I believe the idea to deal with this is to add another attribute to
> <iframe>, besides sandbox="" and seamless="" we already have for sandboxing.
> This attribute, doc="", would take a string of markup where you would only
> need to escape the quotation character used (so either ' or "). The fallback
> for legacy user agents would be the src="" attribute.
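
For comparison, the quoted doc="" approach could be sketched like this in
Python. The helper names and the example URL are hypothetical. The quoted
text says only the quotation character needs escaping; the sketch also
escapes &, which is my assumption so that entities already present in the
markup are not decoded early by the attribute parser.

```python
def escape_for_double_quoted_attr(value: str) -> str:
    # "&" must be replaced first, otherwise the "&" in "&quot;" would be
    # escaped a second time.
    return value.replace("&", "&amp;").replace('"', "&quot;")


def iframe_with_doc(markup: str, fallback_src: str) -> str:
    """Build an <iframe> carrying user markup in the proposed doc attribute."""
    return (
        '<iframe sandbox="" seamless="" doc="'
        + escape_for_double_quoted_attr(markup)
        + '" src="'
        + escape_for_double_quoted_attr(fallback_src)
        + '"></iframe>'
    )


if __name__ == "__main__":
    print(iframe_with_doc('<p class="x">a & b</p>', "https://comments.example/1"))
```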

-- 
Best regards / Med vennlig hilsen
Frode Borli
Seria.no

Mobile:
+47 406 16 637
Company:
+47 216 90 000
Fax:
+47 216 91 000


Received on Tuesday, 17 June 2008 09:18:55 UTC