Re: Issue 100 Zero-Edits Counter Proposal from Tab Atkins Jr. on 2010-04-13 (public-html@w3.org from April 2010)

From: Tab Atkins Jr. <jackalmage@gmail.com>
Date: Tue, 13 Apr 2010 08:33:24 -0700
To: Julian Reschke <julian.reschke@gmx.de>
Cc: public-html@w3.org
Message-ID: <z2rdd0fbad1004130833n133fa8f7k7f1789b23a003c2d@mail.gmail.com>

On Tue, Apr 13, 2010 at 1:53 AM, Julian Reschke <julian.reschke@gmx.de> wrote:
> Does that mean that you support the general direction of the change proposal
> submitted for issue 103?

I don't see anything particularly wrong with it.  I don't have the
expertise to know how correct it is.

I'd like to see a breakdown of which of the additional characters are
necessary to escape for security reasons, and which are necessary to
escape just to maintain the content of the included page, though.

> Anyway, what should be mentioned in this context is that putting markup into
> attribute values is maybe the #1 anti-pattern in vocabulary design. It
> should be avoided. Really. To add something like this for the first time in
> HTML history just to add a feature where there's not even agreement that the
> feature is useful is very very strange.

I agree.  In general, markup in attributes *is* a very bad idea.  It's
the least bad solution to this problem, though.  Anything involving
putting the content into the page normally involves either (a) much
more complex escaping requirements (you need to count the occurrences
of <iframe> and </iframe> in the content, and escape any unbalanced
</iframe>s) or (b) the author successfully managing crypto-type things
(the idea of putting in sufficiently unique and random tokens to
delimit the start and end).

Attributes happen to have the most minor escaping requirements of all.
As well, this is *never* intended to be hand-written.  If you're
hand-writing something, you implicitly trust it, and can just write it
into the page.  This feature is designed *exclusively* to be
machine-generated from stored user-generated content and similar.
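
To make the "minor escaping requirements" concrete, here is a rough
sketch in Python of what a generator might do.  The helper names
(escape_for_srcdoc, sandboxed_iframe) are hypothetical, and the sketch
assumes the attribute is always written with double quotes, so that
only `&` and `"` can affect how the attribute is parsed.

```python
def escape_for_srcdoc(html_fragment: str) -> str:
    """Escape untrusted markup for a double-quoted srcdoc attribute.

    Assumption: the attribute value is always wrapped in double quotes,
    so only '&' and '"' need to be escaped.
    """
    # Escape ampersands first; otherwise the '&' in the '&quot;'
    # produced by the next step would be escaped a second time.
    escaped = html_fragment.replace("&", "&amp;")
    return escaped.replace('"', "&quot;")


def sandboxed_iframe(html_fragment: str) -> str:
    """Hypothetical helper: wrap stored user content in a sandboxed <iframe>."""
    return '<iframe sandbox srcdoc="{}"></iframe>'.format(
        escape_for_srcdoc(html_fragment)
    )


comment = '<p class="note">Fish & chips</p>'
print(sandboxed_iframe(comment))
```

The order of the two replacements is the only subtle part; everything
else in the content passes through untouched.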

I feel it's very important to note, though, that data: urls suffer
from the exact same "markup in attributes" problem, just with slightly
worse technical requirements.  Anyone who objects to @srcdoc because
it is "markup in attributes" but then suggests using data: urls would
be confused, at best.
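
For comparison, here is a sketch of the data: url route in Python,
using the standard library's urllib.parse.quote.  The helper names and
the choice of text/html with a utf-8 charset are assumptions made for
illustration only.  The content is still markup sitting in an
attribute; it just has to be url-encoded on top of that.

```python
from urllib.parse import quote


def html_data_url(html_fragment: str) -> str:
    """Hypothetical helper: pack markup into a data: url."""
    # safe="" makes quote() percent-encode reserved characters such as
    # '/', '#', '?' and '&' as well, instead of leaving '/' alone.
    encoded = quote(html_fragment, safe="")
    return "data:text/html;charset=utf-8," + encoded


def iframe_with_data_url(html_fragment: str) -> str:
    """Hypothetical helper: put the data: url into @src."""
    url = html_data_url(html_fragment)
    # The url still lands in an attribute, so attribute escaping applies
    # too.  It is a no-op after full percent-encoding, but matters for
    # encoders that leave '&' or '"' untouched.
    attr = url.replace("&", "&amp;").replace('"', "&quot;")
    return '<iframe sandbox src="{}"></iframe>'.format(attr)


print(iframe_with_data_url('<p>Hi & "bye"</p>'))
```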


> I think that's a very weak argument. I'm pretty sure that any feature we add
> can be used wrongly. What's much more important is that the languages allow
> to use it right, which appears to be the case. That being said, it even
> wouldn't be a problem if new libraries would be needed - after all we're
> talking about something which can be deployed only very slowly.

True.  The main thrust of the problem, though, is that data: urls have
more security-important escaping requirements than @srcdoc, and
mistakes in them aren't as easy to discover.  @srcdoc has a single
security-important escaping requirement, and getting it wrong will
fail very quickly and obviously in ordinary content.
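
A small sketch of why a missed quote in @srcdoc is hard to overlook.
It relies only on the fact that a double-quoted attribute value ends
at the first `"`; the function name is hypothetical, and this is a
simplification of what a real HTML parser does with the rest of the
tag.

```python
def value_seen_by_parser(unescaped_content: str) -> str:
    """Return the part of the content that ends up as the attribute value.

    Simplified model: a double-quoted attribute value stops at the
    first '"', so everything after it is lost from the value.
    """
    return unescaped_content.split('"', 1)[0]


# Ordinary prose with a quotation in it, written into srcdoc="..."
# without any escaping.
comment = '<p>She said "hello" to everyone.</p>'
print(repr(value_seen_by_parser(comment)))
```

The framed document is visibly cut off at the first quotation mark,
which is the kind of breakage an author notices on the first test
post.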


>> addition, despite both of these functions existing in PHP, multiple
>> homebrew url-escaping functions can be found across the web, which may
>> not escape everything that is necessary to escape.  Some of these
>> lapses may result in non-obvious security holes that can be exploited
>> by attackers, allowing arbitrary code injection into a web page.
>> ...
>
> And these kind of security holes are impossible with @srcdoc?

Impossible?  No, of course not.  Much less likely, and much more
likely to be discovered by the page author when they do exist?  Yes.


>> 2. In legacy browsers, data: urls will "fail open"; that is, they will
>> display their contents even if the browser does not understand the
>> sandbox security model, potentially exposing users to attack.  This
>> can be mitigated by specifying a text/html-sandboxed mime type in the
>> data: url, however.
>
> So it's no problem, right?

Assuming that browsers understand the sandbox security model and
properly parse a text/html-sandboxed document, yes, it's not a direct
problem.  There is the minor issue of fallback, in that the failure
mode for text/html-sandboxed is simply to not work at all.  Since
data: urls will be specified in @src, there's nothing else the
<iframe> can possibly display, not even a notice to the user that
their browser doesn't have the correct features to safely view this
content.  @srcdoc can fall back to a text/html-sandboxed version of
its content, or it can fall back to a different @src document that
explains the problem.
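
The second fallback option could look roughly like this.  It is a
sketch under the assumption that a browser supporting @srcdoc prefers
it over @src, while a legacy browser ignores the unknown attribute and
loads @src.  The helper names and the /unsupported-browser.html
address are made up for the example.

```python
def escape_attr(value: str) -> str:
    """Escape a string for use inside a double-quoted attribute."""
    return value.replace("&", "&amp;").replace('"', "&quot;")


def iframe_with_fallback(html_fragment: str, fallback_url: str) -> str:
    """Hypothetical helper: user content in @srcdoc, notice page in @src."""
    return '<iframe sandbox src="{}" srcdoc="{}"></iframe>'.format(
        escape_attr(fallback_url),
        escape_attr(html_fragment),
    )


print(iframe_with_fallback(
    '<p>User comment with a "quote"</p>',
    "/unsupported-browser.html",  # hypothetical notice page
))
```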


>> ### @srcdoc is unneeded by the blogging community ###
>>
>> The creator of Wordpress, Matt Mullenweg, was asked about the need for
>> @srcdoc in the Wordpress software.  He responded that Wordpress
>> maintains a sanitization library that appears to work adequately.
>>
>> This is, again, not an argument against @srcdoc, it is an argument
>> against the sandbox security model.
>
> Well, it is an argument against both.

No, it's still not.  If the sandbox security model isn't needed, then
of course @srcdoc isn't needed, since its sole reason for existing is
to enable authors to shove user-generated content into the sandbox
without incurring extra network requests.  But problems with the
sandbox security model do *not* belong in a proposal to remove @srcdoc
only.  They are completely irrelevant to the question of keeping
@srcdoc itself, and serve only to confuse the matter and make it
appear as though there are more problems with @srcdoc than there truly
are.


> As far as I can tell, the opposition to @srcdoc in specific is stronger than
> iframe sandboxing in general due to the VERY controversial syntax.

I understand the basic rationale behind that.  That still doesn't make
it relevant to attack the sandbox security model in a change proposal
that has nothing to do with the sandbox security model.


> If we have indication that one of the important use cases that were
> nominated won't be a use case in practice then we should think about both
> @srcdoc and the sandboxing feature, true. The outcome might be that we want
> to get rid of the questionable syntax, but keep sandboxing.
>
> However, this issue is just about @srcdoc, so let's focus on this.

Another possible outcome is that we improve the sandbox security model
to make it meet the use-case.

Note, though, that I specifically argued that Mullenweg's comment is
not necessarily indicative of the use-case being invalid in general.
Mullenweg represents a large organization with significant knowledge
and resources to combat attacks as they arise.  This doesn't help
authors using Wordpress that don't update their systems, though.  It
doesn't help authors who aren't using Wordpress at all.  It doesn't
help authors who potentially have the knowledge to extract Wordpress's
sanitization library, but are using something other than PHP on their
server.  Every separate sanitization library that exists is a new and
interesting source of bugs to exploit.  Solving a large portion of
this problem once, in the browser, can end up paying us back great
dividends in terms of website security.


>> ...
>> Negative
>> : As with all new elements and attributes, implementing this requires
>> effort from implementors.
>> ...
>
> Lots of effort. Essentially it requires re-parsing attribute content with an
> HTML5 parser. This is an architectural problem; for instance, an application
> may be consuming SAX events from an HTML5 parser, but may not have direct
> access to the parser engine at all.

Wouldn't a problem of this nature be shared by *any* content pointed
to by an <iframe>?

From the chatter I've seen about implementing @srcdoc, actually
dealing with it isn't a huge issue.  Depending on the underlying
browser architecture, they may have to fake up a mock network resource
with the @srcdoc contents and then just pass that to whatever
machinery normally handles <iframe>s, or something like that.
Additional effort, sure, but nothing horrifying.  I haven't heard any
browser vendor directly object to @srcdoc due to implementation
difficulty.

~TJ
Received on Tuesday, 13 April 2010 15:34:23 UTC