Re: Upgrade mixed content URLs through HTTP header

Hi, Emily! Sorry I missed this email last night and this morning. :/

Though you retracted it later, I think it's worth touching on some of the
points you raise regardless. :)

On Tue, Feb 3, 2015 at 5:20 PM, Emily Stark <estark@google.com> wrote:
>
> 1. It would be difficult to write Javascript that actually does this
> rewriting correctly and reliably. For example, it's not clear to me that a
> Javascript library would be able to rewrite URLs of images that are
> inserted directly into the DOM dynamically.
>

It's not clear to me that we actually have hooks that would allow you to
make these changes in JavaScript. We prefetch pretty aggressively by
scanning ahead in the document while blocked (on script loading, for
example). MutationObservers almost certainly fire too late to deal with DOM
injections, and I don't think we have anything that could prevent code like
the following from loading the insecure image:

    var img = document.createElement('img');
    img.src = "http://example.com/image.png";

The image is never inserted into the DOM, so we have no chance of
intercepting the request before it leaves the context of the page.


> 2. Even if such a magical Javascript library existed, it would be
> difficult to deploy on huge sites with thousands of pages: the goal is to
> avoid having to make any changes to the source of such sites. [I'm
> skeptical of this point -- I see that actually transforming tons of URLs on
> all the thousands of pages could be cumbersome, but adding a Javascript
> snippet to the head of every page would probably just be a matter of
> modifying a small number of templates, right?]
>

This is the big one, I think. It's certainly not the case that it's always
as simple as modifying a small number of templates. You're assuming a lot
about the modernity of a site's CMS by even asserting that it _has_
templates. :)

Anecdotally, I used to work at a newspaper that had an old CMS. No one
wanted to touch the old CMS, at least partially because the templates and
the content flowed into each other almost arbitrarily. They ended up
rewriting everything shortly before I left in order to achieve some
semblance of trivialness to updates.

Truly legacy content like archive.org or the BBC's archive
(http://www.bbc.co.uk/archive/) is also quite a bit simpler to modify via
headers than via JavaScript.

> 3. Even if 1 and 2 were easy and possible, the goal is to load resources
> over HTTPS *without* actually rewriting URLs, because a script on the page
> might assume that a URL is http:// and would break if the URL was
> actually rewritten to https://. [I'm either skeptical of or ignorant on
> this point -- anyone have examples of why such breakage would come up in
> real life?]
>

I don't think this has come up as a justification. I don't think we'd want
to pretend that the URL was the same, really. Style, for instance, should
resolve URLs relative to the HTTPS file that was actually loaded, not the
HTTP file that was hard-coded into the HTML document.
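
As a rough illustration of that resolution point, here is a minimal sketch
using the standard URL constructor. The stylesheet and image paths are made
up for the example; nothing here is taken from a spec or an implementation.

```javascript
// Hypothetical URLs, for illustration only.
const hardCoded = 'http://example.com/css/site.css';  // what the HTML says
const loaded = 'https://example.com/css/site.css';    // what was actually fetched

// A relative reference inside the stylesheet, e.g. url(../img/bg.png).
const relative = '../img/bg.png';

// Resolving against the file that was actually loaded keeps the subresource
// on HTTPS; resolving against the hard-coded URL would produce an insecure
// request again.
const resolvedAgainstLoaded = new URL(relative, loaded).href;
const resolvedAgainstHardCoded = new URL(relative, hardCoded).href;

console.log(resolvedAgainstLoaded);
console.log(resolvedAgainstHardCoded);
```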


> The major advantage that I see of rewriting in Javascript is flexibility:
> for example, nytimes.com could rewrite all images on nytimes-owned
> domains to https://, while only reporting for images on other domains. On
> the other hand, having the header doesn't stop nytimes from implementing
> this policy; it could do whatever rewriting it wants in JS, and use the
> header just to report insecure requests.
>

If this is a requirement (and other folks like Peter have raised it as
well), we can set up a whitelisting mechanism, similar to what CSP does for
`*-src` directives. I'm hopeful that this is just a nice-to-have, as it's
significantly more complex, both to implement and reason about. :)
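
To make the idea concrete, here is a minimal sketch of what a host-based
whitelist could look like if done by hand. The function name, the host list,
and the exact-match rule are all assumptions for illustration; this is not a
proposed syntax, and it does not mirror how CSP source expressions match.

```javascript
// Hypothetical list of hosts whose resources should be upgraded.
const UPGRADE_HOSTS = new Set(['www.example.com', 'static.example.com']);

// Returns the URL with its scheme upgraded to https: when the host is on the
// list; otherwise returns the normalized URL unchanged.
function upgradeIfAllowed(rawUrl, allowedHosts = UPGRADE_HOSTS) {
  const url = new URL(rawUrl);
  if (url.protocol === 'http:' && allowedHosts.has(url.hostname)) {
    url.protocol = 'https:';
  }
  return url.href;
}

console.log(upgradeIfAllowed('http://static.example.com/logo.png'));
console.log(upgradeIfAllowed('http://thirdparty.test/ad.png'));
```

Even this toy version has to pick a matching rule (exact host, subdomains,
ports), which is part of why the mechanism is harder to reason about.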

Note also that CSP fires JavaScript events for violations, meaning that you
can pretty easily layer your own reporting system on top of those events,
and only send something back home for a certain subset of violations. See
https://w3c.github.io/webappsec/specs/content-security-policy/#securitypolicyviolationevent-interface
for details.
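
A minimal sketch of that kind of layering, assuming the
`securitypolicyviolation` event and the `blockedURI`, `violatedDirective`,
and `documentURI` properties from the interface linked above. The owned-host
list, the `shouldReport` helper, and the `/csp-report` endpoint are
hypothetical.

```javascript
// Hypothetical policy: only report violations for hosts we own.
const OWNED_HOSTS = ['example.com'];

// Decides whether a violation is in the subset we want to hear about.
function shouldReport(violation) {
  let host;
  try {
    host = new URL(violation.blockedURI).hostname;
  } catch (e) {
    // blockedURI can be a keyword such as "inline" rather than a URL.
    return false;
  }
  return OWNED_HOSTS.some((h) => host === h || host.endsWith('.' + h));
}

// Only wire up the listener when running in a page.
if (typeof document !== 'undefined') {
  document.addEventListener('securitypolicyviolation', (e) => {
    if (shouldReport(e)) {
      // '/csp-report' is a made-up endpoint for this example.
      navigator.sendBeacon('/csp-report', JSON.stringify({
        blockedURI: e.blockedURI,
        violatedDirective: e.violatedDirective,
        documentURI: e.documentURI,
      }));
    }
  });
}
```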

-mike

--
Mike West <mkwst@google.com>, @mikewest

Google Germany GmbH, Dienerstrasse 12, 80331 München,
Germany, Registergericht und -nummer: Hamburg, HRB 86891, Sitz der
Gesellschaft: Hamburg, Geschäftsführer: Graham Law, Christine Elizabeth
Flores
(Sorry; I'm legally required to add this exciting detail to emails. Bleh.)

Received on Wednesday, 4 February 2015 15:37:02 UTC