Re: Upgrade mixed content URLs through HTTP header

Hey

This is a great thread and something I have wanted for a while
(https://bugzilla.mozilla.org/show_bug.cgi?id=776278).

I just want to note that I don't believe CSP reporting in its current
state is a viable alternative for gathering the requisite telemetry.
On deploying CSP, the vast majority of reports you see are extension
noise, and it takes a lot of work to clean that noise up and figure out
what's actually broken. All the engineers who have deployed CSP have
shared similar stories about the crazy amounts of noise. I have
serious doubts this is practical for helping with HTTPS deployment.

Have we considered Service Workers for this? It seems straightforward to
use a Service Worker to log insecure HTTP requests, or even to attempt
upgrading them to HTTPS, and it keeps the origin in control. A rough
sketch of what I have in mind is below.
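
Something like the following — very much a sketch rather than tested
code. The reporting endpoint name is made up, and I haven't checked
whether mixed content blocking happens before or after the worker's
fetch handler:

    // sw.js -- a rough sketch; assumes the page has already registered
    // this worker, and that the worker actually sees these requests.
    self.addEventListener('fetch', function (event) {
      var url = new URL(event.request.url);
      if (url.protocol !== 'http:') return;

      // Report the insecure request back to the origin. The
      // '/mixed-content-report' endpoint is made up; waitUntil keeps
      // the worker alive while the report is sent.
      event.waitUntil(fetch('/mixed-content-report', {
        method: 'POST',
        body: JSON.stringify({ url: url.href })
      }));

      // Optionally retry the same resource over HTTPS instead.
      url.protocol = 'https:';
      event.respondWith(fetch(url.href));
    });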

cheers
Dev

On 4 February 2015 at 07:36, Mike West <mkwst@google.com> wrote:
> Hi, Emily! Sorry I missed this email last night and this morning. :/
>
> Though you retracted it later, I think it's worth touching on some of the
> points you raise regardless. :)
>
> On Tue, Feb 3, 2015 at 5:20 PM, Emily Stark <estark@google.com> wrote:
>>
>> 1. It would be difficult to write Javascript that actually does this
>> rewriting correctly and reliably. For example, it's not clear to me that a
>> Javascript library would be able to rewrite URLs of images that are inserted
>> directly into the DOM dynamically.
>
>
> It's not clear to me that we actually have hooks that would allow you to
> make these changes in JavaScript. We prefetch pretty aggressively by scanning
> ahead in the document while blocked (on script loading, for example).
> MutationObservers almost certainly fire too late to deal with DOM
> injections, and I don't think we have anything that could prevent code like
> the following from loading the insecure image:
>
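>     // The request fires as soon as .src is assigned; the element
>     // never enters the DOM, so there's nothing for an observer to see.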
>     var img = document.createElement('img');
>     img.src = "http://example.com/image.png";
>
> The image is never inserted into the DOM, so we have no chance of
> intercepting the request before it leaves the context of the page.
>
>>
>> 2. Even if such a magical Javascript library existed, it would be
>> difficult to deploy on huge sites with thousands of pages: the goal is to
>> avoid having to make any changes to the source of such sites. [I'm skeptical
>> of this point -- I see that actually transforming tons of URLs on all the
>> thousands of pages could be cumbersome, but adding a Javascript snippet to
>> the head of every page would probably just be a matter of modifying a small
>> number of templates, right?]
>
>
> This is the big one, I think. It's certainly not the case that it's always
> as simple as modifying a small number of templates. You're assuming a lot
> about the modernity of a site's CMS by even asserting that it _has_
> templates. :)
>
> Anecdotally, I used to work at a newspaper that had an old CMS. No one
> wanted to touch the old CMS, at least partially because the templates and
> the content flowed into each other almost arbitrarily. They ended up
> rewriting everything shortly before I left, just to make updates
> something approaching trivial.
>
> Truly legacy content like archive.org or the BBC's archive
> (http://www.bbc.co.uk/archive/) is also quite a bit simpler to modify via
> headers than via JavaScript.
>
>> 3. Even if 1 and 2 were easy and possible, the goal is to load resources
>> over HTTPS *without* actually rewriting URLs, because a script on the page
>> might assume that a URL is http:// and would break if the URL was actually
>> rewritten to https://. [I'm either skeptical of or ignorant on this point --
>> anyone have examples of why such breakage would come up in real life?]
>
>
> I don't think this has come up as a justification. I don't think we'd want
> to pretend that the URL was the same, really. Stylesheets, for instance,
> should resolve relative URLs against the HTTPS file that was actually
> loaded, not the HTTP URL that was hard-coded into the HTML document.
>
>>
>> The major advantage that I see of rewriting in Javascript is flexibility:
>> for example, nytimes.com could rewrite all images on nytimes-owned domains
>> to https://, while only reporting for images on other domains. On the other
>> hand, having the header doesn't stop nytimes from implementing this policy;
>> it could do whatever rewriting it wants in JS, and use the header just to
>> report insecure requests.
>
>
> If this is a requirement (and other folks like Peter have raised it as
> well), we can set up a whitelisting mechanism, similar to what CSP does for
> `*-src` directives. I'm hopeful that this is just a nice-to-have, as it's
> significantly more complex, both to implement and reason about. :)
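>
> Just to illustrate the kind of source list I mean, borrowing CSP's
> existing `img-src` syntax (this is not a proposal for this header's
> syntax), a whitelist of that shape looks like:
>
>     Content-Security-Policy: img-src 'self' https://*.nytimes.com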
>
> Note also that CSP violations fire JavaScript events, meaning that you can
> pretty easily layer your own reporting system on top of them, and only send
> something back home for a certain subset of violations. See
> https://w3c.github.io/webappsec/specs/content-security-policy/#securitypolicyviolationevent-interface
> for details.
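>
> A rough sketch of that kind of filtered reporting (the endpoint name is
> made up; the exact event properties are in the spec linked above):
>
>     document.addEventListener('securitypolicyviolation', function (e) {
>       // Only phone home for blocked insecure images; drop everything
>       // else (extension noise and so on).
>       if (e.effectiveDirective === 'img-src' &&
>           e.blockedURI.indexOf('http:') === 0) {
>         // '/my-report-endpoint' is a stand-in for whatever you run.
>         navigator.sendBeacon('/my-report-endpoint', JSON.stringify({
>           blocked: e.blockedURI,
>           page: e.documentURI
>         }));
>       }
>     });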
>
> -mike
>
> --
> Mike West <mkwst@google.com>, @mikewest
>
> Google Germany GmbH, Dienerstrasse 12, 80331 München, Germany,
> Registergericht und -nummer: Hamburg, HRB 86891, Sitz der Gesellschaft:
> Hamburg, Geschäftsführer: Graham Law, Christine Elizabeth Flores
> (Sorry; I'm legally required to add this exciting detail to emails. Bleh.)

Received on Friday, 6 February 2015 04:35:10 UTC