Re: Establishing consistent opt-ins to expose resource metadata from Artur Janc on 2020-07-20 (public-webappsec@w3.org from July 2020)

From: Artur Janc <aaj@google.com>
Date: Mon, 20 Jul 2020 22:46:40 +0200
To: Noam Rosenthal <noam.j.rosenthal@gmail.com>
Cc: Anne van Kesteren <annevk@annevk.nl>, Yoav Weiss <yoavweiss@google.com>, Camille Lamy <clamy@google.com>, Nasko Oskov <nasko@google.com>, Ilya Grigorik <igrigorik@google.com>, Mike West <mkwst@google.com>, Tab Atkins <tabatkins@google.com>, Kinuko Yasuda <kinuko@google.com>, Ulan Degenbaev <ulan@google.com>, WebAppSec WG <public-webappsec@w3.org>, public-web-perf <public-web-perf@w3.org>
Message-ID: <CAPYVjqpdwqPHmkFGK90gtgtZMZ+rUG3QbvCQ4k6Og30vAhUCcA@mail.gmail.com>

On Mon, Jul 20, 2020 at 12:37 PM Noam Rosenthal <noam.j.rosenthal@gmail.com>
wrote:

> On Mon, Jul 20, 2020 at 12:56 PM Anne van Kesteren <annevk@annevk.nl>
> wrote:
>
>> High-level question: are metadata and data distinct enough and can
>> developers-at-large reason about their difference to make the right
>> trade-offs? At least in terms of surveillance, metadata can tell a
>> pretty damning story as we've come to learn and I know the
>> network-security folks are trying their best not to give any bits to
>> the network, e.g.,
>> https://blog.apnic.net/2018/03/28/just-one-quic-bit/. I worry a bit
>> that what we're doing here isn't exactly sound from an
>> information-security perspective.
>>
>
> I think this indeed becomes a problem when metadata is an umbrella term
> that defines an undetermined set of properties.
> For example, using a single header with a catch-all * wildcard to describe
> all sorts of separate types of metadata might create issues in the future,
> where as more types of metadata are added a server exposes 'metadata' that
> they didn't intend to expose because "metadata" catches them.
>

This is a good point -- it's definitely difficult for developers to
understand the impact of revealing a particular bit of metadata about a
resource. One example is a discussion we had around `Timing-Allow-Origin`
which, among other things, exposes information about the DNS and connection
timings. This tells an attacker that a user may have had an existing
connection to a given origin, which is origin-level information that is
completely unrelated to the individual resource which sets the header, but
it still gets implicitly exposed (at least without double-keying of network
connections).
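
As a rough illustration of that leak, here is a minimal TypeScript sketch of
how an embedder could guess at connection reuse from a single timing entry.
The `TimingFields` shape mirrors a few `PerformanceResourceTiming`
attributes; `guessConnectionState` and its labels are hypothetical names,
and the heuristic assumes that hidden timings read as zero and that a
reused connection shows no DNS or connect duration.

```typescript
// Subset of PerformanceResourceTiming attributes used by this sketch.
interface TimingFields {
  domainLookupStart: number;
  domainLookupEnd: number;
  connectStart: number;
  connectEnd: number;
}

type ConnectionGuess = "hidden" | "likely-reused" | "likely-new";

// Hypothetical helper: infers origin-level connection state from one entry.
function guessConnectionState(t: TimingFields): ConnectionGuess {
  // Assumption: without a passing Timing-Allow-Origin check, these
  // attributes are all reported as zero for a cross-origin resource.
  const allZero =
    t.domainLookupStart === 0 &&
    t.domainLookupEnd === 0 &&
    t.connectStart === 0 &&
    t.connectEnd === 0;
  if (allZero) {
    return "hidden";
  }
  // Assumption: no DNS and no connect duration suggests an existing
  // connection to the origin was reused for this fetch.
  const dns = t.domainLookupEnd - t.domainLookupStart;
  const connect = t.connectEnd - t.connectStart;
  return dns === 0 && connect === 0 ? "likely-reused" : "likely-new";
}
```

Note that the answer describes the origin as a whole, not the resource that
set the header.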

At the same time it seems useful to give developers ways to do this without
requiring them to opt into full CORS because, practically, there are many
situations where it seems acceptable to reveal metadata without revealing
the resource bytes. For example, for a user avatar at
<https://social.example/me.png> it may be fine to expose the metadata
(because it could be identical for all users), but not to expose the image
data via credentialed CORS.

>> 2. Use the presence of CORP as a signal that (some) metadata about the
>> resource can be revealed.
>
>
> I don't like this. In the face of a Spectre-read gadget, CORP "equals
> CORS", but only then. Google's suggestion to switch from CORS to CORP
> for cross-origin isolated was good I think and gave me renewed hope
> that we'll eventually get rid of Spectre either through hardware/kernel
> or changes in browser architecture.

I think it's important to look at this from the author's perspective; in
the medium term, developers must assume that some of their users will have
configurations that are vulnerable to Spectre, and that by allowing a
resource to end up in an attacker origin's address space, the resource can
leak. So a given browser may or may not be susceptible to a Spectre-read
gadget, but as the author of a web application you need to assume that some
users will be vulnerable, and only set CORP `cross-origin` on resources
which you're comfortable returning in a cross-origin context.
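
A minimal TypeScript sketch of that author-side decision, under stated
assumptions: the three return values are the existing
`Cross-Origin-Resource-Policy` values, while `ResourceInfo`, its fields and
`chooseCorp` are hypothetical names used only for illustration.

```typescript
type CorpValue = "same-origin" | "same-site" | "cross-origin";

// Hypothetical description of a resource, as judged by its author.
interface ResourceInfo {
  // Author accepts that the bytes may end up in an attacker's process.
  okToLeakToAnyOrigin: boolean;
  // Resource is embedded by other origins on the same site.
  sharedAcrossSameSite: boolean;
}

// Picks the most restrictive CORP value that still fits the resource.
function chooseCorp(r: ResourceInfo): CorpValue {
  if (r.okToLeakToAnyOrigin) {
    return "cross-origin";
  }
  return r.sharedAcrossSameSite ? "same-site" : "same-origin";
}
```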

So *if* CORP means "this is a resource which I'm okay potentially exposing
to attackers, but I don't want to provide direct access to it via CORS"
*and* we consider metadata to be less sensitive than the contents of the
resource, it doesn't seem like a huge stretch to treat CORP as an opt-in to
revealing resource metadata. But I certainly agree that it's controversial
and could be a footgun, especially if it gave blanket permission to all
metadata, as Noam wrote above.

As an aside, I like the current formulation of CORP partly because it's
somewhat underdefined; the header tells you what can happen ("this
requester can / cannot access the resource") but doesn't imply anything
else about the resource or its security properties. In a lot of cases, this
granularity is perfectly fine, because I'd guess that the bulk of resources
loaded in no-cors mode are static or otherwise not sensitive; attaching a
security model to this would overcomplicate things. Individual switches
will be simpler conceptually, but ironically may be harder to reason about
than "is this a static/boring resource or not?".

> This is also the case for CORP - without some granularity around which
> information is exposable, there's a risk of over-exposing by mistake.
>
> So I think with either option there should not be a wildcard for selecting
> types of metadata. And then, when the properties/types (e.g.
> orientation/resolution/pixels of an image) are explicitly defined, it
> doesn't matter if they're data or metadata.
>

I wonder if we could have a tiered embedding model where a more powerful
mechanism implies all the capabilities of the less powerful ones. As a
strawman, ignoring the reasonable concerns about CORP from above:
CORS >> CORP >> [header(s) to expose individual bits of metadata] >>
[regular no-cors loads without any opt-ins]

In this model, if a resource can be loaded via CORS or opts into CORP, it
would also let the embedder access resource metadata. Such a model would be
"uni-directional" and allow developers to expose resource metadata without
requiring them to set CORP.
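
The strawman ordering could be sketched roughly as follows in TypeScript.
This is only an illustration of the "stronger opt-in implies weaker ones"
idea: `OptIns`, `highestTier` and `grants` are hypothetical names, and
`metadataOptIn` stands in for metadata-exposure header(s) that do not
exist today.

```typescript
// Tiers from the strawman, ordered from weakest to strongest.
type Tier = "no-cors" | "metadata" | "corp" | "cors";
const TIER_ORDER: Tier[] = ["no-cors", "metadata", "corp", "cors"];

// Opt-ins observed on a response (hypothetical summary of its headers).
interface OptIns {
  corsAllowed: boolean;     // response passes a CORS check for the embedder
  corpCrossOrigin: boolean; // Cross-Origin-Resource-Policy: cross-origin
  metadataOptIn: boolean;   // hypothetical metadata-exposure header present
}

// Returns the strongest tier the resource has opted into.
function highestTier(o: OptIns): Tier {
  if (o.corsAllowed) return "cors";
  if (o.corpCrossOrigin) return "corp";
  if (o.metadataOptIn) return "metadata";
  return "no-cors";
}

// A capability is granted if the resource opted into that tier or a
// stronger one; weaker opt-ins never imply stronger capabilities.
function grants(o: OptIns, capability: Tier): boolean {
  return TIER_ORDER.indexOf(highestTier(o)) >= TIER_ORDER.indexOf(capability);
}
```

For example, a response with only CORP `cross-origin` would grant
"metadata", while a response with only the metadata opt-in would not grant
"corp".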

This is likely a bad idea because it's making assumptions that aren't
necessarily true (for example that a resource loaded via CORS should reveal
fine-grained timing metadata). But if *something* along these lines were
workable, it could make the developer story simpler; e.g. if a resource
isn't sensitive and can already be embedded via CORS/CORP, developers
wouldn't need to set a bunch of different headers for different kinds of
metadata.

But this runs into the concerns mentioned above and I imagine Anne can find
some flaws in this logic :)

Received on Monday, 20 July 2020 20:47:06 UTC