Re: Privacy properties of a cookie replacement (was "Re: Some half-baked thoughts about cookies.")

On Fri, Aug 17, 2018 at 12:52 PM Stephen Farrell <stephen.farrell@cs.tcd.ie>
wrote:

>
> Hiya,
>
> (Sorry for the slow response...)
>

Sorry for the even-slower response. I should have learned by now not to
throw interesting ideas out to a mailing list just before leaving on
vacation. That said, the beach was lovely, so I don't have too many
regrets... :)


> On 16/08/18 09:56, Mike West wrote:
> > Hey Stephen!
> >
> ...
> >> What I'm asking is that, if doing this, we aim for a real
> >> improvement in privacy too, and include relevant actors and
> >> incentives in the analysis. We might fail to meet that goal
> >> of course, but I reckon we really ought try.
> >>
> >
> > I agree that it would be helpful to spell out the axes along which we
> > think cookies need improvement from a privacy perspective. In this
> > thread, I've talked about the ways in which this proposal impacts two
> > privacy-relevant aspects of cookies:
> >
> > 1.  They are potentially delivered in plaintext.
> > 2.  They enable third-party tracking.
> >
> > I think this proposal has significantly positive impact on the first
> > insofar as it prevents plaintext delivery, and minorly positive impact
> > on the second insofar as it requires an initial same-site request in
> > order to enable subsequent cross-site delivery.
>
> Seems about right (though I'm not sure I yet get the cross-site
> details but that's ok for now).
>

#3 in
https://github.com/mikewest/http-state-tokens#does-this-proposal-constitute-a-material-change-in-privacy-properties
copy/pastes some words from earlier in these threads. I'll attempt to
explain it more clearly if I get to the point of writing out a more formal
spec for this proposal.
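Until that spec exists, here is one hypothetical reading of the two properties listed above, written as a Python sketch of user-agent logic. Everything in it is an assumption rather than the proposal's defined behavior: the `HypotheticalTokenStore` name, keying tokens by origin, minting 256-bit random tokens, and treating "same-site" as a boolean the UA already knows. It only shows the shape of the rule: no token over plaintext, and no cross-site token for an origin the user hasn't first contacted same-site.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class HypotheticalTokenStore:
    """Illustrative user-agent store holding one opaque token per origin.

    Keying by origin and the token size are assumptions for this sketch.
    """

    tokens: Dict[str, str] = field(default_factory=dict)

    def token_for(self, origin: str, secure: bool, same_site: bool) -> Optional[str]:
        """Return the token to attach to a request to `origin`, or None."""
        # Property 1: never deliver the token over a plaintext connection.
        if not secure:
            return None
        # Already minted: attach it, whether or not this request is same-site.
        if origin in self.tokens:
            return self.tokens[origin]
        # Property 2: only a same-site request may mint a token; a cross-site
        # request to a never-visited origin goes out without one.
        if same_site:
            self.tokens[origin] = secrets.token_urlsafe(32)  # 32 bytes = 256 bits
            return self.tokens[origin]
        return None


store = HypotheticalTokenStore()
tracker = "https://tracker.example"
# Embedded cross-site request before any first-party visit: no token.
assert store.token_for(tracker, secure=True, same_site=False) is None
# A first-party visit mints a token...
token = store.token_for(tracker, secure=True, same_site=True)
# ...which later cross-site requests may carry, but never over plaintext.
assert store.token_for(tracker, secure=True, same_site=False) == token
assert store.token_for(tracker, secure=False, same_site=True) is None
```

The sketch also suggests why the gain on the second property is only modest: once an origin has been visited first-party even once, its token flows on every later cross-site request to it.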

> > For instance, you mentioned incentives for UAs and servers to behave in
> > more privacy-friendly ways: did you have anything in particular in mind?
>
> Fair question.
>
> What I'd like (but don't know how to get) is something like:
>
> - an HTTP state mechanism that is not likely to be used when
>   not needed for managing state
> - stateless operation to become a more normal thing, e.g. I'd
>   like stateless operation to work more often/better when
>   browsing sites with which I don't have a relationship
>

These two seem inextricably bound together, as I suspect that intentionally
stateless operations on the web today are few and far between. Basically
everything interesting we do on the web today requires
state of some kind to bind HTTP requests together, even when that state
isn't explicitly user-focused (see the discussion of routing metadata Willy
raised earlier in
https://lists.w3.org/Archives/Public/ietf-http-wg/2018JulSep/0203.html).

That said, many sites that don't require an authenticated session seem to
_work_ without cookies, and user agents could give users more control over
when they'd like to be in that explicitly unauthenticated state. I'm not sure
what impact that would have on the technical mechanism by which the user
agent informs the server about its state, however. I expect we'd still want
a session identifier of some sort for the times when a user is visiting a
site in a stateful way.


> - a mechanism that is harder to abuse, and where abuse could
>   be detected (possibly that last would need an auditor or
>   researcher to figure out clever ways to spot abuse)
>

What's abuse? I'm sure we all have pervasive monitoring in mind, and I'm
sure many have advertising/tracking in mind as well. I don't think we have
a good, shared grasp on where the line is, however. Re-identifying a user
over time as they make subsequent connections to an origin might be a
perfectly reasonable thing to do! It might also be a terrible thing to do,
depending on context. I'd like to understand how


> - and a pony :-)
>
> I think there could be incentives for servers, if the above
> was more efficient than the status quo ante, and if they
> ended up with higher quality information about the user
> agents that actually matter to them, and didn't have to
> deal with the chaff from every UA that ever made a connection.
> (And the potential costs if that chaff leaks due to some
> incident in a server or partner site.)
>
> Similarly, I'd hope reducing stateful operation could be
> viewed as a positive by UAs, if it allowed them to render
> pages more quickly in enough cases. i.e. if the servers
> were incented to play this kind of game for enough content,
> then maybe it'd also be attractive to UAs.
>

Performance seems tangential to the underlying questions here. Anecdotally,
web pages are slow because developers deliver megabytes of JavaScript with
every page load. Shifting that incentive is critical to the web's success,
and the cost of all that script seems to utterly swamp any savings from
reducing the connection overhead that things like cookies introduce.

I think there's an argument to be made for "stateless" connections from a
privacy perspective, but I think it's unlikely to be justifiable from a
performance perspective.


> I'm not sure how that'd translate into something that
> could be used, but perhaps something like: default to not
> send any identifier on 1st contact, to start sending short
> identifiers when needed (making 'em longer as needed)


These certainly seem like reasonable places to compromise.

> with those identifiers cycled by default (e.g. for new
> TLS sessions) with some way that a server who needs to
> could spend a bit of effort that'd allow them to know
> that a chain of identifiers are from the same UA. (That
> starts to sound a bit like the quic connection ID
> discussion though.)
>

It would certainly be possible to introduce some sort of rotation scheme.
What benefit would that provide, with regard to the privacy properties
discussed above? Surely a server with interest (financial or otherwise) in
re-identifying a user over time would do the work to support whatever
rotation scheme we introduced? Is the mitigation aimed against
unintentional re-identification, rather than the abuse cases?
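To make the question concrete, Stephen's idea (no identifier on first contact, identifiers cycled per TLS session, and a server able to tie a chain back to one UA with a bit of effort) can be sketched as a hash chain. This Python sketch is hypothetical throughout: the 8-byte identifier length, SHA-256, rotating on each new TLS session, and the `rotate`/`same_chain`/`RotatingUA` names are all assumptions, and it ignores the "making 'em longer as needed" part. It shows that a server which stored an earlier identifier links a later one with a few hash computations, so rotation of this kind mainly protects against parties that never saw the earlier value.

```python
import hashlib
import secrets
from typing import Optional

ID_LENGTH = 8  # bytes; an assumed "short" identifier


def rotate(identifier: bytes) -> bytes:
    """Derive the next identifier in the chain (hypothetical construction)."""
    return hashlib.sha256(identifier).digest()[:ID_LENGTH]


def same_chain(stored: bytes, presented: bytes, max_steps: int = 16) -> bool:
    """Server-side check: walk forward from a stored identifier.

    This is the "bit of effort" a server spends to learn that a presented
    identifier descends from one it saw before.
    """
    candidate = stored
    for _ in range(max_steps):
        candidate = rotate(candidate)
        if candidate == presented:
            return True
    return False


class RotatingUA:
    """User agent that sends nothing until state is needed, then rotates."""

    def __init__(self) -> None:
        self.identifier: Optional[bytes] = None  # no identifier on first contact

    def start_state(self) -> None:
        self.identifier = secrets.token_bytes(ID_LENGTH)

    def new_tls_session(self) -> None:
        if self.identifier is not None:
            self.identifier = rotate(self.identifier)


ua = RotatingUA()
assert ua.identifier is None
ua.start_state()
first = ua.identifier
for _ in range(3):
    ua.new_tls_session()
# The identifier on the wire has changed, yet a server holding `first`
# links the two with three SHA-256 computations.
assert ua.identifier != first
assert same_chain(first, ua.identifier)
```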

> If the upshot of all that was that real HTTP state
> management was less likely to be used for tracking and
> that those wanting to track were left with cookies, and
> if turning off cookies worked well more often, then I
> guess that'd be a win.
>

I'm pretty explicitly aiming towards something that could replace cookies
in the long run. My hope is that by introducing something with properties
the user agent more explicitly controls, we'll be able to collectively
start more aggressively ratcheting down on cookies' behaviors in the medium
term. Even if all that we can agree upon is that cookies are bad, that
seems like an outcome that we could both celebrate.

Thanks!

-mike

Received on Monday, 27 August 2018 09:02:19 UTC