Re: 'persona', indicating 'private browsing mode' over the net from David Singer on 2015-03-03 (public-privacy@w3.org from January to March 2015)

From: David Singer <singer@apple.com>
Date: Tue, 03 Mar 2015 13:28:39 -0800
To: Joseph Lorenzo Hall <joe@cdt.org>
Cc: ifette@google.com, "public-privacy (W3C mailing list)" <public-privacy@w3.org>
Message-id: <54324E34-8BEC-4990-8164-5785C6AC2900@apple.com>
> On Mar 3, 2015, at 12:40 , Joseph Lorenzo Hall <joe@cdt.org> wrote:
> 
> On Mon, Mar 2, 2015 at 2:22 PM, Ian Fette (イアンフェッティ) <ifette@google.com> wrote:
>> Do you really want the same ID being sent to all sites? On the one hand
>> we're already spewing IP addresses everywhere and this can be used to do
>> retargeting and/or various data combination across sites, but now if you've
>> got a stable identifier (over the life of the browsing session, which could
>> be long) that actually seems like quite a privacy hit to me.
> 
> This is a really great point that I don't think we've seen raised yet
> in this discussion. David (Singer): would origin-scoped identifiers
> solve this problem or is the shared persona identifier a feature in
> your opinion?

Hi, sorry, got behind-hand

I wondered about it, of course.

The problems with scoped identifiers are (at least):
a) defining what they are scoped by.  Scoping by ‘the user you think it is, from some other information, if any’ is not very good standards-writing.
b) if they’re scoped by the machine, you can’t carry a search for your SO’s birthday present over from your phone (on the go) to your laptop (at home)

I get that the UUID offers ‘perfect’ identification (well, a claim to be the same persona of the same person), but that only becomes a problem if you were trying not to reveal who you are in the first place.

There is a conflict between asking the network “please respect the contexts and boundaries of this aspect of me” and “I’m trying to be anonymous”.  Indeed, it seems silly to say them both at the same time.


I’m not wedded to UUIDs, of course, if problem (a) can be solved.
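
One possible answer to (a), offered only as a hedged sketch and not as part of the proposal: scope by origin, and derive each origin's identifier from a single per-persona secret using HMAC-SHA256. Different sites then see unrelated values, while a user agent that (hypothetically) syncs the secret between phone and laptop reproduces the same value for the same site on both devices, which keeps (b) working. The function names and the choice of HMAC are assumptions.

```python
import hashlib
import hmac
import secrets
import uuid


def new_persona_secret() -> bytes:
    # Hypothetical: a random 256-bit secret held by the user agent (and synced
    # across the user's devices) instead of one UUID sent to every site.
    return secrets.token_bytes(32)


def scoped_persona_id(persona_secret: bytes, origin: str) -> str:
    """Derive the identifier that one origin would see for this persona."""
    digest = hmac.new(persona_secret, origin.encode("utf-8"), hashlib.sha256).digest()
    # version=4 only stamps the version/variant bits so the value keeps the UUID
    # text format; it is deterministic per (secret, origin), not freshly random.
    return str(uuid.UUID(bytes=digest[:16], version=4))
```

Two sites could no longer link the persona by comparing header values, although, as Ian notes, IP addresses and other signals may still link them.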

> 
>> I've also not really seen any notion of "multiple distinct browser sessions"
>> take off. Incognito / private mode enjoys some nontrivial use, but I'm still
>> amazed at how few people know it exists. The ability to have multiple
>> distinct profiles exists in Chrome and other browsers, but as much as we as
>> an industry try to push the notion, I can't say I've ever personally seen
>> anyone at an airport or cafe (aka not a Google or Apple office) actually
>> using this. I think the UI / change aversion / inertia present harder
>> problems than the technical problem of isolation within a profile.
> 
> I've heard grumblings that the notion of sessions altogether is
> getting a bit stale in terms of how people use browser UAs... that's a
> bit depressing to me (I use one browser locked down and then open
> things that need full cookies, JS, etc. in another browser that scrubs
> stuff on close (session end)). But I suspect Ian is very correct that
> making the distinction between nominal/private/persona interaction
> modes to users is going to be very very hard.

Perhaps.  But, trivially, if you turn on ‘private browsing but not anonymous/secret’, then minting a new persona for each session is easy.  Browsers could also let you open not just a ‘new private window’ but also a ‘new healthcare window’ or a ‘new birthday-shopping window’, where you have ‘saved personas’ called ‘healthcare’ and ‘birthday-shopping’.
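
The user-agent side of this can be very small. A sketch, assuming the user agent keeps saved personas as a simple name-to-UUID map; `PersonaManager` and its methods are hypothetical names, not an existing browser API:

```python
import uuid
from typing import Dict


class PersonaManager:
    """Hypothetical user-agent bookkeeping for personas (illustrative only)."""

    def __init__(self) -> None:
        self.saved: Dict[str, str] = {}  # user-chosen name -> persona UUID

    def new_private_window(self) -> str:
        # 'Private browsing but not anonymous/secret': a fresh persona per session.
        return str(uuid.uuid4())

    def open_saved(self, name: str) -> str:
        # Reuse the same persona every time the user opens, e.g., a 'healthcare' window.
        if name not in self.saved:
            self.saved[name] = str(uuid.uuid4())
        return self.saved[name]
```

Opening ‘healthcare’ twice yields the same identifier, while each new private window gets a new one.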


> 
> best, Joe
> 
>> My $0.02
>> 
>> 2015-02-27 16:28 GMT-08:00 David Singer <singer@apple.com>:
>> 
>>> This is basically a mildly-edited re-statement of the ideas, taking into
>>> account some of the discussion. I was asked to re-post a summary, in the
>>> discussion this week at the call.
>>> 
>>> 
>>> * * * * * * * * * * * *
>>> 
>>> 
>>> The problem: quite a few browsers today have what they call “private
>>> browsing mode” or the like.  In this mode all local state that is
>>> accumulated is discarded at the end of the private browsing mode session
>>> (when the mode is turned off). After turning it off, the local machine has,
>>> ideally, no trace at all of what was done in the private mode. The discard
>>> includes browsing history, cookies, local storage etc.  I think that
>>> browsers can/do initialize the private session from the user’s current state
>>> when they start private mode.
>>> 
>>> Advantage: if it’s a shared computer, you don’t leave any trace.
>>> 
>>> So, private browsing sort-of-looks like this, in terms of state: two
>>> private sessions are started and then ended. These sessions are initialized
>>> from the base state, which is not updated while the private sessions are in
>>> process.
>>> 
>>> 
>>> 
>>>                                        +[private 2] - -
>>>                                        |
>>>                 +[private 1] - -       |
>>>                 |                      |
>>> [base state] - -+ . . . . . . . . - - -+ . . . . . . . . - - - -
>>> Time ->
>>> 
>>> This means that private browsing still ‘works’ on the web; cookies flow,
>>> referer headers, and so on, all as normal.  The important aspect of this is
>>> whether a trace is left on the ‘permanent history’.
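
The local-state model in the diagram can be sketched as follows, following the description above, including its assumption that a private session starts as a copy of the base state. The class names are hypothetical and storage beyond history and cookies is omitted.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LocalState:
    """Local state a browser accumulates: history and cookies (storage omitted)."""
    history: List[str] = field(default_factory=list)
    cookies: Dict[str, str] = field(default_factory=dict)


class Browser:
    def __init__(self) -> None:
        self.base = LocalState()
        self.private: Optional[LocalState] = None  # set while private mode is on

    def current(self) -> LocalState:
        return self.private if self.private is not None else self.base

    def visit(self, url: str) -> None:
        self.current().history.append(url)

    def set_cookie(self, name: str, value: str) -> None:
        # Cookies still flow in private mode; they land in the private copy.
        self.current().cookies[name] = value

    def start_private(self) -> None:
        # The private session is initialized from a copy of the base state.
        self.private = copy.deepcopy(self.base)

    def end_private(self) -> None:
        # Discard everything accumulated privately; the base state was never updated.
        self.private = None
```

Nothing in this model reaches the server: `end_private` clears only local state, which is exactly the gap described next.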
>>> 
>>> Problem statement: the servers are completely unaware of this mode, and so
>>> any history etc. THEY keep is still visible.
>>> 
>>> Proposal:
>>> 
>>> The servers have various means to work out who this is, and attach history
>>> (these means include cookies, fingerprinting and so on). As noted above, we
>>> don’t seek to break normal browsing by refusing to accept storage etc. (e.g.
>>> of cookies), so a simple ‘binary’ signal in an HTTP header “I am trying to
>>> be private here” doesn’t help, as the server won’t know from request to
>>> request whether this is part of the same session or not.
>>> 
>>> Hence, the idea to introduce a header that identifies which ‘private
>>> session’ the user is in. Since, in fact, this can be used for other purposes
>>> than private browsing, and it’s logically possible for the browser to have
>>> multiple windows open, or separate sessions, or to return to a private
>>> session, we thought this was essentially an indication of what ‘aspect’ of
>>> the user was being presented here: their persona.  So, we needed a
>>> session — persona — identifier.  Both to make it easy to generate, and to
>>> make it possible to transfer a private session from one device to another,
>>> we took the easy route of suggesting that UUIDs are a suitable
>>> identification tool.
>>> 
>>> Here is the original suggestion I sent.  Note that the server is being
>>> asked to segregate state, not to stop keeping state. This is about the
>>> aspect of privacy which is respecting the right context to ‘say’ something:
>>> ‘why did you say that?’ not ‘why did you know/remember that?’. One of the
>>> problems with today’s net is not only that servers see and remember too much
>>> (not addressed here), but they have absolutely no sense of when it’s
>>> appropriate, or not, to reveal what they know (that is addressed here).
>>> 
>>> * * * * *
>>> 
>>> The user-agent can send an optional HTTP header ‘Persona:’ whose value is
>>> a suitable machine-generatable distinct identifier (e.g. a UUID). If the
>>> header is absent, the user is operating under their default (unlabeled)
>>> persona, which is distinct from all the identified personas, which in turn
>>> are also distinct from each other.  A user and their user-agent may return
>>> to a persona at any time, or continue using a persona for any length of
>>> time. A persona identifier is expected to be universally unique, not
>>> contextualized to the current user-agent or device.
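
A minimal client-side sketch of the header as described, using Python's standard `urllib.request`. `Persona` is the proposed, unregistered header name from the text; `build_request` is a hypothetical helper that only constructs the request and does not send it.

```python
import urllib.request
from typing import Optional


def build_request(url: str, persona: Optional[str] = None) -> urllib.request.Request:
    """Prepare (but do not send) a request carrying the proposed 'Persona' header."""
    headers = {}
    if persona is not None:
        # e.g. str(uuid.uuid4()); any machine-generated distinct identifier works.
        headers["Persona"] = persona
    # No header at all means the default, unlabeled persona.
    return urllib.request.Request(url, headers=headers)
```

Omitting the argument yields a request with no `Persona` header, i.e. the default persona.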
>>> 
>>> Servers respecting this are requested to ensure that the labeled personas
>>> leave no trace or influence on each other or on the unlabeled persona.  For
>>> example, activity under one persona should not affect the ads shown under a
>>> different persona; any history records that the user can see should be
>>> distinct for each persona; and so on. (It’s OK for your unlabeled persona to
>>> be reflected in labeled ones, but optional; if servers wish, they can
>>> initialize a named persona from the default, un-named one, when they first
>>> see it.)
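
A minimal server-side sketch of these requirements, assuming history is just a list of items per persona. Labeled personas never see each other's records or write into the default one, and a newly seen persona may optionally be seeded from the default. The class and parameter names are hypothetical; a request handler would pass the `Persona` header value, or `None` when the header is absent.

```python
from typing import Dict, List, Optional

# Persona header value, or None for the default, unlabeled persona.
PersonaKey = Optional[str]


class PersonaAwareHistory:
    """Server-side history kept separately per persona (hypothetical sketch)."""

    def __init__(self, inherit_default: bool = False) -> None:
        # Seeding a new persona from the default one is optional in the text.
        self.inherit_default = inherit_default
        self._by_persona: Dict[PersonaKey, List[str]] = {None: []}

    def _bucket(self, persona: PersonaKey) -> List[str]:
        if persona not in self._by_persona:
            seed = list(self._by_persona[None]) if self.inherit_default else []
            self._by_persona[persona] = seed
        return self._by_persona[persona]

    def record(self, persona: PersonaKey, item: str) -> None:
        # Activity only ever lands in its own persona's bucket.
        self._bucket(persona).append(item)

    def visible_history(self, persona: PersonaKey) -> List[str]:
        # What the user (or ad selection) may draw on under this persona.
        return list(self._bucket(persona))
```

Seeding copies the default history once, at first sight; later default activity does not flow into the labeled persona, and nothing flows back.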
>>> 
>>> Server implementers may choose how long they retain records relating to
>>> separate personas, just as they do for today’s default persona.
>>> 
>>> This is NOT a request to stop tracking or keeping records; that is an
>>> orthogonal question that is covered by activities such as do-not-track,
>>> cookie directives, and so on. This is about giving users control of their
>>> privacy by controlling what gets linked to what, and exposed when.
>>> 
>>> It may be that it is not particularly necessary or valuable to have a
>>> machine-readable means of discovery over whether servers support this
>>> feature.  Any support that they provide is an improvement on today’s
>>> experience, where servers are unaware that users are trying to be private.
>>> Claims of support for this feature are probably better conveyed in
>>> advertising or other human-readable ways. On the other hand,
>>> machine-readable claims of support have two advantages: the browser can
>>> filter or warn about sites that don’t claim to respect it, and while not
>>> respecting it probably would not be actionable, claiming to and then not
>>> doing it would be lying to users, which might be.
>>> 
>>> This feature might also be valuable for shared terminals; for example, in
>>> libraries, airline lounges, internet cafes and the like, a new persona can
>>> be minted each time the terminal is unlocked for a new session.  Libraries
>>> might tie the persona to the library card, so users returning get re-linked
>>> to their online history and so on. It might also be a lightweight
>>> replacement of logging-in, for browsers on shared devices  — a browser might
>>> have a simple way of saying which family member it is right now (e.g. a
>>> pull-down menu).
>>> 
>>> * * * *
>>> 
>>> I think it’s interesting in a number of respects:
>>> 
>>> a) it’s an improvement on the status quo, where servers are completely
>>> unaware of any attempt to be private
>>> 
>>> b) it’s not asking for *secrecy* at all; servers are at liberty to
>>> remember as much as before; there are very few privacy proposals that don’t
>>> slide into trying to be secret, and this is one. Privacy is also about where
>>> information is exposed, what it is linked to, and so on.
>>> 
>>> c) it recognizes that privacy is not a binary state — it’s not an
>>> either-or (you have it or you don’t); it’s a spectrum, and it’s about
>>> perception and control and exposure as much as it is about recording and so
>>> on.
>>> 
>>> 
>>> * * * * * * *
>>> 
>>> What are some of the potential downsides?
>>> 
>>> 1) It doesn’t treat servers as adversaries, and if they are, in fact,
>>> ‘hostile’, it might be giving them a clue: ‘look here, someone is doing something
>>> under the covers’
>>> 
>>> 2) using a UUID for the persona has advantages — they are not
>>> contextualized by the ‘main’ persona that the server knows or guesses, and
>>> they can be shared across the user’s devices — but also provides a very
>>> explicit key ‘this is (this aspect of) me’, which again, for adversarial
>>> servers, might be an issue
>>> 
>>> 
>>> Note that there is no attempt to claim “this isn’t me, this is someone
>>> else” so linking personas is fine, if the server can work out they are the
>>> same person (e.g. by cookie or other means).
>>> 
>>> 
>>> David Singer
>>> Manager, Software Standards, Apple Inc.
>>> 
>>> 
>> 
> 
> 
> 
> -- 
> Joseph Lorenzo Hall
> Chief Technologist
> Center for Democracy & Technology
> 1634 I ST NW STE 1100
> Washington DC 20006-4011
> (p) 202-407-8825
> (f) 202-637-0968
> joe@cdt.org
> PGP: https://josephhall.org/gpg-key
> fingerprint: 3CA2 8D7B 9F6D DBD3 4B10  1607 5F86 6987 40A9 A871

David Singer
Manager, Software Standards, Apple Inc.
Received on Tuesday, 3 March 2015 21:29:14 UTC