Re: [saag] Liking Linkability

From: Henry Story <henry.story@bblfish.net> · Date: Tue, 9 Oct 2012 15:19:21 +0200

On 9 Oct 2012, at 14:29, "Klaas Wierenga (kwiereng)" <kwiereng@cisco.com> wrote:

> Hi Henry,
> 
> (adding saag, had not realised that it was a resend)
> 
> On Oct 9, 2012, at 12:05 AM, Henry Story <henry.story@bblfish.net> wrote:
> 
>> 
>> On 8 Oct 2012, at 20:27, "Klaas Wierenga (kwiereng)" <kwiereng@cisco.com> wrote:
>> 
>>> Hi Henry,
>>> 
>>> I think your definition of what constitutes a private conversation is a bit limited, especially in an electronic day and age. I consider the simple fact that we are having a conversation, without knowing what we talk about, a privacy sensitive thing. Do you want your wife to know that you are talking to your mistress, or your employer that you have a job interview?
>>> And do you believe that the location where you are does not constitute a privacy sensitive attribute?
>> 
>> Ok I think my definition still works: If someone knows that you are communicating with someone then they know something about the conversation. In my definition that does constitute a privacy violation at least for that bit of information.
> 
> ehm, I think that you need quite a bit of fantasy to read that in your definition ;-) So if you mean also "or are aware of the communication" you should perhaps include that, but, as you point out below, that does complicate things big time.

It was meant to be a working definition.
For a much more detailed treatment of privacy you would of course want to read

"Privacy in Context: Technology, Policy, and the Integrity of Social Life" by 
Helen Nissenbaum 
   http://www.sup.org/book.cgi?id=8862

The philoweb and the public-privacy groups should (perhaps with saag) work
on building up a reading list of philosophical, technical and legal books
on the subject, perhaps with short summaries that we technicians can read -
a great exit route also for those technicians who get bored with coding - it
can happen to the best!

Helen brings in the notion of context as a very important element in the
understanding of what privacy is. Privacy there does not mean secrecy; it
means something more akin to respecting the context in which information
was given initially - eg banking details such as someone's home should not be
divulged merely because the same information can be gotten from other sources,
but should remain protected because they were given as part of a context. (This
was a Supreme Court ruling according to Nissenbaum.)

So I think my working definition is good enough and should easily be 
extendable to be able to cover other cases like the one you mention. It does
not cover context for example but work by Oshani on usage restriction shows
one way one can go:
   http://dig.csail.mit.edu/2011/Papers/IEEE-Policy-httpa/paper.pdf

Also remember I explained it as a minimal criterion we could agree on for
privacy, not a full definition.

> 
>> Though I think you exaggerate what they know. Your wife won't know that you are talking to your mistress, just that you are talking to another server (If it is a freedom box, they could narrow it down to an individual). Information about it being a mistress cannot be found just by seeing information move over the wire. Neither does an employer know you have a job interview just because you are communicating with some server x from a different company. But he could be worried.
> 
> I think you are now digressing from the general case, whilst your definition was meant to be very generic (I believe?). I am not talking about implementations, but about the general principle. The fact that there is an xmpp session between klaas@cisco.com and compensation@apple.com may indicate to my manager that I am looking for another job.

yes, though if the session were peer to peer, and you were communicating in a
way that only the connection from one server to another could be deduced, then 
the information about the precise address would be hidden. I am not sure about
xmpp in this respect. But with WebID once the TLS connection is made the HTTP 
layer is encrypted, and so it should be impossible to see if you are doing a GET, 
PUT, POST or DELETE and even on which resource you were acting.
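
To make the point about TLS concrete, here is a minimal sketch in Python of
which parts of an HTTPS request an on-path observer can and cannot read. The
class, the field names and the example host are made up for illustration, and
the model assumes plain TLS where the server name travels unencrypted in the
handshake. It is a simplification, not a description of any particular TLS
stack.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HttpsRequest:
    """A simplified HTTPS request (all field names are illustrative)."""
    client_ip: str
    server_ip: str
    server_name: str   # sent in the TLS handshake, assumed unencrypted
    method: str        # GET, PUT, POST or DELETE
    path: str          # the resource being acted on
    body: bytes = b""


def observer_view(req: HttpsRequest) -> dict:
    """Return only what a passive on-path observer can learn.

    The HTTP method, path and body travel inside the encrypted channel,
    so they are deliberately left out. Message sizes and timing also
    leak in practice but are not modelled here.
    """
    return {
        "client_ip": req.client_ip,
        "server_ip": req.server_ip,
        "server_name": req.server_name,
    }


req = HttpsRequest(
    client_ip="192.0.2.10",
    server_ip="198.51.100.7",
    server_name="profiles.example",   # hypothetical host
    method="PUT",
    path="/people/klaas/card",
    body=b"...",
)
view = observer_view(req)
print(sorted(view))
```

The observer learns that two machines talked, and to which server name, but
not the verb or the resource.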

Still other things are visible... 

> My manager might also be worried if he sees me entering the Google premises, but that is much less likely (even though I have helped applicants get out of the building through the emergency exit because a colleague had arrived in the reception area in the past ;-) The reason I brought these examples up is that I believe something has changed with the ubiquity of online databases and online communication. When I didn't want to be overheard in the past I would go for a walk with someone and we could talk with reasonable assurance. Now I have to trust that say Skype is not listening in to my conversation and that Twitter will not hand my tweets to DHS. So the simple fact that I use an encrypted channel is not sufficient.

Of course. The important thing is that those not part of the conversation
not be able to gather information about the conversation. One can be more 
or less strict on the limits here - in some cases it will matter that even 
knowing who is communicating with whom be hidden, in other cases it may not 
be that important.

> 
>> 
>> So if I apply this to WebID ( http://webid.info/ ) - which is I think why you bring it up - WebID is currently based on TLS, which does make it possible to track connections between servers. But remember that the perfect is the enemy of the good. How come? Well, put things in context: by failing to create simple distributed systems which protect privacy of content pretty well, that works with current deployed technologies (e.g. browsers, and servers), we have allowed large social networks to grow to sizes unimaginable in any previous surveillance society.  So even a non optimal system like TLS can still bring huge benefits over the current status quo. If only in educating people in how to build such reasonably safe distributed systems.
> 
> I was not referring to WebID in particular. I applaud your effort, and do realise that perfect will not happen. However I think that your definition of privacy should either be scoped tightly to particular use cases or is too broad a brush. I tend to think that one single definition of privacy is not very useful, and rather like to think about different forms of privacy, location privacy, encrypted channels, plausible deniability etc.

yes. there are a lot of subcases. The point I was trying to make, if we can get
back to the "Liking Linkability" argument of the thread (and not get lost
in counting the number of angels on a privacy pin head), is that in order to
create systems where you can be as flexible as possible about whom you want
to share your resources with - ie. without placing yourself in a situation where
someone else is listening in - you need to allow for linkability of identities
and resources, as that is the only way to create a distributed social web.
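
A minimal sketch of what that means in practice, in Python: a guard on a
personal server that decides access by comparing a global identifier against
a published group. The Guard class, the resource path and the example WebIDs
are all hypothetical, the group is a plain set of URIs rather than something
parsed from a real profile document, and the sketch assumes the requester has
already been authenticated as the holder of the WebID.

```python
from urllib.parse import urlsplit


def normalise(webid: str) -> str:
    """Lower-case the scheme and host so trivially different spellings match."""
    parts = urlsplit(webid)
    return parts._replace(scheme=parts.scheme.lower(),
                          netloc=parts.netloc.lower()).geturl()


class Guard:
    """Maps each protected resource to the set of WebIDs allowed to read it."""

    def __init__(self):
        self.rules = {}  # resource path -> set of normalised WebIDs

    def restrict(self, resource, group):
        """Limit a resource to the members of a group of WebIDs."""
        self.rules[resource] = {normalise(w) for w in group}

    def allows(self, resource, webid):
        """Deny by default; allow only listed identifiers."""
        allowed = self.rules.get(resource)
        if allowed is None:
            return False
        return normalise(webid) in allowed


# Hypothetical group, as it might be published by a third party.
attendees = {
    "https://alice.example/card#me",
    "https://bob.example/profile#i",
}

guard = Guard()
guard.restrict("/photos/meeting.jpg", attendees)

print(guard.allows("/photos/meeting.jpg", "https://Alice.Example/card#me"))
print(guard.allows("/photos/meeting.jpg", "https://carol.example/id#me"))
```

The group can be maintained by someone else on another server. That only
works because the identifiers mean the same thing everywhere, which is the
linkability being argued for.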

As such, worries about people being able to see that I am communicating with someone
in another company are laudable but, if put in perspective with the really big
issues of loss of privacy, completely irrelevant for most use cases.

But as I say, those use cases can be addressed with technologies such as Tor...

> 
>> 
>> But having put that in context, the issue of tracking what servers are communicating remains. There are technologies designed to make that opaque, such as Tor. I still need to prove that one can have .onion WebIDs, and that one can also connect with browsers using TLS behind Tor - but it should not be difficult to do. Once one can show this then it should be possible to develop protocols that make this a lot more efficient.  Would that convince you?
> 
> Ehm, what actually concerns me more is not the fact that *it is possible* to design
> proper protocols as much as that I would like to provide guidance to protocol developers
> to *prevent improper protocols*. Does that make sense?

yes, but don't make linkability an a priori bad thing, since it is the most important
building block for creating distributed co-operative structures, and so for privacy.
That is the point of this thread.

You may not be doing that btw, but if you look at Harry Halpin's arguments you'll
see a good example of how the terminology of unlinkability as proposed in

http://tools.ietf.org/html/draft-iab-privacy-terminology-01

can be misused. But to be fair it does say at the end of the document:

[[
   Achieving anonymity, unlinkability, and undetectability may enable
   extreme data minimization.  Unfortunately, this would also prevent a
   certain class of useful two-way communication scenarios.  Therefore,
   for many applications, a certain amount of linkability and
   detectability is usually accepted while attempting to retain
   unlinkability between the data subject and his or her transactions.
   This is achieved through the use of appropriate kinds of pseudonymous
   identifiers.  These identifiers are then often used to refer to
   established state or are used for access control purposes
]]
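
One common way to get the kind of pseudonymous identifiers the passage above
mentions is to derive a different identifier for each relying party from a
secret only the user holds. The Python sketch below shows that idea. It is an
assumption on my part about what such identifiers could look like, not
something the draft prescribes, and the function name, the secret and the
site names are made up.

```python
import hashlib
import hmac


def pseudonym(user_secret: bytes, relying_party: str) -> str:
    """Derive a stable identifier for one user at one relying party.

    The same secret and site always give the same pseudonym, so the
    site can keep state and apply access control. Different sites get
    different pseudonyms and cannot link them without the secret.
    """
    digest = hmac.new(user_secret, relying_party.encode("utf-8"),
                      hashlib.sha256)
    return digest.hexdigest()[:32]


secret = b"illustrative secret, never reuse"   # hypothetical key material

at_shop = pseudonym(secret, "shop.example")
at_forum = pseudonym(secret, "forum.example")

print(at_shop == pseudonym(secret, "shop.example"))   # stable per site
print(at_shop == at_forum)                            # not linkable across sites
```

Each site can still recognise a returning user, which is the "certain amount
of linkability" the draft accepts.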

Still in my conversations I have found that many people in security spaces
just don't seem to be able to put the issues in context, and can get sidetracked
into not wanting any linkability at all. Not sure how to fix that.

> 
> Klaas
> 
>> 
>>> 
>>> Klaas
>>> 
>>> Sent from my iPad
>>> 
>>> On 8 okt. 2012, at 19:01, "Henry Story" <henry.story@bblfish.net> wrote:
>>> 
>>>> 
>>>> Notions of unlinkability of identities have recently been deployed 
>>>> in ways that I would like to argue, are often much too simplistic, 
>>>> and in fact harmful to wider issues of privacy on the web.
>>>> 
>>>> I would like to show this in two stages:
>>>> 1. That linkability of identity is essential to electronic privacy 
>>>> on the web
>>>> 2. Show an example of an argument by Harry Halpin relating to 
>>>> linkability, and by pulling it apart show how careful one has 
>>>> to be with taking such arguments at face value
>>>> 
>>>> Because privacy is the context in which the linkability or non linkability
>>>> of identities is important, I would like to start with a simple working 
>>>> definition of what constitutes privacy with the following minimal 
>>>> criterion [0] that I think everyone can agree on:
>>>> 
>>>> "A communication between two people is private if the only people 
>>>> who are party to the conversation are the two people in question. 
>>>> One can easily generalise to groups: a conversation between groups 
>>>> of people is private (to the group) if the only people who can 
>>>> participate/read the information are members of that group"
>>>> 
>>>> Note that this does not deal with issues of people who were privy to 
>>>> the conversation later leaking information voluntarily. We cannot 
>>>> technically legislate good behaviour, though we can make it possible 
>>>> for people to express context. [1]
>>>> 
>>>> 
>>>> 1. On the importance of linkability of identities to privacy 
>>>> ============================================================
>>>> 
>>>> A. Issues of Centralisation
>>>> ---------------------------
>>>> 
>>>> We can put this with the following thought experiment which I put
>>>> to Ben Laurie recently [0].
>>>> 
>>>> First imagine that we all are on one big social network, where 
>>>> all of our home pages are at the same URL. Nobody could link
>>>> to our profile page in any meaningful way. The bigger the network
>>>> the more different people that one URL could refer to. People 
>>>> that were part of the network could log in, and once logged in
>>>> communicate with others in their unlinkable channels. 
>>>> 
>>>> But this would not necessarily give users of the network privacy: 
>>>> simply because the network owner would be party to the conversation 
>>>> between any two people or any group of people. Conversations 
>>>> that do not wish the network owner to be party to the conversation
>>>> cannot work within that framework. 
>>>> 
>>>> At the level of our planet it is clear that there will always be a 
>>>> huge number of agents that cannot for legal or other reasons allow one 
>>>> global network owner to be party to all their conversations. We are 
>>>> therefore socio-logically forced into the social web.
>>>> 
>>>> B. Linkability and the Social Web
>>>> ---------------------------------
>>>> 
>>>> Secondly imagine that we now all have Freedom Boxes [4], where
>>>> each of us has full control over the box, its software, and the
>>>> data on it. (We take this extreme individualistic case to emphasise
>>>> the contrast, not because we don't acknowledge the importance of
>>>> many intermediate cases as useful) Now we want to create a 
>>>> distributed social network - the social web - where each of us can 
>>>> publish information and through access control rules limit who can 
>>>> access each resource. We would like to limit access to groups such
>>>> as:
>>>> 
>>>> - friends 
>>>> - friends of friends
>>>> - family
>>>> - business colleagues
>>>> - ... 
>>>> 
>>>> Limit access means, that we need to determine when accessing a 
>>>> resource who is accessing it. For this we need a global identifier
>>>> so that we can check, with the information available to us, if the 
>>>> referent of that identifier is indeed a member of one of those 
>>>> groups. We can't have a local identifier, for that would require
>>>> that the person we were dealing with had an account on our private
>>>> box - which will be extremely unlikely. We therefore need a way 
>>>> to identify - pseudonymously if need be - agents in a global space.
>>>> 
>>>> Take the following example. Imagine you come to the WebID TPAC
>>>> meeting [6] and I take a picture of everyone present. I would like
>>>> to first restrict access to the picture to only those members who
>>>> were present. Clearly if I only used local identifiers, I would have
>>>> to get each one of you to first create an account on my machine. But 
>>>> how would I then know that the accounts created on the FBox correspond
>>>> to the people who were at the party? It is much easier if we could
>>>> create a party members group and publish it like this
>>>> 
>>>> http://www.w3.org/2005/Incubator/webid/team.n3
>>>> 
>>>> Then I could drag and drop this group on the access control panel
>>>> of my FBox admin console to restrict access to only those members.
>>>> This shows how through linkability I can restrict access and 
>>>> increase privacy by making it possible to link identities in a distributed
>>>> web. It would be quite possible furthermore for the above team.n3
>>>> resource to be protected by access control.
>>>> 
>>>> 
>>>> 2. Example of how Unlinkability can be used to spread FUD 
>>>> =========================================================
>>>> 
>>>> 
>>>> So here I would like to show how fears about linkability can
>>>> then bring intelligent people like Harry Halpin to make some seemingly
>>>> plausible arguments. Here is an example [2] of Harry arguing against
>>>> W3C WebID CG's http://webid.info/spec/ 
>>>> 
>>>> [[
>>>> Please look up "unlinkability" (which is why I kept referencing the 
>>>> aforementioned IETF doc [sic [3] below it is a draft] which I saw 
>>>> referenced earlier but whose main point seemed missed). Then explain 
>>>> how WebID provides unlinkability. 
>>>> 
>>>> Looking at the spec - to me, WebID doesn't as it still requires 
>>>> publishing your public key at a URI and then having the relying party go 
>>>> to your identity provider (i.e. your personal homepage in most cases, 
>>>> i.e. what it is that hosts your key) in order to verify your cert, which 
>>>> must provide that URI in the SAN in the cert. Thus,  WebID does not 
>>>> provide unlinkability. There's some waving of hands about guards and 
>>>> access control, but that would not mediate the above point, as the HTTP 
>>>> GET to the URI for the key is enough to provide the "link".
>>>> 
>>>> In comparison, BrowserID provides better privacy in terms of 
>>>> unlinkability by having the browser in between the identity provider and 
>>>> the relying party, so the relying party doesn't have to ping the 
>>>> identity provider for identity-related transactions. That definitely 
>>>> helps provide unlinkability in terms of the identity provider not 
>>>> needing to knowing every time the user goes to a relying party.
>>>> ]]
>>>> 
>>>> If I can rephrase, the point seems to be the following: A WebID verification 
>>>> requires that the site you are authenticating to ( The Relying Party ) verify
>>>> your identity by dereferencing ( let me add: anonymously ) your profile 
>>>> page, which might only contain as much as your public key publicly. The yellow 
>>>> box in the picture here:
>>>> 
>>>> http://www.w3.org/2005/Incubator/webid/spec/#the-webid-protocol
>>>> 
>>>> The leakage of information then would not be towards the Relying Party - the
>>>> site you are logging into - because that site is the one you just wilfully 
>>>> sent a proof of your identity to. The leakage of information is (drum roll) 
>>>> towards your profile page server! That server might discover ( through IP address 
>>>> sniffing  presumably ) which sites you might be visiting. 
>>>> 
>>>> One reasonable answer to this problem would be for the Relying Party to fetch 
>>>> this information via Tor which would remove the ip address sniffing problem.
>>>> 
>>>> But let us develop the picture of who we are losing (potentially) 
>>>> information to. There are a number of profile server scenarios: 
>>>> 
>>>> A. Profile on My Freedom Box [4]
>>>> 
>>>> The FreedomBox is a personal machine that I control, running
>>>> free software that I can inspect. Here the only person who has
>>>> access to the Freedom Box is me. So if I discover that I logged
>>>> in somewhere that should come as no surprise to me. I might even
>>>> be interested in this information as a way of gathering information
>>>> about where I logged in - and perhaps also if anything had been 
>>>> logging in somewhere AS me. (Sadly it looks like it might be
>>>> difficult to get much good information there as things stand 
>>>> currently with WebID.)
>>>> 
>>>> B. Profile on My Company/University Profile Server
>>>> 
>>>> As a member of a company, I am part of a larger agency, namely the 
>>>> Company or University who is backing my identity as member of that
>>>> institution. A profile on a University web site can mean a lot more
>>>> than a profile on some social network, because it is in part backed
>>>> by that institution. Of course as a member of that institution we
>>>> are part of a larger agent hood. And so it is not clear that the institution
>>>> and me are in that context that different. This is also why it is 
>>>> often legally required that one not use one's company identity for
>>>> private business.
>>>> 
>>>> C. A Social Network ( Google+, Facebook, ... )
>>>> 
>>>> It is a bit odd that people who are part of these networks, and who
>>>> are "liking" pretty much everything on the web in a way that is clearly
>>>> visible and is encouraged by those networks to be visible to the 
>>>> network, would have an issue with those sites knowing-perhaps (if the 
>>>> RP does not use Tor or a proxy) where they are logging into. It is certainly
>>>> not the way the OAuth, OpenID or other protocols that are in extremely 
>>>> wide use now have been developed and are used by those sites.
>>>> 
>>>> If we look then at BrowserId [7] (now Mozilla Persona), the only difference 
>>>> really with WebID ( apart from it not being decentralised until crypto in the
>>>> browser really works ) is that the certificate is updated at short notice 
>>>> - once a day - and that relying parties verify the signature. Neither of course
>>>> can the relying party get many interesting attributes this way, and if it did
>>>> then the whole of the unlinkability argument would collapse immediately.
>>>> 
>>>> 
>>>> 3. Conclusion
>>>> =============
>>>> 
>>>> Talking about privacy is like talking about security. It is a breeding ground 
>>>> for paranoia, which tends to make it difficult to notice important
>>>> solutions to the problem we actually have. Linkability or unlinkability as defined in
>>>> draft-hansen-privacy-terminology-03 [3] come with complicated definitions,
>>>> and are I suppose meant to be applied carefully. But the choice of "unlinkable"
>>>> as a word tends to help create rhetorical short cuts that are apt to hide the 
>>>> real problems of privacy. By trying too hard to make things unlinkable we are moving 
>>>> inevitably towards a centralised world where all data is in big brother's hands. 
>>>> 
>>>> I want to argue that we should all *Like* Linkability. We should
>>>> do it  aware that we can protect ourselves with access control (and TOR) 
>>>> and realise that we don't need to reveal anything more than anyone knew 
>>>> before hand in our linkable profiles.
>>>> 
>>>> To create a Social Web we need a Linkable ( and likeable ) social web.
>>>> We may need other technologies for running Wikileaks type set ups, but
>>>> they clearly cannot be the basis for an architecture of privacy - even
>>>> if they are an important element in the political landscape.
>>>> 
>>>> Henry
>>>> 
>>>> [0] this is from a discussion with Ben Laurie
>>>> http://lists.w3.org/Archives/Public/public-webid/2012Oct/att-0022/privacy-def-1.pdf
>>>> [1] Oshani's Usage Restriction paper 
>>>> http://dig.csail.mit.edu/2011/Papers/IEEE-Policy-httpa/paper.pdf
>>>> [2] http://lists.w3.org/Archives/Public/public-identity/2012Oct/0036.html
>>>> [3] https://tools.ietf.org/html/draft-hansen-privacy-terminology-03
>>>> [4] http://www.youtube.com/watch?v=SzW25QTVWsE
>>>> [6] http://www.w3.org/2012/10/TPAC/
>>>> [7] A Comparison between BrowserId and WebId
>>>> http://security.stackexchange.com/questions/5406/what-are-the-main-advantages-and-disadvantages-of-webid-compared-to-browserid
>>>> 
>>>> 
>>>> Social Web Architect
>>>> http://bblfish.net/
>>>> 

Social Web Architect
http://bblfish.net/
