Re: Privacy by Design in APIs

Hi Hannes,

sorry for taking so long to get back to you; I have been gathering feedback from multiple sources and applying it as a whole rather than making a change for every individual input, which avoids making more changes than necessary ;)

On Apr 4, 2012, at 09:32 , Hannes Tschofenig wrote:
> I re-read your document again. I had made remarks about the lack of terminology already and here are some more comments. 

Speaking of the terminology, at the time of your previous remarks the information I had was that the IAB's terminology document was in flux and therefore that reusing its terminology would probably incur more work than necessary.

Has this situation changed? I would very much like to refer to it and reuse the terminology as is since a common language is of high importance in this context.

> ** Target audience
> 
> You state that the target audience is  
> 
> 1) those people involved in the definition of JavaScript APIs, and 
> 2) those who implement such an API
> 
> (1) are the standards people but I am not sure whether the second group refers to the application developers who happen to use the API or whether it refers to the browser manufacturers who add the API to their browser implementation. 
> 
> I hope you are referring to the browser implementer and not to the application developers. 
> It might be useful to clarify this since the content of the document would be very different. 

My (admittedly only semi-native and rather globish) understanding of the meaning of "implementing an API" is that it covers providing the API but not using it (which is different from French, for instance).

Just to make sure that this triggers no faux-amis, I've made it explicit that the document is aimed at implementation in user agents.

Developers would certainly need a different document; at some point there was some interest in producing that in DAP but so far no one has stepped up to take it on (to all those listening: *hint* *hint*).

> ** Privacy and Security
> 
> You seem to wash away the differences between privacy and security.

Oh I don't seem to — I do; but *within the scope covered in the document*.

> That may sound useful but there are actually quite some differences. For security we have a fairly well understood terminology, threat modeling, and security services. See, for example, RFC 3552. Beyond the technical aspects most people understand security reasonable well in the meanwhile, and we also have processes in place to address security in the standards development process. 
> 
> The same cannot be said about privacy. In addition, the threats with privacy go beyond what security threats are concerned about. See http://tools.ietf.org/html/draft-iab-privacy-considerations-02#section-3. 

Yes, I am well aware of these and of the pertinence of this distinction in a number of contexts.

> For example, the issue of what happens with information when it gets transmitted to a server and is then re-distributed beyond the originally stated purpose is indeed a real problem.

It is certainly a very real problem. It just so happens that it is not a problem that concerns JS API design. The goal of this document, which I've reinforced in my recent edits, is really to improve the state of API design to bring it to a point where it is as privacy-supportive as possible. And there are definitely good things to be done there.

> So, in a nutshell I disagree with your statement that the difference between the two is "immaterial and irrelevant". 

I stand by the statement (again, within the scope of the document) but since it risks being invidious I've removed that section — we'll see if people re-raise the converse or not!

> ** Privacy by Design
> 
> There are various folks who had come up with the idea of privacy by design and they have some specific idea in mind. If you, for example, look at the work by Ann Cavoukian then you see that she has takes a certain perspective of what that means. Your list of privacy by design "requirements" or "principles" does not match to anything I have seen. Maybe you want to avoid the PbD marketing.  

I would love to avoid marketing, but since this term has been used consistently in all discussions relating to API design over the past few years, I don't think that it would be helpful to artificially mint a new one. It is much more important that the people designing APIs, who are the core audience for this document, can read it using terms they already know, with meanings that match their experience.

As such I keep it — I don't think that using a community's dedicated terminology is "marketing" — but I've clarified the definition so that people who are not familiar with JS API design and therefore at risk of perceiving it as marketing can make the necessary adjustment.

> You write: "It is particularly important in privacy by design that users be exposed to as few direct privacy decisions as possible. Notably, it should never be assumed that users can be “educated” into making correct privacy decisions." While this sounds indeed nice all privacy principles I have seen talk about letting the user make informed decisions. So, there is certainly a balance between letting the user have control over what they do and not getting into their way. 

There are cases in which giving the user an informed choice is the best thing to do, and there are cases in which it is punting of the most cowardly variety. Where APIs are concerned, there is a strong undercurrent of temptation (though continuous outreach is helping diminish it) to fall into the latter trap.

One thing that is central here is taking into account experience with real users rather than a view of an idealised, nineteenth-century rational agent. More importantly, the decisions that users make are better informed when they take place as part of an action than in a disconnected dialog.

Beyond that, I'm not sure what change you would like to see effected?

> "Poor Information Scoping": In some sense you are talking about "providing the user with enough information" and "letting them make fine grained access control decisions". The problem is that this is nothing the JavaScript API designer can decide about -- this is about the application designer writing his software and making a decision about what his or her application actually does and whether there are choices for the user. For example, someone writing Angry Birds could think about whether they really need access to location, phone state, sms, etc. for their application to work.

Actually, the API designer has a large role to play in making it possible for the user agent to do the right thing here. If your API always returns a lot of information, whether the application requested all of it or just a subset, there is no way to surface the distinction in the UI. Poor scoping makes it hard or impossible (or possible, but non-conforming) to support data minimisation.
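To make that concrete, here's a deliberately simplified sketch (plain JS, invented names, not from any actual specification) of what a field-scoped lookup looks like; because the caller must name the fields it needs, the user agent can show exactly that list to the user and nothing else ever crosses the API boundary:

```javascript
// Hypothetical contacts-style store; names and shape are illustrative.
const store = [
  { name: "Alice", tel: "+1-555-0100", email: "alice@example.org" },
  { name: "Bob",   tel: "+1-555-0101", email: "bob@example.org" },
];

function findContacts(fields) {
  // Only the requested fields are copied out; everything else stays
  // behind the API boundary, so the UI can truthfully say "this page
  // will see names only".
  return store.map(entry =>
    Object.fromEntries(fields.map(f => [f, entry[f]]))
  );
}

const results = findContacts(["name"]);
// results carries names only; phone numbers and email addresses never
// leave the store, so there is nothing extra for the page to leak.
```

An API shaped like this makes minimisation the path of least resistance, rather than something bolted on afterwards.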

Of course someone writing Angry Birds should think about whether they really need to know my friends' phone numbers. But that's a topic for another document which it would be great if someone were to step up to write (familiar readers may now recognise this as a *hint*).

> "User Fingerprinting": This section essentially says that there isn't really something one can do about fingerprinting (which is probably correct). Particularly when you consider the broader range of JavaScript APIs there is indeed a lot of information that is available (driven by the popularity of JavaScript itself).

No. There is definitely something that we can do about it. Some of what is exposed today can be progressively removed (disabling Java for instance can help), so we shouldn't give up and we should make sure that APIs produced today don't gratuitously leak even more information than is already available. Otherwise we'll never manage to get this under control.

> ** Design Strategies
> 
> In the three sub-sections you are essentially saying to expose as little information as possible. I don't think that is a reasonable strategy for the API standardization.

Can you explain why? It is hard to know what to change otherwise. I think that it makes a fair bit of sense for API standards to be designed in such a manner as to provide no more information than the user's task requires.

> The examples you provide are more driven by the community that had specific use cases in mind: At the time when mouse functionality was designed I doubt that anyone had thought about writing flight simulators in JavaScript. With more sophisticated games you may indeed want to have more information about what is attached and to make use of the available hardware features. 

I don't really understand what you are trying to say here. Are you saying that we should give up on a model we have that actually works and make it leak more device-identifying information? That seems rather counter-productive to me. The mouse model works, and gamepad uses its own variant that also works. These are sufficient to address the production of high-quality games, including flight simulators, without exposing the device to extensive fingerprinting.

I see very good reasons to further this pattern.
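A toy sketch of the idea (invented names, nothing from any real specification): the API surfaces input *state* that the application needs, and identifying device details simply never cross into script.

```javascript
// What the OS hands the user agent (illustrative shape only).
const raw = {
  vendor: "ACME", product: "SuperStick 9000", serial: "X1-42",
  axes: [0.1, -0.5], buttons: [true, false],
};

// Only gameplay-relevant state is projected across the API boundary;
// vendor, product, and serial stay inside the user agent.
function toInputState(rawDevice) {
  return { axes: rawDevice.axes, buttons: rawDevice.buttons };
}

const state = toInputState(raw);
// state is { axes: [...], buttons: [...] }; a flight simulator has
// everything it needs, and the fingerprinting surface stays flat.
```

That projection step is the whole pattern: the page gets capability, not identity.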

> To illustrate the point with the location API you could argue that it only needs to provide city-level granularity because this would be more privacy friendly. While this is true it wipes out a certain number of applications where more accurate location information is useful. For example, for the emergency services functionality it would be important to know how the location was obtained (e.g., manually entered, GPS, operator provided, etc.). 

Well, your example is actually a strong reinforcement of my point — do you mind if I include it? If the Geolocation API had been designed based on the patterns described in this document, it would have been trivial to make sure that it could return only city-level granularity when that's all that's needed, and pinpoint locations when those are desired — at user choice. Instead, the permission-dialog approach ensures that the only viable option is to say "yes", with no usable control over granularity.
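For illustration only (this is not what Geolocation specifies, just a sketch of what a granularity knob could look like), the user agent could degrade a precise fix to whatever level the user agreed to:

```javascript
// Coarsen a position to the granularity the user consented to.
// One decimal place is roughly 11 km at the equator (city-level);
// four decimal places is roughly 11 m (pinpoint).
function coarsen(position, granularity) {
  const places = granularity === "city" ? 1 : 4;
  const round = x => Number(x.toFixed(places));
  return {
    latitude: round(position.latitude),
    longitude: round(position.longitude),
  };
}

const fix = { latitude: 48.85837, longitude: 2.29448 }; // precise fix
const cityLevel = coarsen(fix, "city");    // { latitude: 48.9, longitude: 2.3 }
const pinpoint  = coarsen(fix, "precise"); // { latitude: 48.8584, longitude: 2.2945 }
```

The point is not the rounding arithmetic, it's that granularity becomes a first-class part of the API instead of an all-or-nothing dialog.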

That's why I don't share your defeatism here. With proper patterns we could have improved Geolocation; we can still improve future APIs to come.

> The biggest challenge, I believe, is that the server-side that provides the JavaScript code also describes about what functionality it wants. The most important privacy decision has already been made when JavaScript-based mobile code distribution Web platform was designed. All that is then left to the browser implementer is to ask the user whether it grants access to the information or not.

That's where you're missing the point entirely. The Geolocation permission model is broken, and it is broken because the API is not designed to be privacy-conscious.

> Since you claim that no user education can be assumed and you also don't want to ask the user too often there isn't really a lot you can do to improve privacy from an API designer point of view. 

Au contraire — that's precisely what providing information only in the action and not in some permission dialog does. But for that you need APIs designed to support it.
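The shape of a "decision in the action" API, sketched with invented names (the real mechanism would live in UA-controlled chooser UI, which a few lines of script obviously can't reproduce):

```javascript
// The page asks for "a contact"; the user agent shows a trusted picker
// and the API resolves with only the entry the user chose. Picking IS
// the consent, so no separate permission dialog is ever needed.
function pickContact(userChoice) {
  // In a real user agent, userChoice comes from the chooser UI, not
  // from the page; it is injected here to keep the sketch runnable.
  return userChoice;
}

const contact = pickContact({ name: "Alice", tel: "+1-555-0100" });
// The page receives exactly one entry; the rest of the address book
// never becomes readable, and no blanket grant was ever made.
```

Compare this with `<input type="file">`, which already works this way for files: the page gets what was picked, not the filesystem.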

> As a summary, I don't think any of your recommendations make a difference from a privacy point of view. For the browser implementer the only recommendation seems to be: "Don't act stupid: when someone wants to upload a file don't upload the entire filesystem.". Given the history of the UI for the security lock icon this is, of course, still useful. 

Given the extensive amount of discussion and work that is going for instance into building Web Intents in such a way that they support privacy and don't repeat the Geolocation mistakes, including with a lot of genuinely concerned help from browser vendors, I cannot say that I agree with your cynicism (or find it particularly constructive).

But if you could kindly clarify, perhaps with a code example or IDL snippet, in which ways exactly this makes no difference, then maybe I could either understand that you are right and give up entirely on privacy in APIs, or, on the contrary, clarify the document so that we are on the same page.

> I am wondering whether you (or your TAG co-workers) have gone through a couple of Web technologies to determine how the design decisions influenced privacy properties.

This document is the distillation of many discussions covering notably APIs for Contacts, Vibration, Geolocation, Intents, Sensors, Gamepad, Camera access, and probably a few others that I'm forgetting. This is experience from the field, particularly from DAP and WebApps, not the product of some ivory tower discussion.

> We did this exercise in the IAB and made a couple of interesting observations, such as
>  * lacking a common terminology the investigated privacy threat was interpreted differently by various participants, and changed significantly over time

Sure — that's why I'd like to know when your terminology document is stable so that I can reuse it.

>  * solutions were documented but there was typically no detailed description of the privacy threats, 
>  * folks worked on selected solutions that in a bigger picture made no sense (particularly with regard to what was deployed).

Most of the problems I've noted have been:

• People aren't aware that fingerprinting is a concern, or don't think that it's possible to provide the same functionality while avoiding it.
• People think that the Geolocation permissions model is perfect because they haven't actually tried to look at an alternative.
• People think that the developer should be informed and fully in control of what's going on (e.g. told that there is no available vibrator on the device before calling vibrate()) when in fact that's rarely helpful and often harmful.

Once you get past that, there's progress though.
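To illustrate the vibration point from the last bullet, here's a sketch (invented factory function, not the actual Vibration API) of the fire-and-forget shape being argued for:

```javascript
// vibrate() behaves identically whether or not the hardware exists,
// so calling it teaches the page nothing about the device.
function makeVibrate(hasHardware) {
  return function vibrate(pattern) {
    if (hasHardware) {
      // drive the motor with `pattern` here
    }
    // Same observable result either way: nothing to fingerprint.
    return true;
  };
}

const withMotor    = makeVibrate(true);
const withoutMotor = makeVibrate(false);
// Indistinguishable from the page's point of view:
const a = withMotor([200, 100, 200]);
const b = withoutMotor([200, 100, 200]);
```

The developer "loses" the ability to know whether the buzz happened, but in exchange the API stops being a free bit of fingerprinting entropy.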

> PS: I had to laugh when I read "By default, the user agent should provide Web applications with an environment that is privacy-safe for the user in that it does not expose any of the user's information without her consent."

Well, we can give up or we can find a way of getting there. I'm glad that the API community is more interested in finding fixes than in being sarcastic about it.

-- 
Robin Berjon - http://berjon.com/ - @robinberjon

Received on Wednesday, 6 June 2012 16:07:20 UTC