Re: Privacy by Design in APIs

Hi all.

Jumping into the conversation ...

Thank you Robin and Hannes for this illuminating discussion.

It seems to me that it would be useful to have a robust open discussion in PING about what can or cannot be done and what should or should not be done to offer greater privacy (or the option of greater privacy):

- in the design choices for JavaScript Web API specifications (i.e. at the development stage)
- in the deployment of JavaScript Web API specifications

And, not being a W3C technical expert, I have a question: is implementation considered part of deployment, or does it need to be considered separately?

Note: The DAP WG has already done some work in this area - see http://www.w3.org/TR/dap-privacy-reqs/ 

However, perhaps it would be worthwhile to first consider: what are the potential privacy risks or vulnerabilities, and how might they arise? Fingerprinting (discussed by Robin and Hannes below) is one example.

Obviously, the nature of the risks and vulnerabilities will vary in each case, but perhaps we can identify some common problems.

We can take this up in the call, but as not everyone can make the PING calls, perhaps we could start a couple of new email threads on these topics.

Christine


On Jul 19, 2012, at 10:08 AM, Hannes Tschofenig wrote:

> Hi Robin, 
> 
> 
> On Jun 6, 2012, at 7:06 PM, Robin Berjon wrote:
> 
>> Hi Hannes,
>> 
>> sorry for taking so long to get back to you, I have been gathering feedback from multiple sources and applying it whole rather than making a change for every individual input — it avoids making more changes than is necessary ;)
>> 
>> On Apr 4, 2012, at 09:32 , Hannes Tschofenig wrote:
>>> I re-read your document again. I had made remarks about the lack of terminology already and here are some more comments. 
>> 
>> Speaking of the terminology, at the time of your previous remarks the information I had was that the IAB's terminology document was in flux and therefore that reusing its terminology would probably incur more work than necessary.
>> 
>> Has this situation changed? I would very much like to refer to it and reuse the terminology as is since a common language is of high importance in this context.
> 
> We have now merged the terminology into the main document and I would consider it fairly stable. 
> Here is the pointer to the latest version: 
> http://tools.ietf.org/html/draft-iab-privacy-considerations-03
> 
>> 
>>> ** Target audience
>>> 
>>> You state that the target audience is  
>>> 
>>> 1) those people involved in the definition of JavaScript APIs, and 
>>> 2) those who implement such an API
>>> 
>>> (1) are the standards people but I am not sure whether the second group refers to the application developers who happen to use the API or whether it refers to the browser manufacturers who add the API to their browser implementation. 
>>> 
>>> I hope you are referring to the browser implementer and not to the application developers. 
>>> It might be useful to clarify this since the content of the document would be very different. 
>> 
>> My (admittedly only semi-native and rather globish) understanding of the meaning of "implementing an API" is that it covers providing the API but not using it (which is different from French, for instance).
>> 
>> Just to make sure that this is clear and triggers no faux-amis, I've made it clear that it's for implementation in user agents.
>> 
>> Developers would certainly need a different document; at some point there was some interest in producing that in DAP but so far no one has stood up to take it on (to all those listening: *hint* *hint*).
> 
> I indeed make a differentiation between those who standardize, those who implement, and those who actually deploy full-blown services. The guidance that is offered to these different groups varies quite a lot. 
> 
> To give you an example from the IETF consider TLS. There are folks in the IETF TLS group who work on the specification. Then, there are guys who work on a TLS library (like OpenSSL). Not all of them who implement are active in standardization. Then, there are guys who develop a product and TLS is a tiny little part of the product but there are many other aspects. 
> 
> I am not saying that there are never people who specify a TLS extension, implement that extension themselves, and deploy a complete product. 
> 
> The guidance you will find in the OECD principles and many other privacy principles of this kind focuses on those who deploy. When you look at Directive 95/46/EC you will notice that it talks about a data processor and a data controller. There is no software or equipment manufacturer in any of these texts.
> 
> That's why it is so difficult to take the Fair Information Practices, for example, and try to map them to the standardization work we do: they just don't match. They are written for a different audience that has different abilities to influence a product. 
> 
>> 
>>> ** Privacy and Security
>>> 
>>> You seem to wash away the differences between privacy and security.
>> 
>> Oh I don't seem to — I do; but *within the scope covered in the document*.
>> 
>>> That may sound useful but there are actually quite a few differences. For security we have fairly well understood terminology, threat modeling, and security services. See, for example, RFC 3552. Beyond the technical aspects, most people understand security reasonably well by now, and we also have processes in place to address security in the standards development process. 
>>> 
>>> The same cannot be said about privacy. In addition, the threats with privacy go beyond what security threats are concerned about. See http://tools.ietf.org/html/draft-iab-privacy-considerations-02#section-3. 
>> 
>> Yes, I am well aware of these and of the pertinence of this distinction in a number of contexts.
>> 
>>> For example, the issue of what happens with information when it gets transmitted to a server and is then re-distributed beyond the originally stated purpose is indeed a real problem.
>> 
>> It is certainly a very real problem. It just so happens that it is not a problem that concerns JS API design. The goal of this document, which I've reinforced in my recent edits, is really to improve the state of API design to bring it to a point where it is as privacy-supportive as possible. And there are definitely good things to be done there.
>> 
>>> So, in a nutshell I disagree with your statement that the difference between the two is "immaterial and irrelevant". 
>> 
>> I stand by the statement (again, within the scope of the document) but since it risks being invidious I've removed that section — we'll see if people re-raise the converse or not!
> 
> 
> I guess here we just disagree. 
> 
>> 
>>> ** Privacy by Design
>>> 
>>> Various folks have come up with the idea of privacy by design, and they have some specific ideas in mind. If you, for example, look at the work by Ann Cavoukian then you see that she takes a certain perspective on what that means. Your list of privacy by design "requirements" or "principles" does not match anything I have seen. Maybe you want to avoid the PbD marketing.
>> 
>> I would love to avoid marketing, but since this is a term that has been used consistently in all discussions relating to API design over the past few years, I don't think that it would be helpful to artificially mint a new term. I think that it is much more important that people who are designing APIs — who form the core target for this document — read it as much as possible with terms that they know used in meanings that correspond to their experience.
>> 
>> As such I keep it — I don't think that using a community's dedicated terminology is "marketing" — but I've clarified the definition so that people who are not familiar with JS API design and therefore at risk of perceiving it as marketing can make the necessary adjustment.
>> 
> 
> I would prefer to just use a different term to avoid confusion with existing work. For example, call the section "Privacy Engineering". 
> 
>>> You write: "It is particularly important in privacy by design that users be exposed to as few direct privacy decisions as possible. Notably, it should never be assumed that users can be “educated” into making correct privacy decisions." While this does sound nice, all the privacy principles I have seen talk about letting the user make informed decisions. So, there is certainly a balance between letting users have control over what they do and not getting in their way. 
>> 
>> There are cases in which giving the user an informed choice is the best thing to do, and there are cases in which it is punting of the most cowardly variety. Where APIs are concerned, there is a strong undercurrent of temptation (though continuous outreach is helping diminish it) to fall into the latter trap.
>> 
>> One thing that is central here is taking into account experience with real users rather than a view of an idealised, XIXth century rational agent. More importantly, decisions that users make are actually better informed when taking place as part of an action than in a de-correlated dialog.
>> 
>> Beyond that I'm not sure what change your comment would like to see effected?
> 
> You could, for example, talk about the problem of finding the sweet spot between presenting users with too many consent dialogs and not asking them at all. This would be far better than concluding that the user should be asked as little as possible, unless you believe the privacy principles that others have set up are wrong.
> 
>> 
>>> "Poor Information Scoping": In some sense you are talking about "providing the user with enough information" and "letting them make fine grained access control decisions". The problem is that this is not something the JavaScript API designer can decide -- it is up to the application designer, who writes the software and decides what the application actually does and whether there are choices for the user. For example, someone writing Angry Birds could think about whether they really need access to location, phone state, sms, etc. for their application to work.
>> 
>> Actually, the API designer has a large role to play in making it possible for the user agent to do the right thing here. If your API always returns a lot of information whether the application requested all of it or just a subset, there is no way to surface that to the UI. Poor scoping makes it hard or impossible (or possible, but non-conforming) to support data minimisation.
> 
> OK. That makes sense. 
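
To make the scoping point concrete, here is a minimal sketch of what a scoped request could look like. It is not taken from any specification: the `Contact` shape, the field names, and `pickContactFields` are hypothetical, chosen only to show that when the application has to name the fields it wants, the user agent knows exactly what to surface in the UI and can return nothing beyond that.

```typescript
// Hypothetical contact record held by the user agent (field names are illustrative).
type ContactField = "name" | "email" | "phone" | "address";
type Contact = Record<ContactField, string>;

// The application must list the fields it needs. The user agent can show
// that list to the user and returns only those fields, never the full record.
function pickContactFields(
  contact: Contact,
  requested: readonly ContactField[],
): Partial<Contact> {
  const scoped: Partial<Contact> = {};
  for (const field of requested) {
    scoped[field] = contact[field];
  }
  return scoped;
}

// Made-up sample data for the sketch.
const stored: Contact = {
  name: "Ada Example",
  email: "ada@example.org",
  phone: "+1 555 0100",
  address: "1 Example Street",
};

// An application that only needs a name and an email never sees the rest.
const shared = pickContactFields(stored, ["name", "email"]);
console.log(Object.keys(shared));
```

An API that always hands back the whole record gives the user agent nothing to distinguish between these cases, which is the data minimisation problem described above.
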
> 
>> 
>> Of course someone writing Angry Birds should think about whether they really need to know my friends' phone numbers. But that's a topic for another document which it would be great if someone were to step up to write (familiar readers may now recognise this as a *hint*).
>> 
>>> "User Fingerprinting": This section essentially says that there isn't really anything one can do about fingerprinting (which is probably correct). Particularly when you consider the broader range of JavaScript APIs there is indeed a lot of information that is available (driven by the popularity of JavaScript itself).
>> 
>> No. There is definitely something that we can do about it. Some of what is exposed today can be progressively removed (disabling Java for instance can help), so we shouldn't give up and we should make sure that APIs produced today don't gratuitously leak even more information than is already available. Otherwise we'll never manage to get this under control.
> 
> 
> Disabling Java is not a guideline you would give to a JavaScript API developer. 
> 
> I still have to be convinced about the story on how to prevent fingerprinting at the level of a JavaScript API design. 
> 
>> 
>>> ** Design Strategies
>>> 
>>> In the three sub-sections you are essentially saying to expose as little information as possible. I don't think that is a reasonable strategy for the API standardization.
>> 
>> Can you explain why? It is hard to know what to change otherwise. I think that it makes a fair bit of sense for API standards to ensure that they are designed in such a manner as to provide as little information as is needed by the user's task.
>> 
> 
> The purpose of these JavaScript APIs is to share information. Only the designers of the final application can decide to request less. For example, in the Angry Birds case the game developer could decide not to ask for the location of the user (and therefore not serve location-based advertisements), but the Geolocation API provides fine-grained location. One could have designed the Geolocation API differently so that it only provides the granularity needed for a specific purpose, but that would fall under information scoping described in an earlier section.
> 
> So, all you could do is to have the design in such a way that it allows the server to provide an indication of the granularity of the information. 
> 
> The problem is then that you will have to ask the user. In my experience, application designers only offer a "take-it or leave-it" decision. It would, for example, be a huge improvement to allow the user to have more choices, since otherwise these additional API features will not help to provide more privacy protection in practice. But you are silent about those aspects. 
> 
> 
>>> The examples you provide are more driven by the community that had specific use cases in mind: At the time when mouse functionality was designed I doubt that anyone had thought about writing flight simulators in JavaScript. With more sophisticated games you may indeed want to have more information about what is attached and to make use of the available hardware features. 
>> 
>> I don't really understand what you are trying to say here. Are you saying that we should give up on a model we have that actually works and make it leak more device-identifying information? That seems rather counter-productive to me. The mouse model works, and gamepad uses its own variant that also works. These are sufficient to address the production of high-quality games, including flight simulators, without exposing the device to extensive fingerprinting.
> 
> My reading of the text is the following:
> 
> The design had been done in a specific way for whatever reason. Now, years later you claim that it was actually an intentional decision due to the great privacy experience of the participants in that group. 
> 
> 
>> I see very good reasons to further this pattern.
>> 
>>> To illustrate the point with the location API you could argue that it only needs to provide city-level granularity because this would be more privacy friendly. While this is true it wipes out a certain number of applications where more accurate location information is useful. For example, for the emergency services functionality it would be important to know how the location was obtained (e.g., manually entered, GPS, operator provided, etc.). 
>> 
>> Well, your example is actually a strong reinforcement of my point — do you mind if I include it? If the Geolocation API had been designed based on the patterns described in this document, it would have been trivial to make sure that it could return only city-level granularity when that's all that's needed, and pinpoint locations when those are desired — at user choice. Instead, the dialog  permission approach makes sure that the only viable option is to say "yes" and with no usable control over granularity.
> 
> Recommending that people consider the use cases they have in mind is obviously useful guidance. Everyone is doing that anyway. But calling these intentional privacy choices is a bit of overselling. Don't you think so? 
> 
> An example: 
> 
> The location API does not provide a way to indicate how the location was actually obtained. The PIDF-LO gives this indication in the method element. I suspect that those who participated in the location work did not see this as useful for their scenarios and that's why they haven't provided that feature. Now, I could celebrate it as a great privacy feature of the Geolocation API that it did not disclose that information, because disclosing it would allow additional fingerprinting. 
> 
> 
>> 
>> That's why I don't share your defeatism here. With proper patterns we could have improved Geolocation; we can still improve future APIs to come.
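
As an illustration of the granularity point in this exchange, here is a minimal sketch, not the Geolocation API as specified. It assumes a hypothetical design in which the application states the granularity it needs and the user can cap it; the `Granularity` type, `grant`, and `coarsen` are invented for the example, and rounding to one decimal degree (roughly 11 km of latitude) is only a stand-in for "city level".

```typescript
type Granularity = "city" | "precise";

interface Position {
  latitude: number;
  longitude: number;
}

// The user agent grants the coarser of what the application asked for
// and what the user is willing to share.
function grant(requested: Granularity, userAllows: Granularity): Granularity {
  return requested === "precise" && userAllows === "precise" ? "precise" : "city";
}

// Assumption: one decimal degree is used as a rough proxy for city-level accuracy.
function coarsen(position: Position, granularity: Granularity): Position {
  if (granularity === "precise") {
    return { latitude: position.latitude, longitude: position.longitude };
  }
  const round = (value: number): number => Math.round(value * 10) / 10;
  return {
    latitude: round(position.latitude),
    longitude: round(position.longitude),
  };
}

// A weather widget asks for city-level data; a navigation app asks for more,
// but still only receives what the user has allowed.
const fix: Position = { latitude: 48.8584, longitude: 2.2945 };
console.log(coarsen(fix, grant("city", "precise")));
console.log(coarsen(fix, grant("precise", "city")));
```

The point of the sketch is that the user has something to choose between other than "yes" and "no".
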
>> 
>>> The biggest challenge, I believe, is that the server side that provides the JavaScript code also describes what functionality it wants. The most important privacy decision had already been made when the JavaScript-based mobile code distribution Web platform was designed. All that is then left to the browser implementer is to ask the user whether to grant access to the information or not.
>> 
>> That's where you're missing the point entirely. The Geolocation permission model is broken, and it is broken because the API is not designed to be privacy-conscious.
> 
> Which API do you consider good from a privacy point of view? From your text I understand the Gamepad API and the Vibration API. Anything else? 
> 
>> 
>>> Since you claim that no user education can be assumed and you also don't want to ask the user too often there isn't really a lot you can do to improve privacy from an API designer point of view. 
>> 
>> Au contraire — that's precisely what providing information only in the action and not in some permission dialog does. But for that you need APIs designed to support it.
>> 
>>> As a summary, I don't think any of your recommendations make a difference from a privacy point of view. For the browser implementer the only recommendation seems to be: "Don't act stupid: when someone wants to upload a file don't upload the entire filesystem.". Given the history of the UI for the security lock icon this is, of course, still useful. 
>> 
>> Given the extensive amount of discussion and work that is going for instance into building Web Intents in such a way that they support privacy and don't repeat the Geolocation mistakes, including with a lot of genuinely concerned help from browser vendors, I cannot say that I agree with your cynicism (or find it particularly constructive).
> 
> I do not follow the Web Intents work and do not know which specific privacy design aspects guide the group. It would probably be a good example to hear about at one of the PLING calls. 
> 
>> 
>> But if you could kindly clarify, perhaps with a code example or IDL snippet, in which ways exactly this does not make a difference maybe I would be able to understand that you are right and give up entirely on privacy in APIs, or on the contrary to clarify the document so that we are on the same page.
>> 
>>> I am wondering whether you (or your TAG co-workers) have gone through a couple of Web technologies to determine how the design decisions influenced privacy properties.
>> 
>> This document is the distillation of many discussions covering notably APIs for Contacts, Vibration, Geolocation, Intents, Sensors, Gamepad, Camera access, and probably a few others that I'm forgetting. This is experience from the field, particularly from DAP and WebApps, not the product of some ivory tower discussion.
> 
> In essence your privacy story is: 
> 
> a) Information scoping (-> you call it minimization)
> b) Consent (-> you call it user mediation)
> c) Not presenting functionality the device supports (this is what you call graceful degradation and also the text in Action-Based Availability falls under that category)
> 
> (a) makes sense and is probably something that could be explored further. 
> 
> With (b) you provide a mixed message in the document, since you are saying that we shouldn't ask the user too often, which may give some readers the impression that they should not ask at all, or only very rarely. When I look at current smart phone applications, you see an indication when you install the application (or update it) and that's it. On my phone, at least, you can neither see what this information is used for nor influence the selection (by selecting a sub-set of the requested information). There is also no way for me to see what information these applications have shared with whom (a dashboard-like thing), etc. etc. So, I fear that those who read the document will get the impression that this is the sort of standard we are working towards. 
> 
> (c) has in my view limited applicability given that there is a lot of value in knowing what a device can or cannot do. If I hid the difference between a tablet PC, a smart phone, and a desktop PC from an API point of view, my application would have difficulty adjusting the UI to the screen size, for example. My view is that to avoid fingerprinting one really has to disable JavaScript.
> 
> 
>> 
>>> We did this exercise in the IAB and made a couple of interesting observations, such as
>>> * lacking a common terminology the investigated privacy threat was interpreted differently by various participants, and changed significantly over time
>> 
>> Sure — that's why I'd like to know when your terminology document is stable so that I can reuse it.
>> 
>>> * solutions were documented but there was typically no detailed description of the privacy threats, 
>>> * folks worked on selected solutions that in a bigger picture made no sense (particularly with regard to what was deployed).
>> 
>> Most of the problems I've noted have been:
>> 
>> • People aren't aware that fingerprinting is a concern, or don't think that it's possible to provide the same functionality while avoiding it.
>> • People think that the Geolocation permissions model is perfect because they haven't actually tried to look at an alternative.
>> • People think that the developer should be informed and fully in control of what's going on (e.g. told that there is no available vibrator on the device before calling vibrate()) when in fact that's rarely helpful and often harmful.
>> 
>> Once you get past that, there's progress though.
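
The vibrate() point in the last bullet can be sketched as follows. This is an illustration under stated assumptions, not the text of the Vibration API: `VibrationHardware` and `makeVibrate` are hypothetical names, and the sketch only shows the pattern of returning the same result whether or not the device has a vibrator, so that the call itself reveals nothing about the hardware.

```typescript
// Hypothetical abstraction over the device's vibrator; null means no hardware.
interface VibrationHardware {
  pulse(durationMs: number): void;
}

// Builds a vibrate() function whose observable result does not depend on
// whether a vibrator is present.
function makeVibrate(
  hardware: VibrationHardware | null,
): (durationMs: number) => boolean {
  return (durationMs: number): boolean => {
    // Invalid input is rejected the same way on every device.
    if (!isFinite(durationMs) || durationMs < 0) {
      return false;
    }
    if (hardware !== null) {
      hardware.pulse(durationMs);
    }
    // Same answer with or without hardware: nothing to fingerprint.
    return true;
  };
}

// A fake vibrator that records pulses, used here only for demonstration.
const pulses: number[] = [];
const withHardware = makeVibrate({ pulse: (ms) => { pulses.push(ms); } });
const withoutHardware = makeVibrate(null);

console.log(withHardware(200), withoutHardware(200));
```

The calling script behaves identically in both cases, which is the graceful degradation the bullet argues for.
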
>> 
>>> PS: I had to laugh when I read "By default, the user agent should provide Web applications with an environment that is privacy-safe for the user in that it does not expose any of the user's information without her consent."
>> 
>> Well, we can give up or we can find the way of getting there. I'm glad that the API community is more interested in finding fixes than in feeling sarcastic about it.
>> 
> 
> Ciao
> Hannes
> 
>> -- 
>> Robin Berjon - http://berjon.com/ - @robinberjon
>> 
>> 
> 
> 

Received on Thursday, 19 July 2012 09:50:58 UTC