Fwd: Results from Privacy review of Presentations API using Privacy Questionaire. (Wall of text warning!)

Colleagues,

Greg and Joe have done some excellent work applying the work-in-progress privacy questionnaire to the Presentation API (discussed on the August call). This is a call for some volunteers to take a look at their work and the Presentation API so that we can complete the privacy review before the Second Screen WG meets at TPAC.

We will be having a call in October prior to TPAC, probably on 22 October. This will be a good time to wrap up our review, so any feedback should be shared on this list in the next two weeks.

Christine and Tara

Begin forwarded message:

From: Greg Norcie <gnorcie@cdt.org>
Subject: Re: Results from Privacy review of Presentations API using Privacy Questionaire. (Wall of text warning!)
Date: 27 August 2015 8:36:40 pm GMT+2
To: Joseph Lorenzo Hall <joe@cdt.org>
Cc: "public-privacy (W3C mailing list)" <public-privacy@w3.org>
Resent-From: <public-privacy@w3.org>
Reply-To: norcie@cdt.org

For the location question, I was trying to distinguish between an exact location (e.g., GPS coordinates) and more general types of sharing (e.g., checking into something on Foursquare).

Also, I edited the question in a way that I hope captures both sides of the pond: there's some mention of specific data types now, then a question about personally derived data. That way we can throw up a red flag if certain things are brought up (e.g., SSN, or whatever the rest of the world's equivalent might be), but we're not saying only these things are private.

On Thu, Aug 27, 2015 at 10:59 AM, Joseph Lorenzo Hall <joe@cdt.org> wrote:
This is great, Greg... some comments inline. I hope others have had a chance to take a look at the questionnaire; examining a spec with the questions in hand seems to be very useful.

On Thu, Aug 20, 2015 at 3:38 PM, Greg Norcie <gnorcie@cdt.org> wrote:
Hi all,

I reviewed the Presentation API<http://www.w3.org/TR/presentation-api/> using the Privacy Questionnaire. Results are below, followed by some discussion of what was and was not captured.

Before I begin I think we should all pause and give some credit to the folks working on this standard. I think they're doing a great job working to minimize any privacy impacts that might be present.

I used the most recent version of the questionnaire available on the wiki when I started (hardlink <https://www.w3.org/wiki/index.php?title=Privacy_and_security_questionnaire&oldid=85382> for future reference):

  1.  Does this specification have a "Privacy Considerations" section?

Does it? Sounds like from below it has a "security and privacy" section but not a stand-alone privacy section.

  1.  Does this specification collect personally derived data?
     *   Not directly; however, any audio/video will inherently contain private data
  2.  Does this specification generate personally derived data, and if so how will that data be handled?
     *   Yes, this specification can collect audio/video data. Also, this spec can (in its currently

Hmm, seems like some text was cut off here.

     *
        *   No, the standard bundles security and privacy into one section.
        *   (Though it should be noted they couldn't be expected to since the privacy questionnaire is in beta :) )
        *   Not directly, but audio/video could be used to derive a location.
     *   How should this specification work in the context of a user agent’s "incognito" mode?
        *   The spec should clear all permissions after an incognito session, leaving no traces on the machine that the mode was used.
        *   While in operation, a tab that is "incognito" should be considered a separate instance from any instances in the non-incognito tabs.
     *   Is it possible to spoof/fake the data being generated for privacy purposes?
        *   Presumably, but the onus is on the consumer to use software to set up a virtual device.
        *   (IMHO this is acceptable, as long as the spec specifies it should not actively deny users the option to send video data to a virtual device... maybe this sentiment should be explicitly mentioned in the question?)
     *   Does the standard utilize data that is personally-derived, i.e. derived from the interaction of a single person, or their device or address? If the data could be re-correlated, does the data record contain elements that would explicitly enable such re-correlation such as unique identifiers?
        *   Yes, but aside from the usual caveats about facial recognition, re-correlation does not appear to be an issue.
     *   Does the data record contain elements that would enable re-correlation when combined with other datasets through the property of intersection?
        *   No (just audio/video)

This seems like a hard question... on the one hand, if a "face" is enough from which to derive a facial pattern that you can correlate with other databases of facial patterns, then the answer would seem to be yes (although I don't know of any public databases of facial biometrics). Maybe there's a better way to express what this question wants to get at? Does anyone remember what the impetus for this question is? Or can we think of examples in a spec that we'd definitely want to catch with this question?

     *   Is the user likely to know if information is being collected?
        *   Yes, the user will have to interact with their computer in order to enable the presentation display.
     *   Can the user easily, preferably through an element of the GUI, revoke consent granted to a particular feature?
        *   Not necessarily. As I understand it, there is currently no GUI element to revoke consent to the Presentation API once granted.
  1.  Does this specification allow an origin access to a user’s location, and if so is that information minimized?

Sounds like this last one is "no"?

Overall, I think the questionnaire is moving forward - with some language tweaks and additions I feel like we will be 80% there.

But there are still some major issues, so based on my reading I plan to make several changes. I'm sharing them here, rather than diving straight into the wiki and editing, so that people have a chance to give feedback before the changes go in.


  *   I'd like to remove the security section since Mike West's questions<https://w3ctag.github.io/security-questionnaire/> cover that aspect nicely, and I think forcing people to do a separate, explicit privacy review is extremely desirable.
     *   (Too often people do a security review, assume that security is a subset of privacy, and then consider their spec review finished)
     *   We can discuss maybe merging the two in the future, but for now I think they should stay separate.
  *   I plan to edit the text a bit so it's more formal... this is my own fault since I wrote a large chunk of this. I know it is a draft, but on rereading several questions I feel I was way too conversational.
  *   I also plan to edit the wiki formatting so we can link to individual questions. This will make it easier to discuss the questions IMHO.
  *   For question 1 ("Does this specification have a "Privacy Considerations" section?") we should make it clearer that the "privacy considerations section" must be on its own (not a "privacy and security considerations" section where someone can list off their encryption techniques and avoid critical examination of privacy impacts).
  *   For question 2 ("Does this specification collect personally derived data?") we should clarify that this refers to what in the USA would be "PII": addresses, SSN/national ID #, ZIP/postal code, etc. Conversely, question 3 will inquire about data collected from a user via sensors that may be sensitive (audio, video, telemetry data, etc.).

Wondering what non-US folks think of this... we can probably make it pretty universal by talking about personal data a la the EU.

  *    For question 4 ("Does this specification allow an origin access to a user’s location, and if so is that information minimized?") mention _direct_ access to distinguish

What did you want to get at here, Greg?

  *   For question 5 ("How should this specification work in the context of a user agent’s "incognito" mode?") we may also want to address the issue of local security vs network security in the explanation, or split into two separate questions
  *   For question 6 ("Is it possible to spoof/fake the data being generated for privacy purposes?") we should make it clearer that a specification should merely respect virtual devices/streams/other sources (which may be spoofed) rather than explicitly creating this functionality in the specification itself.
  *   For question 7 ("Does the standard utilize data that is personally-derived, i.e. derived from the interaction of a single person, or their device or address?") it should be clarified this is referring to the traditional definition of PII, and not intended to reflect personal info such as a photo of the user.
  *   For question 8 ("Does the data record contain elements that would enable re-correlation when combined with other datasets through the property of intersection?") we should rewrite it to clarify that it refers to fingerprinting. ("Property of intersection" is unnecessarily academic IMHO.)
  *   None of these questions addresses the threat of pervasive surveillance (see RFC 7258<https://tools.ietf.org/html/rfc7258>). I propose adding a question: "Does this standard protect the user against pervasive surveillance through the use of encryption (when possible)?" Explanatory text can elaborate that we are referring to technologies like TLS.

I like this; wondering what others think.


  *   Question 10 ("Can the user easily, preferably through an element of the GUI, revoke consent granted to a particular feature?") and 11 ("Once consent has been given, is there a mechanism whereby it can be automatically revoked after a reasonable, or user configurable, period?") are redundant, so instead we should edit them so 10 deals with granting permission, and 11 deals with revoking it.
  *   Just noticed #2 needs an explanation, and could probably use a quick pass for grammar (my fault since I wrote it :) )

Finally, there is one question that I'm not sure how the current questionnaire can address: How do we handle the fact that often data is only transported by a standard - how that data is used afterwards is hard to embed into spec?


I think this is out of scope... unless we can think of a way to get this in (would it be to recommend spec authors put language in their specs that talks about the risks of storing data when marshaled out of the UA?).

best, Joe

--
/***********************************/
Greg Norcie (norcie@cdt.org)
Staff Technologist
Center for Democracy & Technology
1634 Eye St NW Suite 1100
Washington DC 20006
(p) 202-637-9800
PGP: http://norcie.com/pgp.txt


Fingerprint:
73DF-6710-520F-83FE-03B5
8407-2D0E-ABC3-E1AE-21F1

/***********************************/



--
Joseph Lorenzo Hall
Chief Technologist
Center for Democracy & Technology
1634 I ST NW STE 1100
Washington DC 20006-4011
(p) 202-407-8825
(f) 202-637-0968
joe@cdt.org
PGP: https://josephhall.org/gpg-key

fingerprint: 3CA2 8D7B 9F6D DBD3 4B10  1607 5F86 6987 40A9 A871






Received on Friday, 18 September 2015 09:12:28 UTC