Results from Privacy review of Presentations API using Privacy Questionaire. (Wall of text warning!) from Greg Norcie on 2015-08-20 (public-privacy@w3.org from July to September 2015)

From: Greg Norcie <gnorcie@cdt.org>
Date: Thu, 20 Aug 2015 15:38:33 -0400
To: "public-privacy (W3C mailing list)" <public-privacy@w3.org>
Cc: Joe Hall <joe@cdt.org>
Message-ID: <CAMJgV7Z=tCbMJC1d3FAP9rtQjxZCv_yE3Grw5Q3VNi+J4VtYOA@mail.gmail.com>
Hi all,

I reviewed the Presentation API <http://www.w3.org/TR/presentation-api/>
using the Privacy Questionnaire. Results are below, followed by some
discussion of what was/was not captured.

Before I begin I think we should all pause and give some credit to the
folks working on this standard. I think they're doing a great job working
to minimize any privacy impacts that might be present.

I used the most recent version of the questionnaire available on the wiki
when I started (hardlink
<https://www.w3.org/wiki/index.php?title=Privacy_and_security_questionnaire&oldid=85382>
for future reference):

   1. Does this specification have a "Privacy Considerations" section?
      - No, the standard bundles security and privacy into one section.
      - (Though it should be noted they couldn't be expected to, since
      the privacy questionnaire is in beta :) )
   2. Does this specification collect personally derived data?
      - Not directly; however, any audio/video will contain inherently
      privacy-sensitive data.
   3. Does this specification generate personally derived data, and if
   so how will that data be handled?
      - Yes, this specification can collect audio/video data. Also, this
      spec can (in its currently
   4. Does this specification allow an origin access to a user’s
   location, and if so is that information minimized?
      - Not directly, but audio/video could be used to derive a
      location.
   5. How should this specification work in the context of a user
   agent’s "incognito" mode?
      - The spec should clear all permissions after an incognito session,
      leaving no trace on the machine that the mode was used.
      - While in operation, a tab that is "incognito" should be
      considered a separate instance from any instances in the
      non-incognito tabs.
   6. Is it possible to spoof/fake the data being generated for
   privacy purposes?
      - Presumably, but the onus is on the consumer to use software to
      set up a virtual device.
      - (IMHO this is acceptable, as long as the spec specifies it
      should not actively deny users the option to send video data to a
      virtual device... maybe this sentiment should be explicitly
      mentioned in the question?)
   7. Does the standard utilize data that is personally-derived, i.e.
   derived from the interaction of a single person, or their device or
   address? If the data could be re-correlated, does the data record contain
   elements that would explicitly enable such re-correlation such as unique
   identifiers?
      - Yes, but aside from the usual caveats about facial recognition,
      recorrelation does not appear to be an issue.
   8. Does the data record contain elements that would enable
   re-correlation when combined with other datasets through the property of
   intersection?
      - No (just audio/video)
   9. Is the user likely to know if information is being collected?
      - Yes, the user will have to interact with their computer in order
      to enable the presentation display.
   10. Can the user easily, preferably through an element of the GUI,
   revoke consent granted to a particular feature?
      - Not necessarily - as I understand it there is not currently a
      GUI element to revoke consent to the Presentation API once granted.

Overall, I think the questionnaire is moving forward - with some language
tweaks and additions I feel like we will be 80% there.

But there are still some major issues, so based on my reading I plan to
make several changes. I'm sharing them here rather than diving straight
into the wiki, so that people have a chance to give feedback before the
edits go in.


   - I'd like to remove the security section since Mike West's questions
   <https://w3ctag.github.io/security-questionnaire/> cover that aspect
   nicely, and I think forcing people to do a separate, explicit privacy
   review is extremely desirable.
      - (Too often people do a security review, assume that privacy is a
      subset of security, and then consider their spec review finished)
      - We can discuss maybe merging the two in the future, but for now I
      think they should stay separate.
   - I plan to edit the text a bit so it's more formal... this is my own
   fault since I wrote a large chunk of this. I know it is a draft, but on
   rereading several questions I feel I was way too conversational.
   - I also plan to edit the wiki formatting so we can link to individual
   questions; this will make it easier to discuss the questions IMHO.
   - For question 1 ("*Does this specification have a "Privacy
   Considerations" section?*") we should make it clearer that the "privacy
   considerations section" must be on its own (not a "privacy and security
   considerations" section where someone can list off their encryption
   techniques and avoid critical examination of privacy impacts).
   - For question 2 ("*Does this specification collect personally derived
   data?*") we should clarify this refers to what in the USA would be
   "PII" - addresses, SSN/national ID #, ZIP/postal code, etc. Conversely,
   question 3 will inquire about data collected from a user via *sensors*
   that may be sensitive (audio, video, telemetry data, etc).
   - For question 4 ("*Does this specification allow an origin access to a
   user’s location, and if so is that information minimized?*") mention
   _direct_ access to distinguish
   - For question 5 ("*How should this specification work in the context of
   a user agent’s "incognito" mode?*") we may also want to address the
   issue of local security vs network security in the explanation, or split
   into two separate questions
   - For question 6 ("*Is it possible to spoof/fake the data being
   generated for privacy purposes?*") we should make it clearer that a
   specification should merely respect virtual devices/streams/other sources
   (which may be spoofed) rather than explicitly creating this functionality
   in the specification itself.
   - For question 7 ("*Does the standard utilize data that is
   personally-derived, i.e. derived from the interaction of a single person,
   or their device or address?*") it should be clarified this is referring
   to the traditional definition of PII, and not intended to reflect personal
   info such as a photo of the user.
   - For question 8 ("*Does the data record contain elements that would
   enable re-correlation when combined with other datasets through the
   property of intersection?*") we should rewrite it to clarify that it
   refers to fingerprinting. ("Property of intersection" is unnecessarily
   academic IMHO)
   - None of these questions addresses the threat of pervasive surveillance
   (see RFC 7258 <https://tools.ietf.org/html/rfc7258>). I propose adding a
   question: "Does this standard protect the user against pervasive
   surveillance through the use of encryption (when possible)?" Explanatory
   text can elaborate that we are referring to technologies like TLS.
   - Question 10 ("*Can the user easily, preferably through an element of
   the GUI, revoke consent granted to a particular feature?*") and 11
   ("*Once consent has been given, is there a mechanism whereby it can be
   automatically revoked after a reasonable, or user configurable, period?*")
   are redundant, so instead we should edit them so 10 deals with granting
   permission, and 11 deals with revoking it.
   - Just noticed #2 needs an explanation, and could probably use a quick
   pass for grammar (my fault since I wrote it :) )

Finally, there is one question that I'm not sure the current
questionnaire can address: how do we handle the fact that data is often
only transported by a standard, and how that data is used afterwards is
hard to embed into a spec?

-- 
/***********************************/

*Greg Norcie (norcie@cdt.org)*

*Staff Technologist*
*Center for Democracy & Technology*
1634 Eye St NW Suite 1100
Washington DC 20006
(p) 202-637-9800
PGP: http://norcie.com/pgp.txt

Fingerprint:
73DF-6710-520F-83FE-03B5
8407-2D0E-ABC3-E1AE-21F1

/***********************************/
Received on Thursday, 20 August 2015 19:39:21 UTC