[web-annotation] Privacy Interest Group (PING) review from Ivan Herman via GitHub on 2016-04-09 (public-annotation@w3.org from April 2016)

From: Ivan Herman via GitHub <sysbot+gh@w3.org>
Date: Sat, 09 Apr 2016 02:31:47 +0000
To: public-annotation@w3.org
Message-ID: <issues.opened-147080406-1460169104-sysbot+gh@w3.org>
iherman has just created a new issue for 
https://github.com/w3c/web-annotation:

== Privacy Interest Group (PING) review ==
(This review came in via
[email](https://lists.w3.org/Archives/Public/public-annotation/2016Apr/0027.html),
sent by Greg Norcie <gnorcie@cdt.org>. I have copied the text to the
issue with only formatting changes. IH.)

Hi all,

Ivan Herman reached out to PING to share a trio of documents relating 
to the Web Annotation model:

* The Web Annotation Protocol[1]
* The Web Annotation Vocabulary[2]
* The Web Annotation Data Model[3]

Together, these documents propose a way for “annotation servers” to be
set up, which can manage and store annotations about websites.

To start, I want to list some high-level takeaways I gathered. I have
also included a run-through of the PING privacy questionnaire[4] I
developed.

1. Annotations, like all other Internet traffic, should probably be
sent via HTTPS. The IETF has termed pervasive monitoring an
“attack”[5], recommending that all traffic be sent over HTTPS to avoid
said attack. Similarly, the United States CIO has stated that “All
browsing activity should be considered private and sensitive. An
HTTPS-Only standard will eliminate inconsistent, subjective
determinations across agencies regarding which content or browsing
activity is sensitive in nature”.[6]

2. It wasn’t clear to me from reading this spec: are annotation servers
always controlled by the operators of a given site, or can one
annotation server annotate any website? Regardless, should there be an
opt-out mechanism, similar to robots.txt for a standard website? I
especially worry about the issue of harassment, which has been raised
with other annotation services like Genius[7].

3. Finally, I feel it’s important that there be mechanisms to edit and
delete annotations. Annotation servers should not be “write only”. In
other contexts, such as Facebook[8], users often regret the data they
upload; I expect that annotation servers will see similar incidents.
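To make points 2 and 3 concrete, here is a minimal Python sketch of what such mechanisms could look like; it is not something the three documents define. It assumes a hypothetical user-agent token (`ExampleAnnotationBot`) that an annotation server would check against a site’s robots.txt-style rules before accepting annotations on a page, and it assumes a server that lets an author replace or remove an annotation with HTTP PUT and DELETE on the annotation’s IRI. The requests are only built, not sent.

```python
import json
import urllib.request
from urllib.robotparser import RobotFileParser

# Hypothetical user-agent token an annotation server might honour.
# Not defined by any Web Annotation document; an assumption for illustration.
ANNOTATION_AGENT = "ExampleAnnotationBot"


def site_allows_annotation(robots_txt: str, page_url: str) -> bool:
    """Return False if the site's robots.txt-style rules disallow
    ANNOTATION_AGENT for page_url (point 2: a site-level opt-out)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(ANNOTATION_AGENT, page_url)


def build_update_request(annotation_iri: str, annotation: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT that replaces an existing annotation
    (point 3: editing). The media type is an assumption."""
    body = json.dumps(annotation).encode("utf-8")
    return urllib.request.Request(
        annotation_iri,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/ld+json"},
    )


def build_delete_request(annotation_iri: str) -> urllib.request.Request:
    """Build (but do not send) a DELETE that removes an existing annotation
    (point 3: deletion)."""
    return urllib.request.Request(annotation_iri, method="DELETE")
```

Reusing robots.txt syntax is only one possible design for the opt-out; a dedicated file or HTTP header could serve the same purpose.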


* [1] https://www.w3.org/TR/2016/WD-annotation-protocol-20160331/
* [2] https://www.w3.org/TR/2016/WD-annotation-vocab-20160331/
* [3] https://www.w3.org/TR/2016/WD-annotation-model-20160331/
* [4] https://gregnorc.github.io/ping-privacy-questions/
* [5] http://www.w3.org/2001/tag/doc/web-https
* [6] https://https.cio.gov/
* [7] http://www.dailydot.com/technology/genius-annotations-online-harrassment/
* [8] “I regretted the minute I pressed share”: A Qualitative Study of Regrets on Facebook, http://cups.cs.cmu.edu/soups/2011/proceedings/a10_Wang.pdf

In addition to these high-level takeaways, below I have walked through
the PING Privacy Questionnaire and included my responses. I encourage
other standards developers to consider using the self-review
questionnaire, and I welcome feedback on how this questionnaire can
better help spec authors perform privacy audits:

* Does this specification have a "Privacy Considerations" section?
        * Not currently.
* Does this specification collect personally derived data?
        * No. Users could put personal data in a tag if they chose, 
but that is not something the spec specifically asks for or 
encourages.
* Does this specification generate personally derived data, and if so 
how will that data be handled?
        * No, this standard does not directly generate identifiable 
information such as audio or video.
* Does this standard allow an origin direct access to a user’s 
location, and if so is that information minimized?
        * No, the Annotation Protocol does not collect location data.
* How should this specification work in the context of a user agent’s 
"incognito" mode?
        * The same as in normal browsing mode, assuming the server is
accessed via the browser.
* Is it possible to spoof/fake the data being generated for privacy 
purposes?
        * I assume users could use a proxy, VPN, or Tor to access the 
annotation server.
* Does the standard utilize data that is personally-derived, i.e. 
derived from the interaction of a single person, or their device or 
address?
        * No.
* Does the data record contain elements that would enable 
re-correlation when combined with other datasets through the property 
of intersection (commonly known as "fingerprinting")?
        * However, I would like to point out that PING has previously
discussed sensor-specific questions that can get at cross-device or
cross-UA signaling (the Vibration API). Can I get a volunteer to
submit a pull request adding language to the existing questionnaire
that captures this threat model?
* Is the user likely to know if information is being collected?
        * Yes, users must expressly navigate to and utilize the 
annotation server.
* Can the user easily, preferably through an element of the GUI, 
revoke consent granted to a particular feature?
        * Again, it is not clear whether users will have the ability to
delete/edit annotations. Hopefully there will be a discussion of this
feature; users often regret posts on social media[8], and it’s
important that they be able to delete their posts.
* Once consent has been given, is there a mechanism whereby it can be 
automatically revoked after a reasonable, or user configurable, 
period?
        * I’m not 100% sure, but I would hope that users can delete
their annotations if they choose to do so.
* Does this standard utilize strong end-to-end encryption?
        * I see no mention of HTTPS in this standard. I’d like to see
language added stating that Annotation servers must use TLS.
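The TLS requirement could also be enforced on the client side. The Python sketch below is purely illustrative: it refuses any annotation server IRI whose scheme is not `https`. `InsecureAnnotationServerError` and `require_https` are hypothetical names, not part of the Web Annotation documents.

```python
from urllib.parse import urlsplit


class InsecureAnnotationServerError(ValueError):
    """Raised when an annotation server IRI does not use HTTPS
    (hypothetical client-side policy, not defined by the specs)."""


def require_https(server_iri: str) -> str:
    """Return server_iri unchanged if its scheme is https, otherwise raise.

    urlsplit() already lowercases the scheme, so "HTTPS://..." is accepted.
    """
    if urlsplit(server_iri).scheme != "https":
        raise InsecureAnnotationServerError(
            f"refusing non-HTTPS annotation server: {server_iri!r}"
        )
    return server_iri
```

A client could then pass the checked IRI to `urllib.request.urlopen` together with `ssl.create_default_context()`, which verifies server certificates and hostnames by default.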


Please view or discuss this issue at 
https://github.com/w3c/web-annotation/issues/204 using your GitHub 
account
Received on Saturday, 9 April 2016 02:31:49 UTC