Re: Guidance, I mentioned on the call from Hannes Tschofenig on 2012-08-25 (public-privacy@w3.org from July to September 2012)

From: Hannes Tschofenig <hannes.tschofenig@gmx.net>
Date: Sat, 25 Aug 2012 12:25:21 +0300
To: David Singer <singer@apple.com>
CC: Rigo Wenning <rigo@w3.org>, "public-privacy (W3C mailing list)" <public-privacy@w3.org>
Message-ID: <50389A01.1010501@gmx.net>

Hi David,

In the guidelines we specifically focused on aspects that concern
protocol specification development. For that reason we do not say much
about, for example, the duration of data storage.

For example, we ask a question regarding the persistence of identifiers 
because the ability to create new identifiers on the fly typically has 
impact on the entire protocol architecture. For example, in SIP a 
separate mechanism was defined to request and obtain these identifiers. 
When used, they also have an impact on the access control mechanisms (when
you think about white- and blacklists, or reputation that is associated 
with identifiers).
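
To make the access control point concrete, here is a minimal Python
sketch. It is not SIP and is not taken from any specification; the
Client and Blocklist classes and the identifier format are hypothetical
and exist only for illustration. It shows that a blocklist keyed purely
on identifiers stops working once the blocked party can obtain a fresh
identifier on the fly.

```python
import secrets


class Client:
    """Hypothetical endpoint that can obtain a fresh identifier on demand."""

    def __init__(self):
        self.identifier = self._new_identifier()

    @staticmethod
    def _new_identifier():
        # Assumption: identifiers are random and unlinkable. A real protocol
        # would obtain them from a server through a dedicated mechanism.
        return secrets.token_hex(8)

    def rotate(self):
        self.identifier = self._new_identifier()


class Blocklist:
    """Hypothetical access control keyed purely on identifiers."""

    def __init__(self):
        self._blocked = set()

    def block(self, identifier):
        self._blocked.add(identifier)

    def allows(self, identifier):
        return identifier not in self._blocked


client = Client()
blocklist = Blocklist()

blocklist.block(client.identifier)
blocked_before = not blocklist.allows(client.identifier)

client.rotate()  # the client obtains a new identifier on the fly
allowed_after = blocklist.allows(client.identifier)
```

After the rotation the blocklist no longer matches the client, which is
why identifier persistence affects more than just the identifier itself.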

Your comments quite nicely illustrate the importance of deciding about
the target audience and the scope of guidelines. Your questions would
most likely be targeted at someone who is building a product rather than a
specification.

Let us assume we are talking about a presence-based system and go
through your questions below.

On 08/23/2012 09:13 PM, David Singer wrote:
> Cool, this is nice
>
> when writing specifications, I guess we could have similar questions:
>
> * what elements of the system or protocol have real-time access to data about the user, user agent, or device?

In SIP or XMPP the protocol specifications give an answer to that
question. For example, RFC 4119 allows location information to be
shared in real time.

For an entire product, however, a number of specifications would have to 
be combined and maybe certain features from specifications will be omitted.

> * where does the protocol propagate that data 'in normal operation'?

At the specification level this would hopefully also be defined in the
spec itself or a companion document. For example, the GEOPRIV
architecture document (RFC 6280) tries to illustrate the data flow in a
verbose fashion.

We did, however, add a question regarding this aspect to make it more 
explicit that this would be good to know from a privacy point of view:
" What information does the protocol expose about individuals, their 
devices, and/or their device usage (other than the identifiers discussed 
in (a))?"

For a product, however, other aspects would have to be considered. For 
example, there one would care about the databases where data is stored. 
Particularly with highly scalable designs the database architecture can 
be quite sophisticated and there is a lot of data that is worthwhile to 
protect (or phrased differently, interesting for an attacker). With the 
storage of data there is also the question of the applicable 
jurisdiction, etc.

> * do the elements exposed to the data also have avenues to store it persistently?

Protocol specifications typically indicate what information they use in 
order for their protocol to work. Often, however, the lifetime of 
specific data items is subject to configuration. For example, think 
about the cookie lifetime.

For a product, on the other hand, one has to make a decision about the
protocol configuration, and therefore the question of whether the cookie
is valid for 3 years or only for a week would have to be answered.
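
As a rough illustration of how the lifetime ends up being a deployment
choice rather than a protocol property, here is a small Python sketch
using the standard library's http.cookies module. The cookie name, the
value, and the helper set_cookie_header are made up for this example;
only the two lifetimes come from the text above.

```python
from http.cookies import SimpleCookie

WEEK_SECONDS = 7 * 24 * 60 * 60
THREE_YEARS_SECONDS = 3 * 365 * 24 * 60 * 60  # ignoring leap days


def set_cookie_header(name, value, max_age_seconds):
    """Build a Set-Cookie header line with a configurable lifetime."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["max-age"] = max_age_seconds
    return cookie.output()


# Same protocol mechanism, two very different product decisions.
short_lived = set_cookie_header("session", "abc123", WEEK_SECONDS)
long_lived = set_cookie_header("session", "abc123", THREE_YEARS_SECONDS)

print(short_lived)
print(long_lived)
```

The specification only defines the Max-Age attribute; the value that
matters for privacy is whatever the operator configures.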

Although the more interesting answers can be provided at a product level,
we still have the aspect of retention in the document since it
frequently showed up in IETF protocol designs that concern logging.

> * can they combine the data with data from previous sessions for the same user, or other activity by the same user?

This is indeed an aspect we were interested in from a protocol design 
point of view. The question regarding 'Correlation' relates to this.
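
A minimal Python sketch of what such correlation looks like in practice,
assuming an observer that sees a stable identifier in every interaction.
The observation records, identifier strings, and the correlate function
are invented for illustration.

```python
from collections import defaultdict

# Hypothetical observations: (identifier, session, activity).
observations = [
    ("id-1", "session-a", "presence update"),
    ("id-2", "session-b", "location share"),
    ("id-1", "session-c", "location share"),
]


def correlate(records):
    """Group activity from different sessions by the identifier they share."""
    profiles = defaultdict(list)
    for identifier, session, activity in records:
        profiles[identifier].append((session, activity))
    return dict(profiles)


profiles = correlate(observations)
for identifier, history in profiles.items():
    print(identifier, history)
```

If the identifier changed between sessions, the two "id-1" entries would
fall into separate profiles and this simple linking would fail.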

> * is the data likely sufficiently precise to identify an actual person?

We are indeed concerned about this question and Section 6.2 deals with 
this aspect.

From a protocol specification point of view one has to make certain
assumptions (which are also discussed in the privacy considerations
draft) since protocols are often generic enough to be used in a variety
of contexts. For example, consider a person who is using a PC in
a household. The same PC may be used by several people and it may
not always be easy to associate a specific interaction with a single
individual.

At the level of a product there is, however, more information available
than in a single protocol specification. For example, consider the
presence-based system again. There, you are likely to have made some
decisions about how user authentication is accomplished and how users
subscribe to the service. There is probably some form of identity
proofing involved (even though the quality may be rather poor, for
example using only email verification). All this information will
increase the ability to identify a natural person.


> * if data is stored persistently, what elements have access to that store?

We touch on this aspect a little bit with the question about 'b.  Stored
data compromise'.

However, this is really something that has to be answered at a product
rather than a protocol specification level. The access control policy
for access to a database (and all the processes around it) is typically
far beyond what any IETF or W3C specification would ever describe.

Ciao
Hannes

>
> … and so on
>
> On Aug 23, 2012, at 9:25 , Hannes Tschofenig <Hannes.Tschofenig@gmx.net> wrote:
>
>> Hi Rigo,
>>
>> in the call today we discussed about the way guidance (to specification authors) could be given.
>>
>> Here is what we have done with the IAB privacy considerations document:
>>
>> --------
>>
>> 6.  Guidelines
>>
>>    This section provides guidance for document authors in the form of a
>>    questionnaire about a protocol being designed.  The questionnaire may
>>    be useful at any point in the design process, particularly after
>>    document authors have developed a high-level protocol model as
>>    described in [RFC4101].
>>
>>    Note that the guidance does not recommend specific practices.  The
>>    range of protocols developed in the IETF is too broad to make
>>    recommendations about particular uses of data or how privacy might be
>>    balanced against other design goals.  However, by carefully
>>    considering the answers to each question, document authors should be
>>    able to produce a comprehensive analysis that can serve as the basis
>>    for discussion of whether the protocol adequately protects against
>>    privacy threats.
>>
>>    The framework is divided into four sections that address each of the
>>    mitigation classes from Section 5, plus a general section.  Security
>>    is not fully elaborated since substantial guidance already exists in
>>    [RFC3552].
>>
>>
>> 6.1.  General
>>
>>       a.  Trade-offs.  Does the protocol make trade-offs between privacy
>>       and usability, privacy and efficiency, privacy and
>>       implementability, or privacy and other design goals?  Describe the
>>       trade-offs and the rationale for the design chosen.
>>
>>
>> 6.2.  Data Minimization
>>
>>
>>       a.  Identifiers.  What identifiers does the protocol use for
>>       distinguishing initiators of communications?  Does the protocol
>>       use identifiers that allow different protocol interactions to be
>>       correlated?
>>
>>       b.  Data.  What information does the protocol expose about
>>       individuals, their devices, and/or their device usage (other than
>>       the identifiers discussed in (a))?  To what extent is this
>>       information linked to the identities of the individuals?  How does
>>       the protocol combine personal data with the identifiers discussed
>>       in (a)?
>>
>>       c.  Observers.  Which information discussed in (a) and (b) is
>>       exposed to each other protocol entity (i.e., recipients,
>>       intermediaries, and enablers)?  Are there ways for protocol
>>       implementers to choose to limit the information shared with each
>>       entity?  Are there operational controls available to limit the
>>       information shared with each entity?
>>
>>       d.  Fingerprinting.  In many cases the specific ordering and/or
>>       occurrences of information elements in a protocol allow users,
>>       devices, or software using the protocol to be fingerprinted.  Is
>>       this protocol vulnerable to fingerprinting?  If so, how?
>>
>>       e.  Persistence of identifiers.  What assumptions are made in the
>>       protocol design about the lifetime of the identifiers discussed in
>>       (a)?  Does the protocol allow implementers or users to delete or
>>       replace identifiers?  How often does the specification recommend
>>       to delete or replace identifiers by default?
>>
>>       f.  Correlation.  Does the protocol allow for correlation of
>>       identifiers?  Are there expected ways that information exposed by
>>       the protocol will be combined or correlated with information
>>       obtained outside the protocol?  How will such combination or
>>       correlation facilitate fingerprinting of a user, device, or
>>       application?  Are there expected combinations or correlations with
>>       outside data that will make users of the protocol more
>>       identifiable?
>>
>>       g.  Retention.  Do the protocol or its anticipated uses require
>>       that the information discussed in (a) or (b) be retained by
>>       recipients, intermediaries, or enablers?  Is the retention
>>       expected to be persistent or temporary?
>>
>>
>> 6.3.  User Participation
>>
>>
>>
>>       a.  User control.  What controls or consent mechanisms does the
>>       protocol define or require before personal data or identifiers are
>>       shared or exposed via the protocol?  If no such mechanisms are
>>       specified, is it expected that control and consent will be handled
>>       outside of the protocol?
>>
>>       b.  Control over sharing with individual recipients.  Does the
>>       protocol provide ways for initiators to share different
>>       information with different recipients?  If not, are there
>>       mechanisms that exist outside of the protocol to provide
>>       initiators with such control?
>>
>>       c.  Control over sharing with intermediaries.  Does the protocol
>>       provide ways for initiators to limit which information is shared
>>       with intermediaries?  If not, are there mechanisms that exist
>>       outside of the protocol to provide users with such control?  Is it
>>       expected that users will have relationships (contractual or
>>       otherwise) with intermediaries that govern the use of the
>>       information?
>>
>>       d.  Preference expression.  Does the protocol provide ways for
>>       initiators to express individuals' preferences to recipients or
>>       intermediaries with regard to the collection, use, or disclosure
>>       of their personal data?
>>
>>
>> 6.4.  Security
>>
>>
>>
>>       a.  Surveillance.  How do the protocol's security considerations
>>       prevent surveillance, including eavesdropping and traffic
>>       analysis?
>>
>>       b.  Stored data compromise.  How do the protocol's security
>>       considerations prevent or mitigate stored data compromise?
>>
>>       c.  Intrusion.  How do the protocol's security considerations
>>       prevent or mitigate intrusion, including denial-of-service attacks
>>       and unsolicited communications more generally?
>>
>>       d.  Misattribution.  How do the protocol's mechanisms for
>>       identifying and/or authenticating individuals prevent
>>       misattribution?
>>
>> --------
>>
>>
>> Ciao
>> Hannes
>>
>>
>
> David Singer
> Multimedia and Software Standards, Apple Inc.
>
>
Received on Saturday, 25 August 2012 09:25:55 UTC