Accept headers and privacy

I just finished writing the `privacy issues' section for my `new
content negotiation sections' document for the content negotiation
sub-workgroup, and realised that the issues addressed in it might be
interesting for a broader audience, in particular considering the
recent Vary header discussions in the caching subgroup.

I include the finished `privacy issues' section below.  It may be a
bit hard to read out of context, but the general issues should be
clear.

Koen.

--snip---

 ii. Privacy issues

 ii.i. Session tracking using Accept headers
     
     If users fine-tune the quality factors in their user agent's
     default accept headers to the third decimal, these accept
     headers can be used as relatively long-lived user identifiers,
     enabling content providers (even if they do not provide
     negotiable resources) to tell apart different users behind a
     proxy. This identification allows content providers to do
     clicktrail tracking, and allows collaborating content
     providers to match cross-server clicktrails or form
     submissions of individual users.  Thus, privacy reasons demand
     that user agents be conservative 1) in the amount of quality
     factor fine-tuning they allow users to do without giving a
     warning about privacy, and 2) in sending long accept headers by
     default in a request. (See also the remarks on sending short
     accept headers for performance reasons in Section 12.2.)
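
To make the tracking risk concrete, here is a minimal sketch of how a
content provider could turn accept headers into an identifier. The
function name, the choice of headers, the hash, and the two sample
users are all assumptions for illustration; nothing here is taken from
an actual server.

```python
import hashlib

# Hypothetical helper: derive a short identifier from the accept headers alone.
ACCEPT_HEADERS = ("accept", "accept-language", "accept-charset", "accept-encoding")

def accept_fingerprint(headers):
    """Return a 16-hex-digit identifier computed only from Accept* headers."""
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    material = "\n".join(f"{name}:{lowered.get(name, '')}" for name in ACCEPT_HEADERS)
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:16]

# Two made-up users behind the same proxy; they differ only in the
# third decimal of one quality factor.
alice = {"Accept": "text/html;q=0.953, text/plain;q=0.412",
         "Accept-Language": "en;q=0.871"}
bob = {"Accept": "text/html;q=0.954, text/plain;q=0.412",
       "Accept-Language": "en;q=0.871"}

print(accept_fingerprint(alice) == accept_fingerprint(bob))  # False
```

The identifier stays the same for as long as the user leaves the
preferences untouched, which is what makes it usable for clicktrail
matching across requests and across collaborating servers.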

 ii.ii. Accept headers revealing information of a private nature
        without real need
     
     Preferences sent in Accept headers, in particular language
     quality factors sent in Accept-Language headers, may reveal
     information that the user would rather keep private unless
     revealing it directly improves the quality of the service.  The content
     negotiation mechanism [I will define in the finished version of
     the document] allows users to leave some languages
     (e.g. languages the knowledge of which strongly correlates with
     membership of a particular ethnic group) out of the
     Accept-Language header without decreasing the quality of the
     negotiation process if the request happens to be on a negotiable
     resource.  Note however that the speed of the negotiation process
     may be affected.
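
As an illustration of leaving languages out, a user agent could filter
its Accept-Language value before sending it. This is only a sketch:
the function name and the sample tags are assumptions, and matching is
on the exact language tag, so subtags would need separate handling.

```python
def strip_languages(accept_language, private_tags):
    """Drop entries whose language tag is in private_tags (case-insensitive)."""
    private = {tag.lower() for tag in private_tags}
    kept = []
    for item in accept_language.split(","):
        item = item.strip()
        if not item:
            continue
        # The language tag is everything before the first ';' (the q parameter).
        tag = item.split(";", 1)[0].strip().lower()
        if tag not in private:
            kept.append(item)
    return ", ".join(kept)

# Made-up preferences; "fy" is the language the user chooses not to reveal.
print(strip_languages("en;q=1.0, fy;q=0.8, nl;q=0.7", {"fy"}))
# prints: en;q=1.0, nl;q=0.7
```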

     No matter how much information is left out of the Accept
     headers, automatic reactive negotiation by a user agent on a
     negotiable resource will inevitably reveal some of the user's
     preferences, because the user agent generates a request on the
     best representation resource, and that choice is partly
     determined by those preferences. Malicious service authors
     could provide `fake' negotiable resources, which do not even
     bind to representation resources that are in fact different,
     and whose only purpose is to get information about
     (ethnicity-correlated) languages understood by the visiting
     users.  Such plots would however be visible to alert victims,
     as user agents allow the user to review a list of all
     representations bound to the negotiable resource.
     
     Maintainers of firewall proxies may want to process outgoing
     accept headers to enhance privacy beyond the level provided by
     the user agents behind the firewall.
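
One possible form of such proxy processing is to coarsen quality
factors and shorten the header, so that requests from different users
look alike. The sketch below is an assumption about what a proxy might
do, not a described implementation; the function name, the one-decimal
rounding, and the default limit of three entries are all made up for
illustration.

```python
def coarsen_accept(value, max_items=3):
    """Round q values to one decimal and keep at most max_items entries."""
    out = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        parts = [p.strip() for p in item.split(";")]
        name, params = parts[0], parts[1:]
        q = 1.0       # a missing quality factor means q=1
        other = []    # non-q parameters are passed through unchanged
        for p in params:
            if p.lower().startswith("q="):
                try:
                    q = float(p[2:])
                except ValueError:
                    q = 1.0
            else:
                other.append(p)
        q = min(max(q, 0.0), 1.0)
        rounded = float(f"{q:.1f}")
        # Never turn a small positive q into 0, which would mean "not acceptable".
        if q > 0 and rounded == 0.0:
            rounded = 0.1
        out.append(";".join([name] + other + [f"q={rounded:.1f}"]))
    return ", ".join(out[:max_items])

print(coarsen_accept("text/html;q=0.953, text/plain;q=0.412"))
# prints: text/html;q=1.0, text/plain;q=0.4
```

After this step, headers that differ only in the second or third
decimal become identical. The cost is coarser preferences and possibly
dropped entries, so a proxy doing this trades some negotiation quality
for privacy.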

Received on Wednesday, 14 February 1996 09:40:20 UTC