Re: [perpass] HTTP user-agent fingerprinting from Patrick Pelletier on 2013-09-13 (ietf-http-wg@w3.org from July to September 2013)

From: Patrick Pelletier <code@funwithsoftware.org>
Date: Fri, 13 Sep 2013 12:18:32 -0700
To: Stephen Farrell <stephen.farrell@cs.tcd.ie>, ietf-http-wg@w3.org
Cc: perpass@ietf.org
Message-Id: <6DCCF5ED-0AD3-42B4-83A7-7FE42DAA27D2@funwithsoftware.org>

Forwarding this idea along to httpbis, as you (Stephen) suggested.
That said, it could be retrofitted onto existing HTTP, not just
httpbis, since it merely recommends practices which are already
legal in HTTP.

On Sep 13, 2013, at 5:17 AM, Stephen Farrell wrote:

> On 09/13/2013 04:12 AM, Patrick Pelletier wrote:
>> On 9/12/13 1:18 PM, Dave Crocker wrote:
>>
>>>    "privacy properties of IETF protocols and concrete ways in which
>>>     those could be improved."
>>
>> One obvious thing is the amount of (usually unnecessary) information
>> leaked by the User-Agent field in HTTP.
>>
>> Should we downgrade the User-Agent field (section 14.43 of RFC 2616)
>> from a SHOULD to a MAY?
>
> I think everyone finds those values problematic, and not only for
> privacy reasons. But yes, if you believe [1] then it's probably the
> biggest contributor to browser fingerprinting that's in an IETF
> spec. (No idea if that site's evaluation is sound myself though.)
>
>   [1] https://panopticlick.eff.org/
>
>> Or, if that's too radical, should we standardize a small number of
>> fixed strings to use in the User-Agent field?  (For example,
>> "Desktop/1.0" for desktop browsers, "Mobile/1.0" for mobile browsers,
>> "Text/1.0" for text browsers like Lynx, "Batch/1.0" for
>> non-interactive clients like curl which are performing a task more
>> specific than crawling the web, and "Robot/1.0" for clients which
>> are crawling the web?)
>
> Interesting. An IANA registry of those kinds of value might just end
> up like the UA string though, which also started out nice and simple.

I agree that things always start out simple and get messy.  However, I  
think there are some differences:

* The original User-Agent field was not designed with privacy in  
mind.  In fact, it was designed specifically to identify the product  
and version the user is using.  So, with a different goal (privacy  
first), we will hopefully get different results.

* By specifying only a single product token, omitting comments, and
fixing the version number at 1.0, we've already eliminated a fair
amount of information.  And then we further limit the information by
making the product name not the actual name of the software, but
merely a generic indication of the type of user agent: the minimal
amount of information necessary for any legitimate browser sniffing
that needs to occur (such as differentiating desktop and mobile
clients).
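
To make the fixed-string idea concrete, here is a rough Python sketch.
The category keys and the names GENERIC_USER_AGENTS, generic_user_agent,
and is_mobile are hypothetical, made up for illustration; only the five
strings themselves come from the proposal quoted above.

```python
# Hypothetical sketch of the fixed-string proposal. The table keys and
# function names are assumptions for illustration, not part of any spec.
GENERIC_USER_AGENTS = {
    "desktop": "Desktop/1.0",  # desktop browsers
    "mobile": "Mobile/1.0",    # mobile browsers
    "text": "Text/1.0",        # text browsers like Lynx
    "batch": "Batch/1.0",      # non-interactive clients like curl
    "robot": "Robot/1.0",      # clients crawling the web
}


def generic_user_agent(client_type: str) -> str:
    """Return the single generic product token for a client category."""
    try:
        return GENERIC_USER_AGENTS[client_type]
    except KeyError:
        raise ValueError(f"unknown client type: {client_type!r}") from None


def is_mobile(user_agent: str) -> bool:
    """Server-side sniffing reduces to one exact string comparison."""
    return user_agent == GENERIC_USER_AGENTS["mobile"]
```

With only five possible values, the field carries a little over two
bits of information at most, and the server-side check no longer needs
any pattern matching.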

And, of course, using the simplified User-Agent strings was just one
of my two proposals.  My other proposal, which was even simpler,
though perhaps more radical, was to downgrade the requirement on
User-Agent from SHOULD to MAY, and encourage browsers not to send
User-Agent at all.  (We could even change it to a SHOULD NOT if we feel
really heavy-handed.)  One could argue that by using other techniques
such as responsive layout, no browser sniffing should be necessary at
all.
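
As a minimal sketch of what "not sending User-Agent at all" looks like
on the wire, the Python below builds an HTTP/1.1 GET request by hand.
The function name build_get_request and the host example.org are
placeholders I made up; the point is only that the request stays valid
without the field, since Host is the one header HTTP/1.1 requires.

```python
from typing import Optional


def build_get_request(host: str, path: str = "/",
                      user_agent: Optional[str] = None) -> bytes:
    """Build a minimal HTTP/1.1 GET request as raw bytes.

    User-Agent is omitted unless the caller explicitly supplies one.
    """
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    if user_agent is not None:
        lines.append(f"User-Agent: {user_agent}")
    lines.append("Connection: close")
    # The header section ends with an empty line (CRLF CRLF).
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


# Default: no User-Agent field is emitted.
request = build_get_request("example.org")
```

Passing user_agent="Desktop/1.0" would produce the fixed-string variant
instead, so the same sketch covers both proposals.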

> Maybe ask this on httpbis if you don't get more feedback here? That's
> where you'd find folks who know if it could be done and who could do
> it.

--Patrick

Received on Friday, 13 September 2013 19:19:03 UTC