Re: Call for Adoption: draft-pauly-httpbis-geoip-hint from Ted Hardie on 2022-09-06 (ietf-http-wg@w3.org from July to September 2022)

From: Ted Hardie <ted.ietf@gmail.com>
Date: Tue, 6 Sep 2022 09:44:01 +0100
To: Tommy Pauly <tpauly@apple.com>
Cc: Mark Nottingham <mnot@mnot.net>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <CA+9kkMAScAXrFrxd3wCJwfw7CDJv0nph50Bf2-hbRPskypffQg@mail.gmail.com>

Hi Tommy,

Thanks for your quick reply.  Some additional comments in-line.

On Mon, Sep 5, 2022 at 9:40 PM Tommy Pauly <tpauly@apple.com> wrote:

> Hi Ted,
>
> Thanks for the comments. I appreciate where you’re coming from on this,
> certainly. I did want to explain the context here of why this proposal has
> the shape it does.
>
> The information being included in the header is the information that is
> already carried by the externally visible IP address presented to the
> server, and to that end is not designed to send any more information than
> would already be known to a server that has an up-to-date view of various
> geo IP databases.
>

On this point, I think we disagree.  The externally visible IP address is
essentially a key for a database lookup, where you propose to send the data
itself.  First, some of the sites that might receive this might not have
used such a database, but they will get the data anyway if this is made
common.  Second, the data contained by these databases and the data being
presented may be different (after all, the whole point is that the database
lookup may return stale data); this means you have to do the privacy
analysis from the perspective of user-disclosed data rather than externally
derived data.
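
To make the key-versus-data distinction concrete, here is a minimal Python
sketch.  The database contents, the header name "Geo-Hint", and both function
names are hypothetical, made up for illustration; none of them is taken from
the draft.  It only shows that a looked-up value and a client-sent value are
separate inputs that need not agree.

```python
import ipaddress

# Hypothetical geo IP database; a server only learns this if it does the lookup.
GEO_DB = {"192.0.2.0/24": "US,US-AL,Alabaster"}


def location_from_lookup(client_ip, db=GEO_DB):
    """Externally derived location: the IP address is only the lookup key."""
    addr = ipaddress.ip_address(client_ip)
    for prefix, location in db.items():
        if addr in ipaddress.ip_network(prefix):
            return location
    return None  # no mapping known for this address


def location_from_header(headers):
    """Client-disclosed location: present in the request without any lookup.

    "Geo-Hint" is a placeholder header name, not the one the draft defines.
    """
    return headers.get("Geo-Hint")
```

A server that never consulted GEO_DB still sees whatever the header carries,
and the two values can differ whenever the database is stale.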

All of this sets aside the question of whether folks are actually all that
happy with geolocation databases and their use.

> Turning it “off” would not meaningfully improve anyone’s privacy. (Ensuring
> that this value matches what would be known anyway and not allow clients to
> accidentally put more information could be discussed, etc, of course.)
>

See above, on the point that this data might be sent to sites that did not
derive geolocation data from database lookups.  Second, I'm not sure how
"value matches what would be known anyway" can be the goal here, since
updating stale or missing data is.


> Instead, the fundamental goal behind the proposal is to enable improved
> privacy solutions that provide overall less identifying data and less
> location-related information to servers. The status quo is a world where
> most users are generally accessing websites without a proxy or a VPN, using
> IPs provisioned by their ISP / carrier; and servers use various proprietary
> geo IP database services to determine user location, often with extreme
> accuracy.
>
Your current draft says:

   This header is intended to be used to provide rough geolocation hints
   to servers that do not already have accurate or authoritative
   mappings for the IP addresses of clients.  This can be particularly
   useful for cases where IP geolocation mappings have changed recently,
   or a client is using a VPN or proxy that may not be commonly
   recognized by servers.

This does not appear to me to map well to what you are describing.

The use of client-hints in cases where a VPN or proxy is in use needs a lot
more thought and description than this draft gives them.  While you may be
imagining a case in which this rough location gives customization
advantages to someone using an oblivious HTTP proxy, there are a lot of
other uses of VPNs and proxies, and some of them are designed to mask user
data that this discloses.

This data is also more or less rough depending on the circumstances. Let's
assume that the client gets this data from DHCP, using RFC 4776-style
configuration.  First you will note that none of the civic address elements
below A1 are required:  https://www.rfc-editor.org/rfc/rfc4776#page-5. That
means the data available to the client and the data which this requires may
be different, forcing clients to decide whether to restrict what they send
or to send _more_ precise data than is required.  That needs a good bit of
thought, and, in general, the question of where the client gets the data
and how trustworthy it is needs attention.  Second, you will no doubt
recognize that the A2 element is wildly variable in the size of population
it represents (e.g. the Andorran civil parish vs. US-CA).  Lastly, the "city"
element of the geofeeds specification is a free text field:  "The city
field, if non-empty, SHOULD be free UTF-8 text, excluding the comma" and
could refer to such bustling metropolises as Fordwich, England which boasts
of 400 residents; this dwarfs Concho, AZ, with 38.
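
As a rough illustration of why the field names alone do not bound precision,
here is a small Python sketch.  It assumes the five-field, comma-separated
geofeed line layout (prefix, country, region, city, postal code); the type and
function names are hypothetical and the sample prefixes are documentation
addresses.

```python
from typing import NamedTuple


class GeofeedEntry(NamedTuple):
    prefix: str
    country: str
    region: str
    city: str
    postal: str


def parse_geofeed_line(line):
    """Split one geofeed-style line; missing trailing fields become empty."""
    fields = [f.strip() for f in line.split(",")]
    fields += [""] * (5 - len(fields))  # pad short lines to five fields
    return GeofeedEntry(*fields[:5])


def finest_granularity(entry):
    """Name the most specific non-empty field, not the population behind it."""
    if entry.city:
        return "city"
    if entry.region:
        return "region"
    if entry.country:
        return "country"
    return "none"
```

Both "192.0.2.0/24,US,US-AL,Alabaster," and a line naming a village of a few
dozen people come back as "city"; nothing in the format says how many people
that label covers.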


> In order to prevent implicit location tracking, we need clients to use
> more proxied IP addresses. In order to make the use of such IP addresses
> ubiquitous, the experience needs to be sufficiently good that most/all
> users are willing to use a proxying service. From my experience in
> deploying such a service, compatibility due to incorrect geo IP mappings
> was one of the biggest hurdles for rolling out proxying using privatized IP
> addresses — IP addresses that used broader location granularity than the
> user would have had otherwise, and allowing users to select “country-wide”
> anonymous IP addresses. The user experience problems range from sites
> showing up in incorrect languages to blocking access altogether because
> they think they’re from a country that isn’t served by the site.
>
> The current mechanism to rectify the situation involves a good deal of
> manual outreach to various geo IP database providers and website operators,
> asking them to update their mappings. But even this doesn’t work perfectly,
> since many of these databases only have a notion of city-specific IP
> addresses, and cannot support broader notions of
> state/province/country-wide IP addresses.
>
> The goal is to make it easier and faster (in time) for more
> privacy-oriented proxying services to come up and deploy usable solutions,
> without requiring the resources to evangelize and update a myriad of
> databases and perform outreach to site operators.
>
> The proposed mechanism does this by:
> - Enabling proxied IP address information to be used without being gated
> by geo IP provider updates
> - Ensuring that broader (country-wide, etc) mappings get through without
> being incorrectly translated into city-specific addresses
> - Not sharing more information than is already available based on the IP
>
>
As I said in my first message, I have no doubt as to your goodwill.  But
you are proposing a header that contains data not currently provided
directly by the client without sufficient regard to how _other_ people will
use it once it is available.  This is why I urge you to start with an
architecture document or a problem statement.  You want to achieve a
particular set of goals and have an architecture in mind, but you also wish
to start from a generic header that could be (and would be) used in a lot
of other contexts, potentially by people whose aims do not align with
yours.  To avoid that, adding a security consideration that "this should
not be used for X or Y" is not enough and never will be.  You either need a
much narrower technical design that can only be used in your architecture
or to limit the disclosure in ways that truly do ensure that the
information is not privacy sensitive.


> It’s of course acceptable to not want to touch this topic, but my concern
> is that not addressing the myriad of privacy and usability issues in the
> status quo situation (partly due to a lack of standardized mechanisms) will
> mean that it remains too difficult to deploy new services that are aimed at
> helping users not have their location and identity tracked.
>
>
I'm afraid that you have not persuaded me that adding this will do much to
eliminate the current mechanisms; it seems additive rather than a clear
replacement.   I think that means the privacy analysis must be for "what
happens if we add this to the mix" rather than "what happens if this
replaces geolocation feeds".

best regards,

Ted Hardie


> Thanks,
> Tommy
>
> On Sep 5, 2022, at 5:38 AM, Ted Hardie <ted.ietf@gmail.com> wrote:
>
> Hi Mark, WG colleagues, and fans of Geographic Privacy,
>
> On Mon, Sep 5, 2022 at 7:47 AM Mark Nottingham <mnot@mnot.net> wrote:
>
>> At IETF 114, we saw some interest in adding hints about the client's
>> location to requests in certain circumstances, with the condition that it
>> be done in a way that doesn't compromise privacy.
>>
>> My sense (as Chair) is that further discussion might result in a solution
>> that's broadly acceptable, or it might conclude that we can't publish
>> anything. However, we need a focal point for that discussion. One potential
>> starting point would be the draft presented:
>>   https://datatracker.ietf.org/doc/draft-pauly-httpbis-geoip-hint/
>>
>> As always, if we adopt this draft it will be a starting point; it may
>> change considerably, and consensus on any remaining content will need to be
>> confirmed. I'll add one more proviso: if we can't come to consensus on a
>> solution with appropriate properties (especially, privacy), we won't move
>> the document forward.
>>
>> Please indicate whether you support adopting this draft. The Call for
>> Adoption will end in two weeks.
>>
>>
> I do not support starting the discussion with this draft.  If the working
> group would like to start a discussion on it, it should start with BCP
> 160/RFC 6280 and work out what architecture it wants to build.  The header
> format is a trivial piece to that puzzle and it can come much, much later.
> As many people who went through GeoPriv could tell you:
>
> Information on location which is sent by a client can be combined with
> other information in ways that many find startling and it is often
> redistributed in ways that people find distressing.  Starting a new "client
> sends geolocation info" project in a climate in which geolocation data is
> currently being used to target soldiers, pregnant women, and members of the
> LGBTQ community has to start from the recognition that turning this off and
> being sure it is off is job one.  I see no evidence of that awareness in
> this draft, despite my utter faith in Tommy and David's goodwill here.
> Despite that faith, the optics are _terrible_ , and adopting it without
> some attention to this is unconscionable.   If you think that the use of a
> specific VPN and a provided GeoLocation of Alabaster, AL by the client
> doesn't cause privacy concerns, I have a statue of Vulcan, currently
> located in Birmingham, AL, that I'd like to sell; it's quite handsome, and
> I think you'll find it's good value for the money.
>
> Once you get such an architecture, you need to recognize that information
> provided by clients is often not trusted, and the use of trusted location
> providers to associate location with a client is thus common in some
> contexts.  You need to decide very early on whether this architecture will
> ever allow third parties to provide the location and, if so, how (e.g. by
> providing a signed object that the client can include).  Otherwise I can
> claim to be in Alabaster to get around the GDPR restrictions that US
> companies have placed on their data.  (While I see that the draft's
> security considerations  touch on this, I have absolutely no faith at all
> in the "use this, but not for access control decisions" language being
> honored in the breach.)
>
> Lastly, I will point out that this relies on a document that went through
> the independent stream and requires adherence to its feed format and
> references those feeds within this format.  Some thought as to why it had
> to go to the independent stream might be worth the attention of the authors.
>
> In short, I oppose this, and I think the working group should simply avoid
> work in this area completely; it is fraught with pitfalls.  If you must do
> work in the area, start from the right place, which is a privacy-preserving
> architecture.  Get to the header format at the *end* of that process.
>
> best regards,
>
> Ted Hardie
>
>
>> Cheers,
>>
>>
>> --
>> Mark Nottingham   https://www.mnot.net/
>>
>>
>>
>
Received on Tuesday, 6 September 2022 08:44:42 UTC