Re: I-D Action: draft-pauly-httpbis-geoip-hint-01.txt from David Schinazi on 2024-10-29 (ietf-http-wg@w3.org from October to December 2024)

From: David Schinazi <dschinazi.ietf@gmail.com>
Date: Mon, 28 Oct 2024 18:05:23 -0700
To: Stephen Farrell <stephen.farrell@cs.tcd.ie>
Cc: Dustin Mitchell <djmitche@google.com>, Ted Hardie <ted.ietf@gmail.com>, Ben Schwartz <bemasc@meta.com>, Watson Ladd <watsonbladd@gmail.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-ID: <CAPDSy+6cCDVafNUT0g4sn0XPZ=420H3nEPqkYJg=O8Yxp9Og2Q@mail.gmail.com>

Hi,

I'm combining multiple responses into one email, and adding to what Dustin
wrote.

To Ted's first point, the draft authors are all absolutely open to changing
our technology choices. However, we can't commit to changing our employers'
business practices - as you can imagine, that is above our pay grade. We
therefore have to solve the technical problem at hand within the confines
of what we control as engineers. So, Stephen, while fixing all abuses of
user location on the Internet is a worthwhile goal, it's not something we
can achieve with an RFC.

From Ted's email, it's clear that we (the draft authors) did a poor job of
explaining which IP the geolocation data is derived from. Let me first
define terminology:
1) client IP. This is the IP that a publicly-accessible server would see if
the client were to open a direct TCP connection to that server. If the
client is behind a NAT (or multiple NATs), this is the public IP of the NAT
furthest from the client.
2) proxy egress IP. This is the IP that a publicly-accessible server would
see if the client were to open a proxied TCP connection to that server,
where all the application-layer bytes are flowing through the proxy.

When a privacy proxy is enabled and the user visits a website, the website
sees the proxy egress IP. When the privacy proxy is disabled, the website
sees the client IP.

The key property we want here is that the new location information that
we're sending provides no more information than what can be derived solely
from the client IP.

To use an example:
* my client IP identifies my household, and combined with other data such
as screen size and user agent, pretty much identifies my device - so all in
all it identifies "David Schinazi"
* the geolocation data that we derive from my client IP would map to "San
Francisco"
* the proxy egress IP would map to "Northern California"

Our goal here is to give the user a browsing experience where searching for
"pizza" finds restaurants in San Francisco, not Sacramento. Otherwise, the
user disables the privacy proxy, and we're back to leaking their client IP
to every website.

In order for us to reach these goals, we need the geolocation to be based
on the client IP while only providing websites with the proxy egress IP. I
can't think of a way to make that work purely with communication between
the proxy and website.
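To make the information flow above concrete, here is a toy model in Python.
It is only a sketch under stated assumptions: the addresses come from the
RFC 5737 documentation ranges, and the GEO_DB table, the WebsiteView type and
the direct_request/proxied_request/website_location helpers are all
hypothetical. It does not reflect the draft's actual header format or how any
proxy is implemented. It just shows the property described: the hint is
derived from the client IP, while the website only ever sees the egress IP.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical geolocation table. The addresses come from the RFC 5737
# documentation ranges; the mappings mirror the example in this email.
GEO_DB = {
    "192.0.2.10": "San Francisco",          # client IP (public IP of the home NAT)
    "198.51.100.7": "Northern California",  # privacy proxy egress IP
}


@dataclass(frozen=True)
class WebsiteView:
    """What a website observes for a single request (toy model)."""
    source_ip: str                  # address on the connection the website accepts
    geo_hint: Optional[str] = None  # hypothetical location hint sent with the request


def website_location(view: WebsiteView) -> str:
    # The website uses the hint if one is present; otherwise it geolocates
    # whatever source address it sees.
    return view.geo_hint or GEO_DB.get(view.source_ip, "unknown")


def direct_request(client_ip: str) -> WebsiteView:
    # Privacy proxy disabled: the website sees the client IP itself.
    return WebsiteView(source_ip=client_ip)


def proxied_request(client_ip: str, egress_ip: str, send_hint: bool) -> WebsiteView:
    # Privacy proxy enabled: the website only sees the egress IP. If a hint is
    # sent, it is derived solely from the client IP's geolocation, so it
    # reveals nothing beyond what the client IP itself would.
    hint = GEO_DB[client_ip] if send_hint else None
    return WebsiteView(source_ip=egress_ip, geo_hint=hint)


client, egress = "192.0.2.10", "198.51.100.7"
for label, view in [
    ("direct", direct_request(client)),
    ("proxy, no hint", proxied_request(client, egress, send_hint=False)),
    ("proxy + hint", proxied_request(client, egress, send_hint=True)),
]:
    print(f"{label:15} source={view.source_ip:13} location={website_location(view)}")
```

In this model, the "proxy, no hint" case is the Sacramento problem (the site
only knows "Northern California"), while the "proxy + hint" case gets the
same location as a direct connection without the site ever learning the
household's IP.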

David

On Mon, Oct 28, 2024 at 10:23 AM Stephen Farrell <stephen.farrell@cs.tcd.ie>
wrote:

>
> Hiya,
>
> On 28/10/2024 16:56, Dustin Mitchell wrote:
> >
> > This also provides an opportunity to incrementally improve the situation
> > for tracking of users' location by making it an active signal that is
> > under the control of the client (rather than the client's ISP).
>
> Is the above actually accurate though? ISTM that detailed location
> data is mostly sent by clients in payloads, so adding an HTTP header
> seems like it has no effect on such application layer leakage other
> than to add a new way in which location details can be exposed.
>
> Defining the problem to be addressed to be only a tiny part of the
> abuses of location data seems to me a wrong starting point in a
> different sense to the one Ted described already.
>
> Cheers,
> S.
>
>

Received on Tuesday, 29 October 2024 01:05:40 UTC