Re: I-D Action: draft-pauly-httpbis-geoip-hint-01.txt

Hi David,

A couple of replies in-line.

On Sat, Oct 26, 2024 at 12:57 AM David Schinazi <dschinazi.ietf@gmail.com>
wrote:

> Hi Ted,
>
> While going back to the drawing board can be sad, I'm definitely open to
> it. We have specific design requirements, but we're not wedded to any
> particular solution. I'm not sure I understand your alternative proposal
> though. In today's world, privacy proxies already publish their egress IPs
> publicly along with the corresponding geos. (For example, Apple's is at
> [1].) One issue is that everyone hasn't ingested that list, but that could
> be solved over time.
>

First, the proposal I made is little more than a few hand waves, but I am
willing to work something out in more detail if you and Tommy (or others)
are interested.  I think anything in this space relies on a couple of
assumptions about the willingness of different parties to change both their
technology and their business practices, but I think a proposal without a
client hint could work out to be simpler.


> The other issue is that we'd like to reduce the granularity of this
> published mapping. This has two advantages: first it saves the proxy
> provider money now that IPv4 addresses are expensive, and second it
> improves privacy - because now the egress IP has a more coarse geographic
> mapping, and only the servers that request the client hint get access to
> the more detailed location.
>

To be clear on this design goal, you wish to have a public view of the
geo-location of the IP which is coarse and broadly available plus a
different, detailed view of the geo-location which is made available only
to the clients of the privacy proxy.  When the client cares to, it can
provide this more detailed data.

This seems at odds with what the document states as the method by which the
client populates the data, which specifies only that it gets it from a
geo-ip database.  It's pretty contrary to the basic privacy property I
thought that you hoped it conferred, which was that sharing that
information would be no worse than the publicly available data.  Now it
appears it will be more detailed.  I've tried to account for that design
goal in my sketch below, but I think the privacy properties depend a great
deal on how much more detailed it is.


> The browser can also now choose to refuse to send the client hint if it
> determines that the server shouldn't have this information. Unless I'm
> misunderstanding your proposal, it doesn't provide either of these two
> advantages.
>

I think we need a whiteboard, but I'll lay out my thinking in a little more
detail.

A server advertises via .well-known that it supports the receipt of a
source-originated geofeed in a specified format (which would be limited to
avoid the lat/long issues and similar privacy issues).

A proxy may test for that service and provide a geo-ip mapping for a single
Egress IP, which is authenticated by a return routability check. (1)  This
would have a specified TTL.

The server that has received that geo-location will place it in a local
geo-ip database, which it consults when it wants to provide
geography-specific resources.  (Whether it also consults other geo-ip data
is not something that this proposal can control, but my guess is that it
would sanity check the provided data against other data).

A client indicates a desire to share more detailed geographic data with a
particular service in its interaction with a particular proxy.  When that
happens, the proxy updates the server's view of the geo-ip by associating
that egress IP with the new data.  See (2) below for the concurrency issue,
but the TTL for this would be very low and/or the proxy would reset it to
the coarse version after the end of the session.
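
As a rough illustration of the server-side piece of the sketch above, the
local geo-ip database could look something like the following Python.  This
is a minimal sketch under stated assumptions, not a proposed design: the
names (LocalGeoDb, put_coarse, put_detailed, reset_to_coarse), the region
labels, and the TTL values are all hypothetical, and the return routability
check is assumed to have succeeded before any entry is stored.

```python
import ipaddress
import time
from dataclasses import dataclass


@dataclass
class GeoEntry:
    region: str        # coarse label only; no lat/long by design
    expires_at: float  # absolute expiry time, in seconds


class LocalGeoDb:
    """Hypothetical server-side store for proxy-supplied geo-ip mappings."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._coarse = {}    # egress IP -> GeoEntry, longer TTL
        self._detailed = {}  # egress IP -> GeoEntry, very short TTL

    def put_coarse(self, egress_ip, region, ttl):
        # Assumption: the caller already validated the proxy's control of
        # egress_ip (e.g. via a return routability check).
        ip = ipaddress.ip_address(egress_ip)
        self._coarse[ip] = GeoEntry(region, self._clock() + ttl)

    def put_detailed(self, egress_ip, region, ttl):
        # Stored only when a client asked the proxy to share more detail.
        ip = ipaddress.ip_address(egress_ip)
        self._detailed[ip] = GeoEntry(region, self._clock() + ttl)

    def reset_to_coarse(self, egress_ip):
        # The proxy drops the detailed view at the end of the session.
        self._detailed.pop(ipaddress.ip_address(egress_ip), None)

    def lookup(self, egress_ip):
        # Prefer an unexpired detailed entry, then an unexpired coarse one.
        ip = ipaddress.ip_address(egress_ip)
        now = self._clock()
        for table in (self._detailed, self._coarse):
            entry = table.get(ip)
            if entry is not None and entry.expires_at > now:
                return entry.region
        return None
```

The two tables keep the detailed view from ever overwriting the coarse one,
so expiry or an explicit reset falls back to the coarse mapping without the
proxy having to resend it.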

There are definite trade-offs to this approach, but there are some
advantages.  First, the client cannot be misconfigured to provide truly
detailed location data via this.  I think this could happen with your
proposal because the client may get multiple views of its geolocation if it
uses multiple proxy services and some of those may be much more detailed
than the Apple or Google services (yes, I am once again alluding to
enterprises here). Second, this approach means servers get fresh, if
coarse, geo-ip data from proxies which is valid even for clients that have
not been updated to use the new hints.

Again, my reason for sketching this out isn't to claim that this is the
best approach.  It's to convince you that we need to have an architectural
discussion before accepting this document.  I have learned a great deal
about what you're trying to build, but I think there are other use cases
with other risks that have to be considered before we standardize anything.

regards,

Ted Hardie

(1) You could expand this to a range, but you would then need something
like ACME to validate control of the range.
(2) To handle concurrent connections via the same proxy to the same
instance of the service with different locations or willingness to share
location, you would need to disambiguate them via something like a
connection ID.  That requires a lot more thought than this email contains,
but there are some advantages (since you can trivially change the
connection ID).
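
To illustrate the disambiguation in (2), here is a rough Python sketch in
which the detailed view is keyed by the pair (egress IP, connection ID) and
the coarse view by egress IP alone.  Everything here is an assumption made
for illustration: the table and function names are hypothetical, and how
the server learns the connection ID is left open.

```python
from typing import Dict, Optional, Tuple

# Hypothetical tables; the key shapes are assumptions for illustration.
coarse_by_ip: Dict[str, str] = {}
detailed_by_conn: Dict[Tuple[str, str], str] = {}


def share_detailed(egress_ip: str, connection_id: str, region: str) -> None:
    # Only the connection whose client opted in gets the detailed view.
    detailed_by_conn[(egress_ip, connection_id)] = region


def end_session(egress_ip: str, connection_id: str) -> None:
    # Dropping the entry returns that connection to the coarse view.
    detailed_by_conn.pop((egress_ip, connection_id), None)


def lookup(egress_ip: str, connection_id: Optional[str] = None) -> Optional[str]:
    if connection_id is not None:
        detailed = detailed_by_conn.get((egress_ip, connection_id))
        if detailed is not None:
            return detailed
    # Other connections through the same egress IP see only the coarse view.
    return coarse_by_ip.get(egress_ip)
```

Two concurrent connections through the same egress IP then resolve
independently: one that shared detail sees its own entry, while the other
falls back to the coarse mapping.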



> David
>
> [1] https://mask-api.icloud.com/egress-ip-ranges.csv
>
> On Fri, Oct 25, 2024 at 6:54 AM Ted Hardie <ted.ietf@gmail.com> wrote:
>
>> Thanks to Tommy for his previous comments; since this occurs later in the
>> thread and addresses one of the points I made as well, I'm choosing to
>> answer here, but I have read the full thread to this point.
>>
>> On Thu, Oct 24, 2024 at 9:54 PM David Schinazi <dschinazi.ietf@gmail.com>
>> wrote:
>>
>>> Hi everyone,
>>>
>>> I'm realizing I've been using some terminology without defining it,
>>> leading to some confusion. Let's create a distinction between two distinct
>>> kinds of IP-hiding technologies.
>>>
>>> 1) privacy proxies. Examples of these include Google's IP Protection and
>>> Apple's iCloud Private Relay. These are affiliated with a browser, and
>>> integrated pretty tightly with that browser (and/or operating system). The
>>> goal of these is to prevent websites from having access to the user's IP
>>> address, because that represents a stable tracking identifier. However,
>>> these privacy proxies do not try to hide the user's coarse location. They
>>> look at the client's IP address, map that to a city (for Google, we map it
>>> to the closest grouping of 500'000 people for example), and then the
>>> privacy proxy picks an egress IP address that's registered to that city in
>>> a public geofeed. While websites have lost the ability to see the client's
>>> IP address, they can still access the client's coarse location. Note that
>>> this coarseness is often configurable by the user.
>>>
>>>
>> Combined with Tommy's answer, what we see is a problem with data known to
>> the geo-ip database about the egress IP selected by the privacy proxy.  If
>> it is stale or wrong, the client gets a worse experience.  You want to
>> improve that experience by having the privacy proxy select the location
>> (based on its knowledge of source IP) rather than the server select it
>> based on its geo-ip lookup of the egress IP.  This would presumably also
>> allow the privacy proxies to use fewer egress IPs.
>>
>> The difficulty I have here is that your technical solution is in no way
>> limited to that deployment.  As Ben's pointed out, there are a bunch of
>> related deployments in which a standard VPN provider might want the same
>> thing, and I am sure that once this is standardized we will see it used in
>> places where there is no proxy in use at all (enterprises, for example,
>> using DHCP location on the device to populate this and then give
>> location-appropriate responses at service portals etc.).
>>
>> If we step back to the key issue, a completely different approach would
>> be for a service to indicate its willingness to get crowd-sourced geofeeds
>> from privacy proxies or other intermediaries.  Those intermediaries could
>> test for that service and provide an up-to-date and appropriate geolocation
>> for their egress IPs.  That sorts the issue of the geolocation being stale
>> in a database by allowing for the creation of a local database that is
>> correct, but leaves the rest of the system as it is.  That approach has its
>> own technical issues (you'd need to manage authentication, for example by a
>> return routability check), but the simple fact that there are completely
>> different approaches is why I want to push us back to the architectural
>> discussion.
>>
>> I'm sure that's not terribly welcome feedback given that this document
>> has already been percolating for 2 years, but I think that there is ample
>> evidence that folks would be willing to engage in the discussion if you
>> wanted to set up a design-team mailing list and hash it out.
>>
>> Thanks again for your willingness to engage and on the improvements and
>> comments to date.
>>
>> regards,
>>
>> Ted Hardie
>>
>

Received on Saturday, 26 October 2024 14:35:35 UTC