Re: FYI: Oblivious HTTP

On 2021-01-28 9:19 p.m., Martin Thomson wrote:
> Thanks for the thoughtful response Roy.
>
> Roy T. Fielding wrote:
> > Personally, I think we have passed the inflection point where anonymous use
> > of the Internet is an ideal, and we are quickly heading back to a world where
> > accountability for its misuse is just as important. But that's a digression.
>
> I would love to indulge this digression, because I'm forced to agree that lack of accountability is one of the things that has contributed to the current messy situation.  But it's a difficult conversation and I don't know that we're properly equipped for that discussion here.  I wish we were, because while we might not be entirely responsible for the situation, it is not like we haven't made contributions and we probably have resources that might be turned to helping, if only we knew how.
>   
> > Technically, I don't feel that wrapping base HTTP semantics as an
> > encrypted package containing a binary encoding of HTTP/1, and then
> > using POST to fling it around the world, is the right approach. I
> > could understand that if the primary motivation was to pass through
> > client-selected proxies without using TLS, or if the client engine
> > was limited to javascript on a page, but not if we assume the client
> > is deliberately coding for this protocol and servers are deliberately
> > willing to accept such requests. POST is significantly harder to
> > process than it looks.
> > 
> > I suggest that it would be better to design a protocol that accomplishes exactly
> > what you want in the most efficient way possible and find ways to blend that
> > with the existing HTTPs. For example, using non-https link alternatives and
> > Alt-Svc to identify oblivious resources. Using QUIC datagrams for sending
> > opaque requests (I don't see a need to send these as HTTP/3) and receiving
> > opaque responses. Mapping to other protocols only as a way of supporting
> > backwards deployment over those old protocols.
>
> I don't think that attempting to define a new protocol for this is a good idea.  This is a protocol with niche applicability (I'll get back to this in a bit), so reuse is most appropriate.  And we need a protocol that supports reliable, in-order delivery of units that might exceed an MTU in size.  We need a protocol that is able to support request/response exchanges.  Ideally, we have a protocol that doesn't assume transfer of state between exchanges (for the proxy to request/target interaction only).  Ideally, we have a protocol that can amortize setup costs well.
>
> HTTP does all of that.  It does a lot more than that, but none of the extra things HTTP provides are an obvious impediment to its use, and many could be valuable.  It also enjoys wide deployment, diverse and robust implementation support, interoperability, and a bunch of other advantageous things.
>
> I certainly agree that slinging around encrypted blobs in POST isn't making the best use of the protocol.  But I'm not aware of any alternative option that is even comparable on any metric that matters.  The original oblivious DNS design used the DNS protocol, and the extent to which that required compromise to even get it to work was pretty bad.  As much as I love designing new protocols, this doesn't seem to be the right opportunity.
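(Just to make the shape of the thing being slung around concrete, a
client-side sketch in Python. encapsulate() is a hypothetical stand-in
for the draft's HPKE sealing step, the URLs are made up, and the media
type is the one I believe the draft defines:

    import urllib.request

    def encapsulate(binary_request: bytes) -> bytes:
        # Hypothetical stand-in for the draft's HPKE sealing of a
        # binary-encoded HTTP request under the target's key config.
        raise NotImplementedError

    blob = encapsulate(b"...binary-encoded request for the target...")
    req = urllib.request.Request(
        "https://proxy.example/ohttp",  # sent to the proxy, not the target
        data=blob,
        headers={"Content-Type": "message/ohttp-req"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        encapsulated_response = resp.read()  # client decrypts separately

The proxy only sees who sent the blob, and the target only sees the
decrypted request, which is the point of the split.)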
>
> > OTOH, oblivious connections will mislead if there are no equivalent constraints
> > on what the client can do/enable with the content received. Almost all tracking
> > today is done based on client behavior after receipt of the primary resource,
> > including what is executed within the page, how the page components are
> > ordered, what needs to be refreshed, links to other pages, what servers
> > might share the same TLS certificate, etc. Relying on clients to restrain
> > themselves doesn't work in practice, even though it should work in theory.
> > 
> > As such, I think systems like TOR that are built on trust and
> > actively mediate and adapt to changing malpractice are a better
> > solution. But don't let that stop you from designing something
> > better.
>
> I think that it might be best to respond with a little more context on what I believe the potential application of oblivious HTTP would be.  The draft doesn't really go into that, because that isn't really a good place to capture these sorts of things.
>
> Just to set this aside, I don't see this as building toward replicating something like Tor.  There are obvious parallels, but the two approaches have very different assumptions about trust and the incentives of various actors.  Tor, as a system, is also far more complex and ambitious.  So, by all means look for parallels in Tor, but understand that this has very different models for both the threats it considers and the environment it might be deployed in.
>
> The other thing that I think is important to understand is that - at least from my perspective - the goal is not to carry any significant proportion of HTTP requests.  For instance, I don't see this being used for web browsing.  If we're willing to echo cookies, then you can safely assume that we won't be making those requests oblivious.  And many other cases benefit from being able to establish some sort of continuity, if only to deal with denial of service risks.  Each application would have to be assessed on its own.
>
> The things that we're talking about using this for are those cases where we have identified a privacy risk associated with a server being able to link requests.  The original case in research was DNS queries, where it has been shown that building profiles of users based on their DNS activity has poor privacy properties.  At Mozilla, we're also considering this style of approach in other places where browsers make requests with information that might be sensitive, like telemetry reporting.
>
> There are non-trivial costs associated with setting this up.  Because a proxy needs to be run by a separate entity that doesn't see any direct benefit from the service it provides, you have to arrange for its costs to be met somehow.  You need to do so in a way that the server can ensure that the proxy is not enabling DoS attacks, while also retaining sufficient independence that clients can trust the proxy.  This is harder as the use cases become more general, but we believe that this can be arranged for certain specific cases.
>
> Does the explanation about applicability help?  I realize now that I shouldn't have left this up to inference, and the draft should probably at least address the point directly, so I'll make sure that the next version does something about that.
>

While I understand this is intended for use with DNS, I can't help but 
notice some parallels between this and a previous proposal of mine, 
https://lists.w3.org/Archives/Public/ietf-http-wg/2019JanMar/0164.html

Simply encrypting the requests through a single common proxy isn't
going to prevent the server from identifying the user. Even worse, it
additionally lets the common proxy identify the user. You *need*
multiple independent proxies.
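
(A sketch of what I mean, with a hypothetical HPKE-style seal() helper:
each hop can strip only its own layer, so no single party sees both who
the client is and what the request says:

    def seal(hop_public_key, plaintext: bytes) -> bytes:
        # Hypothetical HPKE-style sealing under one hop's public key.
        raise NotImplementedError

    def wrap_for_chain(request: bytes, hop_keys: list) -> bytes:
        # Onion-style layering: innermost layer for the target, then one
        # layer per proxy, so each hop can only peel off its own layer.
        blob = seal(hop_keys[-1], request)    # target's key first
        for key in reversed(hop_keys[:-1]):   # then each proxy, inside out
            blob = seal(key, blob)
        return blob

With a single proxy there is only the target's layer, and that proxy
still sees exactly which client sent which blob, even if not its
contents.)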

The main difference between this proposal and my previous proposal,
however, is that this one attempts to prevent correlation of requests
by the server, whereas mine attempts to prevent correlation by the
proxy. Also, one major flaw with my proposal is that the CDN/proxy
could hijack the nested TLS by pretending to be a plain HTTP(S) server
and minting its own certificates through something like Let's Encrypt,
whereas this proposal seems to assume user-selected proxies.

Have I understood these correctly?

Received on Saturday, 30 January 2021 11:48:56 UTC