
Re: Browsers and .onion names

From: Eliot Lear <lear@cisco.com>
Date: Mon, 30 Nov 2015 08:55:03 +0100
To: Alex Rousskov <rousskov@measurement-factory.com>
Cc: HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <565C00D7.2050901@cisco.com>
Alex, thanks for your clarification.  Please see below.

On 11/30/15 8:07 AM, Alex Rousskov wrote:
>
> AFAICT, the biggest one is that the correct "special handling" by
> non-Tor applications is undefined. You have [essentially] said "If
> you're about to query the DNS for a .onion name, then pass an error back
> to the user instead". This instruction is missing the "else" clause:
> What if I am not about to query the DNS for a .onion name? What should I
> [not] do to comply in that case?

As Mark pointed out, RFCs aren't laws.  There are only goals beyond core
functionality:

  * Interoperability
  * Security

The special use registry needs to address both (IMHO); but beyond that,
people need to address the concern in specific contexts.
We're here because RFC 7686 mentions applications and proxy
functionality.  In the context of HTTP, one can argue that a forward
proxy in particular is an agent and in some sense part of the client. 
Mark occasionally uses a term like split browsers.  In those contexts,
the browser's semantic processing capability may be so limited that the
responsibility for keeping the query off the network shifts to the
proxy.  But in considering that aspect, one also has to consider whether
forwarding the query itself is dangerous.  That is, can the client
reasonably trust the proxy?  What happens if the proxy is NOT aware of
.onion?  Can the client or its user be harmed?  Harm, to me, means
private information being inadvertently leaked, and to me the answer is
“yes”.

Hence, I conclude that one should be cautious about sending queries
ending in ".onion", and that there should be some discovery mechanism
for, say, proxy handling of such names.  This is especially the case
with .onion names that carry a service name and might require
substantially more processing in the proxy than would otherwise be
performed.  One might view .onion in the same light as a recursive SOCKS
proxy.
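The SOCKS comparison is apt because SOCKS5 already lets a client hand a
hostname to the proxy unresolved: the CONNECT request may carry a domain
name (ATYP 0x03, per RFC 1928) rather than an IP address, so the .onion
name never touches the local resolver.  A minimal sketch of building
such a request (illustrative only; a real client would first perform the
version/method negotiation, and the host and port here are just
examples):

```python
import struct

def socks5_connect_request(host: str, port: int) -> bytes:
    """Build a SOCKS5 CONNECT request using the DOMAINNAME address
    type (ATYP=0x03), so the proxy -- not the client -- resolves the
    name.  This is how Tor-aware clients keep .onion names off the
    local DNS."""
    name = host.encode("ascii")
    if len(name) > 255:
        raise ValueError("hostname too long for SOCKS5")
    # VER=5, CMD=1 (CONNECT), RSV=0, ATYP=3 (DOMAINNAME),
    # then a length-prefixed name and a big-endian port.
    return (b"\x05\x01\x00\x03"
            + bytes([len(name)]) + name
            + struct.pack(">H", port))
```

The point of the framing is that the name travels as an opaque string:
whether it resolves, and how, is entirely the proxy's business.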

In other words, the safest thing to do is to drop the request and return
an error unless the client is specifically aware that the proxy can
process .onion requests, and is itself able to deal with the results.
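In code, that client-side rule might look like the following sketch.
The names and the capability flag are mine and purely illustrative: the
suffix check is case-insensitive as RFC 7686 requires, and the client
fails fast unless it has somehow learned that its proxy really is
.onion-aware.

```python
class OnionError(Exception):
    """Raised instead of leaking a .onion name onto the DNS."""

def is_onion(host: str) -> bool:
    # RFC 7686: .onion is compared case-insensitively, and a
    # trailing root dot still names the same thing.
    labels = host.rstrip(".").lower().split(".")
    return labels[-1] == "onion"

def resolve_or_forward(host: str, proxy_handles_onion: bool = False) -> str:
    """Decide how to treat a hostname BEFORE any DNS lookup.

    `proxy_handles_onion` is a hypothetical capability flag: it stands
    in for whatever discovery mechanism tells the client that its
    proxy can process .onion requests."""
    if is_onion(host):
        if not proxy_handles_onion:
            raise OnionError(f"refusing to resolve or forward {host!r}")
        return "forward-to-proxy"   # hand the name, unresolved, to the proxy
    return "resolve-via-dns"        # ordinary name: normal resolution
```

The default is deliberately the error path: dropping the request leaks
nothing, while an optimistic forward to an unaware proxy might.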


>
> As my earlier questions attempted to show, there are many areas where
> there is no DNS lookup but there may be leakage, and it is not clear
> whether that leakage complies with or violates the spirit of RFC 7686.
>
> While the RFC claims that
>
> """.onion names are "special" [because] they require hardware and
> software implementations to change their handling in order to achieve
> the desired properties of the name."""
>
> Those "desired properties" are left undefined as far as non-Tor
> applications are concerned so it is difficult for a non-Tor application
> to know what to do with .onion names (other than not doing a DNS lookup
> with them -- which is usually best handled by a DNS library anyway).

Well indeed.  But the question is whether this group can provide
specific advice for HTTP client and proxy implementations with regard to
the handling of .onion.  Unless I grossly misunderstand Mark (it's
happened before), I believe his intent was to determine what that
handling should be (I don't think he got as far as stating what it
should be).

Amos' point is fair: it's generally not considered okay, and even
perhaps ridiculous, to place demands on systems that are not intended to
implement a specification.  But we're here because not doing so has its
own risks.

On the other hand one can read too much into the following statement:

> """
> The presence of a host subcomponent within a URI does not imply that the scheme requires access to the given host on the Internet.  In many cases, the host syntax is used only for the sake of reusing the existing registration process created and deployed for DNS, thus obtaining a globally unique name without the cost of deploying another registry. However, such use comes with its own costs: domain name ownership may change over time for reasons not anticipated by the URI producer.  In other cases, the data within the host component identifies a registered name that has nothing to do with an Internet host.  We use the name "host" for the ABNF rule because that is its most common purpose, not its only purpose.
> """

If memory serves, this was stated as such because we were in essence
using URIs as names, where the URI was never intended to be resolved yet
was guaranteed to be unambiguous in its meaning (e.g., XML
namespaces).  While there's no hard and fast rule about naming on the
Internet, alternative name spaces pose their own set of problems.  For
one thing, they leak.  That was a major part of the impetus of RFC 7686,
and the RFCs that established the registry.

And so to this point:
> Personally, I'd *really* prefer the Web not to be locked into one address resolution protocol (especially when you look at how problematic our current solution can be).

While "locked into" might not be what I want either, the benefit of
using a single address resolution protocol is inherent consistency.
Those publishing a name do so in one way, which the client is assumed to
understand.  That is quite powerful.  The limitation is that it is
*incredibly* hard to evolve that mechanism due to its ossification.
They are yin and yang.  If that sounds like I'm divided on this issue,
then you've understood my meaning.  I would suggest it is not black and
white and that there are tradeoffs.

So... pragmatically, what is appropriate?

Eliot



Received on Monday, 30 November 2015 07:55:48 UTC
