
Re: Sub-domain granularity: the poverty of the domain name as the only hook for security

From: Tim Berners-Lee <timbl@w3.org>
Date: Mon, 16 Mar 2015 14:22:57 -0400
Cc: Public TAG List <www-tag@w3.org>, Mark Nottingham <mnot@mnot.net>, Brad Hill <hillbrad@gmail.com>
Message-Id: <ACDA6F9C-A553-4EB7-94B8-28B1156E9107@w3.org>
To: Mike West <mkwst@google.com>

On 2015-03-16, at 09:55, Mike West <mkwst@google.com> wrote:

> Would you mind clarifying the problems you'd like to address by changing the origin model?

The SOP is a hierarchical system, which allows a hierarchy of trust; people in general find that useful when designing systems in which parts of the system must be sandboxed, in the sense of only being given access to data from themselves and their sub-systems.
The problem is that by connecting it to the DNS hierarchy (a la https://baz.bar.foo.example.com), but not to the hierarchical nature of the path (https://example.com/foo/bar/baz), it has suddenly forced people to design their websites using a plethora of different domain names.

1) This is a pain. Setting up different domains on a virtual server is more work.  
2) This is expensive.  It requires more certificates.
3) This is architecturally a disaster, because it connects together things which should not be connected.   The DNS part of the name is already special in that it is the granularity at which IP address lookup is done.  Taking that architectural feature and binding it to the SOP (and HSTS) means that the design becomes over-constrained.
4) Yes, I guess you can write a server to map the '.' into '/' in the file system, but that means that relative links within the file system and relative links within the web space don't both work.
5) At runtime, having to look up different domain names is a waste of time.
6) At runtime, having to look up more specific domain names is a violation of privacy and plays into ubiquitous surveillance, as the surveilling power sees you look up not globalchat.org but whythegovernmentsucksblog.alicia.people.globalchat.org, which is a bit of a giveaway. The whole use of encryption for privacy suffers from the weakness of the DNS lookups being visible. (And maybe TLS-startup domain-name visibility, but I am not an expert.)
7) The functionality of the SOP would be very valuable in the path, as many web sites have a sense of hierarchical delegation within the path.

Let's take two examples.

A) Github.com allows people to have a space like https://github.com/linkeddata and, within it, repositories like https://github.com/linkeddata/gold,
where there is a strong sense of delegation of power. But when they decided to allow people to host their own web apps as well as just the source-code-control view, they had to move to //linkeddata.github.io/gold
What is wrong with this picture?

- It is a pain, as you can't make easy links between the live version and the git version.
- It is random, in that they make organizations origins but not repos.  Maybe they will have to change that later, and all the links will break.

B) The W3C web site, www.w3.org, has space for staff to make pages about themselves, like http://www.w3.org/People/Berners-Lee.  It would be nice to give that space some protection from http://www.w3.org/People/Foobar.    No way are we going to break all the links and make URLs like https://berners-lee.people.w3.org/, where I would not even be able to use relative URLs to my colleagues' pages and stylesheets. So we will all just have to trust each other, which is fine, because we do, but for a commercial service this would not be on.

We have had to move to tag.w3.org instead of w3.org/tag, which we had had for many years.

What WOULD be useful is to be able to issue an HSTS header only for a subtree, like http://www.w3.org/People/Berners-Lee or http://www.w3.org/People.   The site is too big to switch all at once from http: to https:.
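To make the wish concrete, here is a little Python sketch of how a client might cache and apply such a path-scoped HSTS entry. The subtree scoping is entirely hypothetical -- RFC 6797 has no notion of a path -- so this illustrates the behaviour I would like, not anything deployed:

```python
from urllib.parse import urlsplit, urlunsplit

# (host, path-prefix) pairs the client has seen a hypothetical
# path-scoped HSTS policy for.  These entries are invented examples.
hsts_subtrees = {
    ("www.w3.org", "/People/Berners-Lee/"),
    ("www.w3.org", "/TR/"),
}

def upgrade(url: str) -> str:
    """Rewrite http: to https:, but only inside a cached subtree."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        return url
    # Normalize with a trailing slash so /TR matches /TR/ but /TRX does not
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    for host, prefix in hsts_subtrees:
        if parts.hostname == host and path.startswith(prefix):
            return urlunsplit(("https",) + tuple(parts)[1:])
    return url
```

The point is that the upgrade decision, like the policy itself, follows the hierarchy of the path rather than forcing the whole domain over at once.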


> On Mon, Mar 16, 2015 at 2:28 PM, Tim Berners-Lee <timbl@w3.org> wrote:
> The HSTS spec http://tools.ietf.org/html/rfc6797 is a good start, but it is not useful for serious websites which have many separate parts that have to have different policies, management, etc.
> 
> HSTS is, indeed, a sledgehammer. It's really quite good at driving fence-posts, but less useful for hanging pictures.

But an easy fix would be to make it subtree-wide, no?  An HSTS header on a URI would then only affect policy for things whose URIs start with that prefix.

> The per-page suborigins proposal that WebAppSec has taken up (and that Anne already pointed to) seems like it might be a reasonable way of adding some granularity. Does it go in the direction you're interested in?

Well, it introduces an arbitrary page-declared short string as the sub-origin, which is not hierarchical.
I'd prefer to use the '/' nature of the path in the URI.  You can then see whether another URI is within it without having to look anything up.
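The containment check is purely syntactic -- a few string comparisons, no network round trip -- something like this Python sketch (the function name is mine):

```python
from urllib.parse import urlsplit

def within(uri: str, origin: str) -> bool:
    """True if `uri` falls inside the path-subtree origin `origin`.

    Purely syntactic: no DNS lookup or network access needed.
    """
    u, o = urlsplit(uri), urlsplit(origin)
    # Trailing slash so /linkeddata matches /linkeddata/gold
    # but not /linkeddataX
    o_path = o.path if o.path.endswith("/") else o.path + "/"
    return (u.scheme, u.netloc) == (o.scheme, o.netloc) and \
           (u.path + "/").startswith(o_path)
```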


> 
> Similarly, the Same Origin Policy in general is very hampering, in that it only works at the domain level, not at any path level.   It would have been not very much harder to set both of them up to work on subtrees within the domain, and both would have been much more powerful and useful.  I propose they both be fixed in future.
> 
> Hrm. Cookies take paths into account, and I think they're generally agreed to be a mess. :)
> 
> What properties would you like to be able to share between paths? What properties do you think should be distinct (and inaccessible)?
>  
> The result of these two has been a pain and many perverse incentives and side-effects -- for just one example, github.com/linkeddata having to half-move to linkeddata.github.io (which is now a mess and loses locality of linking between the two) and
>  
> How would you propose distinguishing between `github.com/linkeddata` and `github.com/w3c` (two distinct organizations, which, I suppose, aren't same-origin with `github.com/`) on the one hand, and `github.com/blog` on the other (GitHub's blog, which is theoretically same-origin with `github.com/`)?


github.com could declare that github.com/blog has the full rights of github.com (but github.com/blog could not make that declaration itself).

(In fact it may be that, on thinking about it, github.com actually decides that there is no reason to give this weird blogging software they have installed full rights at all, frankly, but I am happy to use that example.)

Similarly, github.com/linkeddata/ could declare that all its repos are mutually trusted in the live version -- or, quite likely, not.

github.com/w3c/live/trackerApp

github.com/w3c/live/minuteTakerApp

github.com/w3c/minuteTakerApp/tree/master



> 
> Since we're looking at GitHub, note that they migrated to `*.github.io` (and registered `github.io` as a public suffix) in order to ensure separation between user pages.

Yes.  What a pain. User pages are now accessed through the graces of the British Indian Ocean Territory.  No relative links are possible between the repo and the site.

> In an ideal world, how would they have been able to handle that use-case?

Stick to the one domain, the one cert.
Declare the original site as being hierarchical origins -- either by default, or in a header, or in a site meta file which contains all the HSTS-like things which clients should cache.

Maybe it is safest to make the default to protect foo/bar from foo/baz. This would mean that some existing sites would have to add new headers to allow those boundaries to be broken, but then, we asked all public data sites to add CORS headers when they were sitting there minding their own business.
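As a sketch of that default, with the opt-out header invented here by analogy with CORS (I'll call it "Suborigin-Share"; nothing of the sort is specced):

```python
def may_read(reader_path: str, target_path: str, share_headers: dict) -> bool:
    """May a script under reader_path read data under target_path?

    Default: each subtree is isolated from its siblings, but a parent
    may read into its own subtree.  share_headers maps a subtree prefix
    to the list of prefixes its (hypothetical) Suborigin-Share header
    lets back in.
    """
    def norm(p: str) -> str:
        return p if p.endswith("/") else p + "/"

    reader, target = norm(reader_path), norm(target_path)
    if target.startswith(reader):
        return True  # target lies inside the reader's own subtree
    # Otherwise blocked, unless the target subtree explicitly opted out
    for prefix, allowed in share_headers.items():
        if target.startswith(norm(prefix)):
            return any(reader.startswith(norm(a)) for a in allowed)
    return False
```

Existing sites that want the old free-for-all back would just declare it, exactly as public data sites had to declare CORS.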


> 
> w3.org not being able to move to HTTPS at all because of being unable to apply HSTS path by path.
> 
> I could understand HSTS being difficult to deploy for the reasons you outline here. I don't understand how that would prevent path-by-path migration from HTTP to HTTPS,

You really need to have an HSTS header (or equivalent), as that is the only way of telling clients and indexes everywhere that your site's http: and https: URIs should be considered equivalent.  Then, with suitable tweaks to client code everywhere, you can move to https: without having to change all the http: links. That change will never happen in practice on a real site like w3.org -- there are too many millions of internal and incoming links, many hard-coded into software, legal findings, etc., etc.

But HSTS is domain-name-wide, not path-wide.  If you put an HSTS header on any page on the site, it affects the browser behavior for the whole site, as per the spec.

(((I have tried it as an experiment: putting an HSTS header on a random page on w3.org. It causes clients who access that page to automatically redirect from http: to https:, but there are parts of the site which currently redirect server-side back the other way, and so anyone whose browser has ever seen that header is forced into a fast redirect loop, which not only blocks them from getting at any of the w3.org site, it soon puts them, and in some cases their entire company, on the DoS blacklist, from which they have to apply personally to be removed.  And it is not obvious or simple for a user to clear the HSTS cache. )))
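For the record, the loop is easy to reproduce in miniature (a Python toy; "legacy.w3.org" is a made-up stand-in for the parts of the site that still redirect back to http:):

```python
def fetch(url: str, hsts_hosts: set, max_hops: int = 10) -> str:
    """Simulate a client whose HSTS cache upgrades http: to https:
    while part of the server keeps redirecting https: back to http:."""
    hops = 0
    while hops < max_hops:
        host = url.split("/")[2]
        if url.startswith("http://") and host in hsts_hosts:
            # Client-side HSTS upgrade, before any request is sent
            url = "https://" + url[len("http://"):]
        elif url.startswith("https://") and host == "legacy.w3.org":
            # Hypothetical legacy section that still redirects to http:
            url = "http://" + url[len("https://"):]
        else:
            return url  # the request would actually get served
        hops += 1
    return "LOOP"  # redirect loop: the page is unreachable
```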

> however. I'd love to see redirects from HTTP to HTTPS for `www.w3.org/TR/*`, for instance. I'm hopeful that https://w3c.github.io/webappsec/specs/upgrade/ will help there.

Good things about Upgrade:

- It happens without any HTTP request being made
- It allows internal links from a given web page to be upgraded

Bad things:

- It only affects links FROM a given single web page, it seems.
- It doesn't help with incoming links FROM other sites, databases, calendar clients, curl scripts, etc etc etc !!!
- It affects links TO a whole domain name origin with no smaller granularity.  Alright for small sites which can switch all at once, but not for complex sites.

This isn't powerful enough as it stands.

Suppose instead the site could just put a finer-grained HSTS on those areas where automatic upgrade is requested. Then the web page would not need the CSP upgrade-insecure-requests directive, because it would know already which bits of example.com to upgrade automatically.  The FETCH/CORS/MIX algorithm would be tweaked so that the fetch was not blocked.  If there were http: links to other sites, it could do an HTTPS pre-flight to those sites to pick up their HSTS maps, if they have them, and end up using the HTTPS protocol for all of the http: links it is following.
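A rough sketch of such a client, with every name invented -- there is no spec for a path-scoped HSTS map or for the pre-flight:

```python
class SecureClient:
    """Toy client that pre-flights a host once (over https:) to fetch
    its hypothetical path-scoped HSTS map, then consults that map for
    every subsequent http: link to the host."""

    def __init__(self, preflight):
        # preflight: callable host -> list of path prefixes to upgrade
        # (standing in for one https: round trip to the site)
        self.preflight = preflight
        self.maps = {}  # host -> cached prefix list

    def resolve(self, url: str) -> str:
        if not url.startswith("http://"):
            return url
        host, _, rest = url[len("http://"):].partition("/")
        if host not in self.maps:
            self.maps[host] = self.preflight(host)
        path = "/" + rest
        p = path if path.endswith("/") else path + "/"
        if any(p.startswith(pref) for pref in self.maps[host]):
            return "https://" + host + path
        return url
```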

You end up with the concept of a "secure mode" for any HTTP client (including curl), in which it will try to use TLS and get back a header which indeed confirms that the TLS and non-TLS versions are the same.

In general, it continues the myth of the URL being an "address" which can be a secure one or an insecure one: "Authors should be able to ensure that all internal links correctly send users to the site's secure address, and not to its pre-migration insecure address."    I'd prefer to push the idea that the URI is an identifier of the resource, and the goal is to fetch the resource securely regardless of any 's' which may have gotten into it.   That is the key to upgrading the security without breaking links.  The goal is to have TLS everywhere, not "https" in URLs.


Tim


> 
> -mike


Received on Monday, 16 March 2015 18:22:59 UTC
