Re: [whatwg/url] Opaque hosts: realistic examples and origins? (Issue #690)

Origins are kind of interesting, and I've been meaning to bring up this whole idea of opaque origins for a while. I did a bit of research on it because the current specification doesn't make a lot of sense and seems to miss some trends which could simplify the model.

> I guess it's a question about whether origins are a URL concept or a HTML/Fetch concept, which has always been a bit fuzzy

I wish I could give some citations, but it's been a while and I can't find everything right now. Basically, the key impression I was left with is that an origin is **not** a property _of a URL_ (like, say, the `scheme`, `path`, and `query` are); it is an abstract domain of trust, which can sometimes be computed _from a URL_ based on what you know about how that URL will be processed.

If the URL has a scheme known to the standard (like HTTP), then we know the host is a domain/IP address, and we know that the host establishes an authority context over the resources accessible via different paths and queries, which are the salient properties of HTTP-family URLs. So there is an obvious domain of trust we can establish, just by looking at the address of the resource and nothing else.

For things like `file:` URLs, the way in which the address maps to an authority context is less obvious. You could imagine that user-defined subtrees of the filesystem might be isolated from one another in some fashion, so my top-secret strategy documents can include resources (including executable code/scripts) from other local files, and they will belong to different domains of trust and be isolated from each other as cross-origin resources typically are.

> One resolution would be to say that anything which wants to use opaque-host URLs will need to use its own non-URL-Standard origin computation, but that is a bit subpar, as for example it would lead to (new URL("isolated-app://app-id/")).origin differing in value between environments.

Along the same lines as I was mentioning above: for URLs whose schemes are not known to the standard, there is no way to reasonably establish a guaranteed realm of trust that is both permissive enough to be useful and strict enough to be secure, and which works for every possible URL scheme. For those situations, applications must invent their own origin-like abstractions based on their knowledge of the schemes they work with. This is what I advise in Swift, using our `enum`-with-payload feature to model a discriminated union:

```swift
enum SecurityDomain {

  /// Security domain is 'obvious' due to URL scheme
  /// being known by the standard.
  case derivedFromURL(WebURL.Origin)

  /// A security domain which has been established
  /// by application-specific logic.
  case applicationDefined(MyApp.RealmOfTrust)

  /// Opaque origin, unable to determine a security domain.
  /// These must be maximally isolated from each other.
  case undefinedOpaque

  /// Checks whether two security domains are considered equivalent.
  static func == (lhs: Self, rhs: Self) -> Bool {
    switch (lhs, rhs) {
    case (.derivedFromURL(let lhsOrigin), .derivedFromURL(let rhsOrigin)):
      return lhsOrigin == rhsOrigin
    case (.applicationDefined(let lhsRealm), .applicationDefined(let rhsRealm)):
      return lhsRealm == rhsRealm
    default:
      return false
    }
  }
}
```

I actually think that HTML should do the same thing. But to explain why, a bit of background:

Currently, the definition of opaque origins means that calculating an opaque origin is the same as creating a new origin. This leads to URL libraries representing opaque origins as UUIDs/nonces/atomic counters internally:

- [`servo/rust-url`](https://github.com/servo/rust-url/blob/48fcbe1c543a8350a74b226c17c1ec06c6e19a68/url/src/origin.rs#L64-L65)
- [`chromium/GURL`](https://github.com/chromium/chromium/blob/fd8a8914ca0183f0add65ae55f04e287543c7d4a/url/origin.h#L91-L97)
- I think [gecko does something similar](https://github.com/mozilla/gecko-dev/blob/cefe3965912a9fd75e0f7d39642dddf1e52e2142/netwerk/base/mozurl/src/lib.rs#L307), although I'm not an expert at that codebase (or the others, to be fair).

This definition has a really weird property: if you calculate a URL's origin, that particular local variable in code will compare as same-origin with itself, but calculating the URL's origin _again_ produces a different opaque origin. That doesn't really make any sense. If an origin has to do with a resource's _security context_, why does it matter when I calculate it?

Why is calculating the domain of trust the same as creating a new domain of trust?
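To make the quirk concrete, here is a minimal sketch of the nonce-based representation. The `Origin` type and the `origin(of:)` function are hypothetical stand-ins written for this example (they are not the API of any of the libraries linked above), and treating only `http`/`https` as known schemes is a simplifying assumption:

```swift
import Foundation

/// Hypothetical stand-in for a URL library's origin type.
enum Origin: Equatable {
  case tuple(scheme: String, host: String, port: Int?)
  /// Each opaque origin carries a unique nonce, so it only equals itself.
  case opaque(nonce: UUID)
}

/// Hypothetical origin computation (assumption: only http/https are "known").
/// Known schemes get a tuple origin; everything else mints a fresh opaque origin.
func origin(of url: URL) -> Origin {
  if let scheme = url.scheme, ["http", "https"].contains(scheme), let host = url.host {
    return .tuple(scheme: scheme, host: host, port: url.port)
  }
  return .opaque(nonce: UUID())
}

let url = URL(string: "isolated-app://app-id/")!
let first = origin(of: url)
let second = origin(of: url)

print(first == first)   // the stored value is same-origin with itself
print(first == second)  // recomputing from the same URL mints a different origin
```

The result of the comparison depends on _when_ the origin was computed, not on anything about the URL or the resource.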

Really, if we want to express that these are _undefined_ security domains and should be maximally isolated, they should behave like floating-point `NaN` values, where `x == x` returns `false`, even for a local variable. They should not be same-origin with anything at all, including themselves.
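As a rough illustration of that `NaN`-like behaviour, the sketch below uses a hypothetical `IsolatedOrigin` type (invented for this example) whose opaque case carries no identity at all. It deliberately does not conform to `Equatable`, since that protocol expects `x == x` to hold:

```swift
// Floating-point NaN is never equal to itself.
let x = Double.nan
print(x == x)

/// Hypothetical origin type whose opaque case behaves like NaN.
enum IsolatedOrigin {
  case tuple(scheme: String, host: String, port: Int?)
  case opaque

  /// Same-origin check: only tuple origins can ever match.
  static func == (lhs: Self, rhs: Self) -> Bool {
    switch (lhs, rhs) {
    case let (.tuple(ls, lh, lp), .tuple(rs, rh, rp)):
      return ls == rs && lh == rh && lp == rp
    default:
      // Any comparison involving an opaque origin fails,
      // including comparing a value with itself.
      return false
    }
  }
}

let opaqueOrigin = IsolatedOrigin.opaque
print(opaqueOrigin == opaqueOrigin)
```

With this model there is nothing to mint, so it no longer matters when or how often the origin is computed.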

But this would break HTML, which actually relies on this quirk. It calculates opaque origins at particular times, saying things like:

> Let navigationParams be a new navigation params whose request is null, response is null, **origin is a new opaque origin**, ...

And then it will rely on those details later by saying:

> Two origins, A and B, are said to be same origin if the following algorithm returns true:
> - If A and B **are the same opaque origin**, then return true.

Really, when HTML deliberately calculates new opaque origins, it is establishing new domains of trust for a particular operation or browsing context, and it should be more explicit about that, instead of making it part of the opaque origin concept.

I think that would actually help opaque origins become more useful, and give a firmer grounding to application/context-specific security domains which can be computed (perhaps only partially) from a URL but are not defined by the URL standard.
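One way this could look, as a sketch only: the context explicitly establishes a new domain of trust when it is created, and the identity lives in that realm rather than in the opaque origin. `RealmOfTrust`, `TrustDomain`, and `BrowsingContext` are all hypothetical names for this example and do not correspond to anything defined by HTML or the URL standard:

```swift
import Foundation

/// Hypothetical realm of trust; its identity is minted explicitly on creation.
struct RealmOfTrust: Equatable {
  let id = UUID()
}

/// Hypothetical security domain with a NaN-like opaque case.
enum TrustDomain {
  case applicationDefined(RealmOfTrust)
  case undefinedOpaque

  static func == (lhs: Self, rhs: Self) -> Bool {
    if case .applicationDefined(let l) = lhs, case .applicationDefined(let r) = rhs {
      return l == r
    }
    return false  // undefined domains never match anything
  }
}

/// Hypothetical browsing context that establishes its domain of trust
/// once, at creation, instead of asking for "a new opaque origin".
struct BrowsingContext {
  let domain = TrustDomain.applicationDefined(RealmOfTrust())
}

let contextA = BrowsingContext()
let contextB = BrowsingContext()

print(contextA.domain == contextA.domain)  // same context, same domain of trust
print(contextA.domain == contextB.domain)  // separate contexts stay isolated
```

Here the creation of a new domain of trust is an explicit step taken by the context, while a truly undefined domain stays isolated from everything.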

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/690#issuecomment-1105818099

Message ID: <whatwg/url/issues/690/1105818099@github.com>

Received on Thursday, 21 April 2022 22:20:25 UTC