Re: URI Opacity Principle (was: Re: use of fragments as names is irresponsible) from Mark Nottingham on 2003-01-17 (www-tag@w3.org from January 2003)

From: Mark Nottingham <mnot@mnot.net>
Date: Thu, 16 Jan 2003 23:49:12 -0800
To: <noah_mendelsohn@us.ibm.com>
Cc: <fielding@apache.org>, <sandro@w3.org>, <www-tag@w3.org>
Message-ID: <016001c2bdfd$546843d0$770ba8c0@mnotlaptop>
> That's much more than a sort on an opaque string. It
> depends on knowing that the DNS name is a distinguished
> part of an HTTP URI.

I think that's OK; the authority *is* a distinguished part of HTTP (and
other) URIs. In a sense, the browser is saying "if I *were* to dereference
the URIs here, I'd be looking at this information when doing so, and that
might be interesting to you."
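A minimal sketch of that kind of grouping, assuming Python's standard urllib.parse. It reads only the host out of each URI's authority and never dereferences anything. The function name and sample URIs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_host(uris):
    """Group history entries by the host in their authority component.

    Only the generic URI syntax is consulted; nothing is fetched.
    urlsplit().hostname is lowercased, so WWW.example.com and
    www.example.com land in the same group.
    """
    groups = defaultdict(list)
    for uri in uris:
        groups[urlsplit(uri).hostname].append(uri)
    return dict(groups)


if __name__ == "__main__":
    history = [
        "http://www.example.com/foo/bar/baz/",
        "http://WWW.example.com/index.html",
        "http://example.org/",
    ]
    for host, entries in sorted(group_by_host(history).items()):
        print(host, entries)
```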


> Similarly the type-ahead in IE knows about the /
> separator and fills in things one token at a
> time. Again, in some sense not opaque.  OK per
> web architecture or not?  After all, real opacity
> would mean that it shouldn't even look at the
> substructure, except maybe when actually initiating
> an operation such as GET.  As best I can tell,
> IE checks for the http: scheme and "special-cases"
> it in its type-ahead. OK or not?

Hmm, I can't get IE6 to do that. *shrug* Without seeing the behaviour,
this seems a little fuzzier. Are you saying that if the browser has ONLY
the URI 'http://www.example.com/foo/bar/baz/' in its history, and I type
in 'http://www.example.com/', it'll "helpfully" auto-complete 'foo' (not
'foo/bar/baz/')?
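If that is the behaviour, it might look something like this hypothetical sketch: a purely string-based completion that treats "/" as a segment separator and offers only the next segment. It illustrates the behaviour being asked about; it is not a description of what IE actually does, and the function name is made up.

```python
def complete_next_segment(typed, history):
    """Offer completions that extend `typed` by exactly one path segment.

    Hypothetical type-ahead: it relies on "/" being a separator, which is
    exactly the kind of non-opaque knowledge under discussion.
    """
    completions = set()
    for uri in history:
        if uri.startswith(typed) and len(uri) > len(typed):
            rest = uri[len(typed):]
            slash = rest.find("/")
            # Stop just after the next "/" (or take the rest if there is none).
            segment = rest if slash == -1 else rest[: slash + 1]
            completions.add(typed + segment)
    return sorted(completions)


if __name__ == "__main__":
    print(complete_next_segment("http://www.example.com/",
                                ["http://www.example.com/foo/bar/baz/"]))
    # ['http://www.example.com/foo/']
```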

That seems to exploit the containment relationship between resources
identified by various parts of the path components; the only "official"
use of the path components is the construction and resolution of relative
URIs, and I don't think that's happening here. There isn't any guarantee
that dereferencing the resources on the way to /foo/bar/baz/ will actually
successfully return representations, but it's generally good practice for
sites to do so; hence this functionality. So, I personally don't think
this is OK, but it usually works, and doesn't break too horribly (you'll
just get a 404, etc., because GET is safe). There are lots more broken
things in browsers...
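For comparison, the sanctioned use of path structure is resolving relative references against a base URI. A minimal sketch using Python's standard urllib.parse.urljoin; the base URI is the one from the example above, and the relative references are arbitrary.

```python
from urllib.parse import urljoin

BASE = "http://www.example.com/foo/bar/baz/"

# Resolve a few relative references against BASE.
examples = {ref: urljoin(BASE, ref) for ref in ["qux", "..", "../../", "/"]}

if __name__ == "__main__":
    for ref, resolved in examples.items():
        print(f"{ref!r:10} -> {resolved}")
```

Here "/" segments matter because the resolution algorithm says they do; nothing is claimed about whether /foo/ or /foo/bar/ actually identify anything useful.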

BTW, another twist on this question comes up in Apple's new Safari browser
(http://www.apple.com/safari); Steve Jobs made a big deal about "Snapback
technology" which apparently either takes you to the "home page" of a site
(shoots straight up the path hierarchy to the root resource) or uses some
heuristic and the history to return you to an index page (I haven't played
with it extensively).
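Going "straight up the path hierarchy" can be sketched as listing a URI's ancestors by dropping one trailing segment at a time, ending at the root. This is a guess at the general idea, not Safari's actual algorithm; the function name is hypothetical, and, as with the type-ahead case, nothing guarantees that any of these URIs returns a representation.

```python
from urllib.parse import urlsplit, urlunsplit


def path_ancestors(uri):
    """Return the URIs above `uri` in its path hierarchy, nearest first.

    Query and fragment are dropped; empty path segments are ignored.
    The last entry (if any) is the root resource of the authority.
    """
    parts = urlsplit(uri)
    segments = [s for s in parts.path.split("/") if s]
    ancestors = []
    for i in range(len(segments) - 1, -1, -1):
        path = "/" + "/".join(segments[:i]) + ("/" if i else "")
        ancestors.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return ancestors


if __name__ == "__main__":
    for ancestor in path_ancestors("http://www.example.com/foo/bar/baz/"):
        print(ancestor)
```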


> No, I meant as a locality heuristic.  For example, my cache
> will retain only content with a URI that matches, e.g.
>
>         http://example.com/*
>
> In other words, my cache will retain representations
> only of resources appearing to originate from
> example.com. Appropriate use of URI or not?
>
> How about:
>
>         http://example.com/x/*
>
> a subresource of the hierarchical (in RFC 2396 terms)
> resource at example.com?

Without authoritative information as to the structure of the resources,
you're hoping that the heuristic you use will capture all of the
representations that you need. Although it's reasonable to exploit the
hierarchy implied by the path components, there's no guarantee that the
resources are organized as you have organized them in your head. It's the
end user's choice as to whether to use this.
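If someone does choose such a heuristic, a naive string prefix test has its own traps: "http://example.com" is a prefix of "http://example.com.evil.net/", and "http://example.com/x" is a prefix of "/xyz". A minimal sketch of a slightly more careful check, assuming urllib.parse; it compares the authority parts and whole path segments, and is still only a guess about how the site organizes its resources. The function name is hypothetical.

```python
from urllib.parse import urlsplit


def in_scope(uri, scope):
    """Heuristic check: is `uri` at or below `scope` in the path hierarchy?

    Compares scheme, host and port exactly, then whole path segments, so
    http://example.com/x/ covers /x/y but not /xyz. Explicit default ports
    (e.g. :80) are not normalized.
    """
    u, s = urlsplit(uri), urlsplit(scope)
    if (u.scheme, u.hostname, u.port) != (s.scheme, s.hostname, s.port):
        return False
    # Drop a trailing "/" on the scope so "/x/" and "/x" give the same prefix.
    scope_segments = s.path.rstrip("/").split("/")
    uri_segments = u.path.split("/")
    return uri_segments[: len(scope_segments)] == scope_segments


if __name__ == "__main__":
    SCOPE = "http://example.com/x/"
    for candidate in ["http://example.com/x/y",
                      "http://example.com/xyz",
                      "http://example.com.evil.net/x/y"]:
        print(candidate, in_scope(candidate, SCOPE))
```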


> "A client or other agent MAY determine the scheme from
> a URI and MAY at any time for any reason attempt
> operations defined by the scheme.  For example, a
> client MAY attempt a GET or POST on a URI using the
> http: or https: scheme.  If such operations succeed,
> then the client MUST assume that any retrieved
> representations or other results were indeed a result
> of successful access to the named resource (or
> redirections of it.)

Sounds good. Unfortunately, there's an effort in the IETF
(OPES, Open Pluggable Edge Services; http://www.ietf-opes.org/) to let
intermediaries modify content in transit, which makes that last assumption
potentially false.
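Going back to the first half of the quoted rule: determining the scheme needs nothing beyond the generic syntax, since it is the text before the first ":". A minimal sketch, assuming urllib.parse; the table of operations per scheme is purely illustrative and neither complete nor authoritative, and the names are hypothetical.

```python
from urllib.parse import urlsplit

# Illustrative only: operations a hypothetical client might attempt per scheme.
OPERATIONS_BY_SCHEME = {
    "http": ("GET", "HEAD", "POST"),
    "https": ("GET", "HEAD", "POST"),
    "mailto": ("send",),
}


def attemptable_operations(uri):
    """Return the operations this sketch would consider for uri's scheme."""
    scheme = urlsplit(uri).scheme  # urlsplit lowercases the scheme
    return OPERATIONS_BY_SCHEME.get(scheme, ())


if __name__ == "__main__":
    for uri in ["HTTP://www.example.com/", "mailto:someone@example.com",
                "urn:isbn:0451450523"]:
        print(uri, attemptable_operations(uri))
```

Note that nothing here says anything about what comes back; whether a retrieved representation really is from the named resource is exactly the assumption the OPES work puts in question.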


> Accordingly, it is the responsibility of any person or
> software assigning a URI name to a resource to ensure
> that operations performed using the scheme used will
> either successfully access the correct resource, or
> will fail.  For example, in using the URI
> mailto:noah_mendelsohn@us.ibm.com to identify my mail
> drop, it's not enough for me to ensure that my employer
> controls the us.ibm.com domain, and that the URI is
> otherwise unused.  I must ensure that if someone
> actually sends mail using this URI, that it will indeed
> go to the intended resource (my mail drop), or that the
> mail will bounce.  If the mail might go to someone else,
> we've got a problem."
>
> I may not have the rule exactly right, but it's exactly
> the sort of thing I would expect to see spelled out at
> this level of detail, probably in the arch document.

Sounds like a very good start. I'd emphasise the software's and protocol's
roles more in the last paragraph.

Cheers,
Received on Friday, 17 January 2003 02:52:14 UTC