Re: URI Opacity Principle (was: Re: use of fragments as names is irresponsible) from noah_mendelsohn@us.ibm.com on 2003-01-16 (www-tag@w3.org from January 2003)

From: <noah_mendelsohn@us.ibm.com>
Date: Thu, 16 Jan 2003 13:21:57 -0500
To: "Mark Nottingham" <mnot@mnot.net>
Cc: fielding@apache.org, sandro@w3.org, www-tag@w3.org
Message-ID: <OF93796DC7.592BD1B6-ON85256CB0.0064063B@lotus.com>
Mark Nottingham writes:

> 
> > Do answers to these follow from your proposal?
> 
> 
> hopefully - let's see ;)
> 
> 
> > can/should a client keep histories based on  substructure of URIs?
> 
> I think this is actually a UI feature, not a URI
> manipulation; they're treating them as a pool of opaque
> (aha!) strings, sorted by alpha and length, and
> returning those that match as you type in a string.

Not sure I agree.  Given previous references to:

        http://example.com/link1
        http://w3.org/linkx
        http://example.com/link2
        http://w3.org/linky

Most history lists say things like:


        Example.com
                Title of link1
                Title of link2
        w3.org
                Title of linkx
                Title of linky

That's much more than a sort on an opaque string. It
depends on knowing that the DNS name is a distinguished
part of an HTTP URI.
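
To make the point concrete, here is a minimal sketch of the kind of
grouping such a history list performs. It is not how any particular
browser does it; the function name group_history_by_host is
hypothetical, and the grouping key is assumed to be the host part of
the authority component.

```python
from collections import OrderedDict
from urllib.parse import urlsplit


def group_history_by_host(uris):
    """Group visited URIs under their host name, in first-seen order."""
    groups = OrderedDict()
    for uri in uris:
        # The non-opaque step: we parse the URI and read its host.
        host = urlsplit(uri).hostname
        groups.setdefault(host, []).append(uri)
    return groups


history = [
    "http://example.com/link1",
    "http://w3.org/linkx",
    "http://example.com/link2",
    "http://w3.org/linky",
]
grouped = group_history_by_host(history)
for host, uris in grouped.items():
    print(host)
    for uri in uris:
        print("        " + uri)
```

A sort over opaque strings could not produce these groups without
knowing where the host name begins and ends.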

Similarly the type-ahead in IE knows about the /
separator and fills in things one token at a
time. Again, in some sense not opaque.  OK per
web architecture or not?  After all, real opacity
would mean that it shouldn't even look at the
substructure, except maybe when actually initiating
an operation such as GET.  As best I can tell, 
IE checks for the http: scheme and "special-cases"
it in its type-ahead. OK or not?
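
As an illustration only, token-at-a-time completion might look like
the sketch below. This is an assumption about the general technique,
not a description of IE's actual algorithm; complete_one_token is a
hypothetical name, and the only substructure it relies on is the /
separator.

```python
def complete_one_token(typed, history):
    """Return completions of `typed`, extended only to the next '/'."""
    candidates = set()
    for uri in history:
        if uri.startswith(typed) and len(uri) > len(typed):
            # Look for the next separator after what was typed so far.
            cut = uri.find("/", len(typed))
            candidates.add(uri if cut == -1 else uri[:cut + 1])
    return sorted(candidates)


history = [
    "http://example.com/link1",
    "http://w3.org/linkx",
    "http://example.com/link2",
    "http://w3.org/linky",
]
first = complete_one_token("http://ex", history)
second = complete_one_token("http://example.com/", history)
```

Here the first call offers only the host token, and the second offers
the two full URIs beneath it.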

> > Is it OK for cache proxies to microparse URIs to infer
> > clustering characteristics of the information space?
> 
> That is, use a URI as input to a freshness heuristic?

No, I meant as a locality heuristic.  For example, my cache 
will retain only content with a URI that matches, e.g.

        http://example.com/*

In other words, my cache will retain representations
only of resources appearing to originate from
example.com. Appropriate use of URI or not?

How about:

        http://example.com/x/*

a subresource of the hierarchical (in RFC 2396 terms)
resource at example.com?
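
A minimal sketch of the locality filter I have in mind, for the two
patterns above. The name should_retain and its parameters are
hypothetical, and the match is assumed to be on scheme, host, and
path prefix; no real cache product is being described.

```python
from urllib.parse import urlsplit


def should_retain(uri, host, path_prefix="/"):
    """Decide whether a cache keeps a representation, based on the URI."""
    parts = urlsplit(uri)
    return (
        parts.scheme == "http"
        and parts.hostname == host
        # An empty path is treated as "/" for matching purposes.
        and (parts.path or "/").startswith(path_prefix)
    )


keep_site = should_retain("http://example.com/link1", "example.com")
keep_subtree = should_retain("http://example.com/x/page", "example.com", "/x/")
skip_other = should_retain("http://w3.org/linkx", "example.com")
```

Both checks microparse the URI: the first reads the authority, the
second also reads the hierarchical path.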

> It's allowed by HTTP, and sometimes used, but my
> experience is that it's poor practice, and repeatedly
> recommend against it (But I have a general bias against
> heuristics in these situations). The caching industry
> has generally migrated away from these solutions,
> especially those surrounding prefetching (although
> there is a certain fascination with it in academic
> circles).
> 
> From a URI perspective, I think it's not OK (and falls
> under my SHOULD NOT).

Does that answer apply to my clarification?
 
> > Is it (or more correctly "why is it") OK for a client
> > to actually inspect the scheme to determine a
> > retrieval strategy?
> 
> Yes, because that's part of the generic dereferencing
> process, which is part of my first paragraph (although
> it should probably be said a bit more clearly that
> dereferencing is a special operation).
> 
> 
> > Surely it is appropriate for the server to map the HTTP example above
> > to file system sub-directories should it choose to do so? (Though of
> > course, that's not required or visible from the outside.)
> 
> Yes, because it is the authority that minted the URI
> that's doing the mapping.

OK.  On both use of scheme to decide a dereference
strategy, and use of URI substructure at the host
supporting the resource, I think we need a crisp
statement that says what's going on.  I'm specifically
curious about the use of an http URI for a resource
that is not in fact hosted by an http-based server.  I
suspect the answer is along the lines of:

"A client or other agent MAY determine the scheme from
a URI and MAY at any time for any reason attempt
operations defined by the scheme.  For example, a
client MAY attempt a GET or POST on a URI using the
http: or https: scheme.  If such operations succeed,
then the client MUST assume that any retrieved
representations or other results were indeed a result
of successful access to the named resource (or
redirections of it).

Accordingly, it is the responsibility of any person or
software assigning a URI name to a resource to ensure
that operations performed using the scheme used will
either successfully access the correct resource, or
will fail.  For example, in using the URI
mailto:noah_mendelsohn@us.ibm.com to identify my mail
drop, it's not enough for me to ensure that my employer
controls the us.ibm.com domain, and that the URI is
otherwise unused.  I must ensure that if someone
actually sends mail using this URI, it will indeed
go to the intended resource (my mail drop), or that the
mail will bounce.  If the mail might go to someone else,
we've got a problem."

I may not have the rule exactly right, but it's exactly
the sort of thing I would expect to see spelled out at
this level of detail, probably in the arch document.
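
For what it's worth, the first half of that draft rule (determine the
scheme, then attempt only operations the scheme defines) can be
sketched as follows. The table and the operation names in it are
hypothetical placeholders, not taken from any specification, and the
sketch attempts no network access.

```python
from urllib.parse import urlsplit

# Hypothetical table: which operations an agent may attempt per scheme.
SCHEME_OPERATIONS = {
    "http": ("GET", "POST"),
    "https": ("GET", "POST"),
    "mailto": ("SEND",),
}


def operations_for(uri):
    """Return the operations an agent may attempt, judged by scheme alone."""
    scheme = urlsplit(uri).scheme
    # Unknown schemes yield no operations; nothing else is inspected.
    return SCHEME_OPERATIONS.get(scheme, ())


http_ops = operations_for("http://example.com/link1")
mail_ops = operations_for("mailto:someone@example.com")
unknown_ops = operations_for("urn:isbn:0")
```

Note that only the scheme is read; whether the operation then reaches
the intended resource is the responsibility of whoever assigned the
URI.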

Thanks!

------------------------------------------------------------------
Noah Mendelsohn                              Voice: 1-617-693-4036
IBM Corporation                                Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------
Received on Thursday, 16 January 2003 13:23:14 UTC