URI Opacity Principle (was: Re: use of fragments as names is irresponsible) from noah_mendelsohn@us.ibm.com on 2003-01-15 (www-tag@w3.org from January 2003)

From: <noah_mendelsohn@us.ibm.com>
Date: Tue, 14 Jan 2003 22:37:45 -0500
To: Sandro Hawke <sandro@w3.org>
Cc: "Roy T. Fielding" <fielding@apache.org>, www-tag@w3.org
Message-ID: <OF3AA57614.C33C95B8-ON85256CAF.0011104F@lotus.com>

Roy Fielding writes:

>> Somewhere along the line the W3C got hooked on the 
>> notion that URIs are opaque and hierarchy is 
>> meaningless.  That is bogus, as evidenced
>> by every decent information site on the web today. 

As I think I've suggested once or twice, the TAG would do the community a 
service, IMO, if it would clarify the degree to which URIs are indeed to be 
viewed as opaque.  When may their substructure be either inspected or 
built up incrementally, and when are they to be treated as "black boxes"?

Tim BL provides one exposition of the opacity principle at [1].  In 
general, I find that many correspondents on this list and others both 
oversimplify and confuse the issues, and one can make the case that the 
principle has not in fact been stated sufficiently clearly (or, per Roy's 
note, perhaps it is at times a false goal).  Surely it is confusing to 
hear on the one hand that URIs are opaque, while on the other RFC 2396 [2] 
goes to some length to provide hierarchical substructure as a special 
case.  There is surely a sense in which:

        uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6

is more inherently opaque (assuming the "-"s are truly just for 
readability) than:

        http://example.org/root/sub1/sub2/mydoc.html

If URIs were really intended to be opaque, why not make nearly every URI a 
uuid?
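The contrast between the two examples can be seen with a generic parser. Below is a minimal Python sketch, assuming RFC 2396's distinction that a URI whose part after the scheme begins with "/" is hierarchical while anything else is an opaque part. The `describe` helper is hypothetical; the splitting itself is done by the standard library's `urllib.parse.urlsplit`.

```python
from urllib.parse import urlsplit


def describe(uri: str):
    """Return (scheme, authority, path segments) as a generic parser sees them.

    Segments are reported only when the path begins with "/", the abs_path
    form of RFC 2396's hierarchical syntax. An opaque part, such as the body
    of the uuid: example, yields no segments: its "-"s mean nothing to the
    generic syntax.
    """
    parts = urlsplit(uri)
    if parts.path.startswith("/"):
        segments = [s for s in parts.path.split("/") if s]
    else:
        segments = []  # opaque part: no substructure exposed
    return parts.scheme, parts.netloc, segments


for uri in ("uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
            "http://example.org/root/sub1/sub2/mydoc.html"):
    print(describe(uri))
# ('uuid', '', [])
# ('http', 'example.org', ['root', 'sub1', 'sub2', 'mydoc.html'])
```

A parser can see the hierarchy in the http: URI without dereferencing it. Whether agents other than the owning server should act on that hierarchy is the open question.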

My impression is that the server implementing an HTTP resource should 
indeed have the ability to process based on the substructure of the URI. 
Surely it is appropriate for the server to map the HTTP example above to 
file system sub-directories should it choose to do so? (Though of course, 
that's not required or visible from the outside.)  Is it OK for a client 
to help you build up the URI incrementally, one piece at a time?  IE does. 
Is it OK to build a query string in a URI from a form?  Pretty surely, and 
Tim says so.  Is it OK for a client history list to gather all the URIs 
that seem to be from example.org and group them?  Common practice suggests 
"yes".  Can proxies cache based on the substructure of the URI?  I would 
think that's desirable (but don't try it with the uuid: scheme).  Is it OK 
to start guessing MIME types of representations from that .html at the 
end?  True believers seem to say no (and I guess I'm one of them).  Is it 
OK to assume that the .html URI above references a Web page, as opposed to 
some human being associated with a Web page, or some human being who has 
nothing to do with a Web page?  That question seems to be the fodder for 
lots of rambling on this list.  When is it appropriate for a client or 
other agent to inspect the scheme as a means of determining a retrieval 
strategy?  I'm still somewhat confused as to what folks such as Roy think 
on this one, since we often hear that http: URIs need not identify 
resources to be retrieved with HTTP (or maybe I've misunderstood).
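Several of the practices asked about above depend on reading or building URI substructure. Here is a minimal Python sketch of three of them using only the standard library: a form building a query string, a history list grouping by host, and a server mapping the path onto files. The function names, the `/var/www` document root, and the form field are hypothetical illustrations, not anyone's actual implementation, and percent-decoding of the path is omitted for brevity.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from urllib.parse import urlencode, urlsplit


def form_to_uri(action: str, fields: dict) -> str:
    """Client side: build a query string from (hypothetical) form fields."""
    return action + "?" + urlencode(fields)


def group_by_host(history):
    """Client side: group a history list by the host each URI names."""
    groups = defaultdict(list)
    for uri in history:
        groups[urlsplit(uri).hostname].append(uri)
    return dict(groups)


def path_to_file(uri: str, docroot: str = "/var/www") -> PurePosixPath:
    """Server side: map the URI path onto a (hypothetical) document root.

    Only the server that owns the namespace does this; the mapping is not
    visible to clients. Percent-decoding is omitted for brevity.
    """
    path = PurePosixPath(urlsplit(uri).path)
    if ".." in path.parts:
        # Refuse paths that would climb out of the document root.
        raise ValueError("path escapes document root")
    # Drop the leading "/" so the path is joined under docroot, not over it.
    rel = path.parts[1:] if path.is_absolute() else path.parts
    return PurePosixPath(docroot).joinpath(*rel)


print(form_to_uri("http://example.org/search", {"q": "uri opacity"}))
# http://example.org/search?q=uri+opacity
print(path_to_file("http://example.org/root/sub1/sub2/mydoc.html"))
# /var/www/root/sub1/sub2/mydoc.html
```

Note that only `path_to_file` runs on the server that owns the URI; the other two treat the URI as structured from the outside, which is where the opacity question bites.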

Anyway, I don't claim to have the answers, but it's very much the sort of 
question I would expect the architecture document to help settle.  Just 
saying "URIs should be opaque" seems too simplistic, and thus confusing. 
I'm glad Roy's note has brought it up.  Thank you!

Noah

[1] http://www.w3.org/DesignIssues/Axioms.html#opaque
[2] http://www.ietf.org/rfc/rfc2396.txt

------------------------------------------------------------------
Noah Mendelsohn                              Voice: 1-617-693-4036
IBM Corporation                                Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------

Received on Tuesday, 14 January 2003 22:38:43 UTC