Re: URI Opacity Principle (was: Re: use of fragments as names is irresponsible) from Jeremy Dunck on 2003-01-16 (www-tag@w3.org from January 2003)

From: Jeremy Dunck <ralinon@hotmail.com>
Date: Thu, 16 Jan 2003 11:59:57 -0600
To: tbray@textuality.com, dareo@microsoft.com
Cc: noah_mendelsohn@us.ibm.com, sandro@w3.org, fielding@apache.org, www-tag@w3.org
Message-ID: <BAY1-F221TVSxzW234x0000f0fd@hotmail.com>
>From: Tim Bray <tbray@textuality.com>

>Dare Obasanjo wrote:
>>Hopefully if the architecture document says anything in this direction
>>it will lean away from URI opacity. The opacity of URIs is a problem in
>>XML applications, specifically those that use XML namespaces. A number
>>of issues that come up in XML applications would be a lot easier to
>>solve (e.g. how does one version XML namespaces?) if namespace names
>>were structured and not just glorified UUIDs.
>
>It seems to me that it is *never* appropriate to guess at media-types or 
>semantics by picking apart URIs in a computer program.  It is entirely fair 
>and often desirable to use the design of URIs to convey information about 
>the hierarchical structure of an information space, and to provide a 
>suggestion to the human reader about what might be there. -Tim

I agree that a TAG finding on this topic might settle the debate.

Serendipitously, I recently wrote about this topic on the [wiki].  I've 
edited it slightly to make it a bit more appropriate in this context.

The original page is here:
  [DontPutGuidsInUrls]

"
I think it's dangerous for anything but a human to attempt to derive any 
meaning from the text included in a URL.  That is, URLs are supposed to be 
[opaque], and guessing at the details of the resource content based on 
anything in the URL is a Bad Thing.

It's not unusual that a human would want to do exactly that, though.  
Lopping off the end of a URL to find the 'containing page' is a fairly 
normal thing for the adept user to try if he doesn't immediately see the 
link that he wants.  [UseIt article]

Anyway... It's a common problem [ThreeChooseTwo] in application development 
that for any ID to be -only- an ID, it must never be shown to the user.  As 
soon as you show an ID to the user, it can have rules applied to it (like 
"it needs to be short", or "it must be sequential"), and it muddies the 
utility as a programmatic ID.  Of course, never giving the user a mechanism 
to retrieve something is not very useful, either.

Can anyone argue that GUIDs as querystring parameters are not 
_programmatically_ useful?  Can anyone argue that GUIDs in URLs are not a 
bother for the user?

It's an imperfect world we live in...  It seems like there should be a 
pattern to address the problem of how to create loosely coupled IDs. That 
is, how to create a programmatically useful ID, while making a loosely 
matched pseudo-ID which is useful to a human.

Some resolution mechanism would decide (or ask the user for a decision) 
which programmatic ID they meant.
"
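
To make the "loosely coupled ID" idea in the quoted text a little more 
concrete, here is a minimal sketch in Python.  Everything in it is an 
assumption made for illustration: the LooseIdResolver class, its method 
names, and the normalization rule are hypothetical and are not taken from 
the wiki page or from any real library.  The programmatic ID stays an 
opaque GUID, the human-facing pseudo-ID is matched loosely, and when more 
than one GUID matches, a caller-supplied function stands in for "ask the 
user for a decision".

```python
from collections import defaultdict
from uuid import UUID


class LooseIdResolver:
    """Hypothetical sketch: maps loose, human-friendly pseudo-IDs to opaque GUIDs."""

    def __init__(self):
        # One normalized pseudo-ID may loosely match several programmatic IDs.
        self._candidates = defaultdict(list)

    @staticmethod
    def _normalize(pseudo_id):
        # Assumed matching rule: ignore case, whitespace, and punctuation.
        return "".join(ch for ch in pseudo_id.lower() if ch.isalnum())

    def register(self, pseudo_id, guid):
        self._candidates[self._normalize(pseudo_id)].append(guid)

    def resolve(self, pseudo_id, choose=None):
        """Return the GUID for pseudo_id, deferring to `choose` when ambiguous."""
        matches = self._candidates.get(self._normalize(pseudo_id), [])
        if not matches:
            raise KeyError(pseudo_id)
        if len(matches) == 1:
            return matches[0]
        if choose is None:
            # No way to ask the user, so refuse to guess.
            raise ValueError(f"{pseudo_id!r} is ambiguous: {len(matches)} matches")
        return choose(matches)


resolver = LooseIdResolver()
report_a = UUID("12345678-1234-5678-1234-567812345678")
report_b = UUID("87654321-4321-8765-4321-876543218765")
resolver.register("Annual Report", report_a)
unique_hit = resolver.resolve("annual-report")       # loose match, one candidate
resolver.register("annual_report", report_b)         # now two candidates
picked = resolver.resolve("ANNUAL REPORT", choose=lambda ms: ms[-1])
```

The design choice here is that the pseudo-ID never has to be unique or 
stable, so it can follow whatever rules the UI needs, while the GUID 
remains free of those rules.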

This seems the same problem to me.  URIs have been put in the UI, and people 
have made (sometimes incorrect) assumptions about their meanings.
I think if the original assertion was that [xhtml] identifies the namespace 
for XHTML 1.0, then that must always hold.

I just realized that the Rec for XHTML 1.1 doesn't specify a new namespace 
for XHTML 1.1, and that seems like a Bad Thing to me.  The entities in the 
namespace of XHTML 1.1 did change, and I'm not sure I understand how it's OK 
that the namespace identifier didn't change.

Perhaps I made an incorrect assumption that [xhtml] identifies XHTML 1.0, 
when it actually refers to the latest line of XHTML?

If a URL were made to identify the latest version of XHTML, then (maybe) 
it'd be OK to use that URL to locate a different namespace over time, though 
I think that brings some definite instability to document processors.

I suppose, Dare, you're looking for a rule like "the content after the last 
slash of a namespace URI is the version token"?  That doesn't remove a 
processor from the need to keep track of which tokens it supports.  How is 
that better than just using the whole URL as the version token?
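
As a rough illustration of that point, the sketch below compares the two 
approaches in Python.  The namespace names under example.org and the 
function names are invented for the example; the "last segment is the 
version token" rule is only my guess at the kind of rule being asked for, 
not anything defined by the namespaces spec.

```python
# Hypothetical namespace names, invented for illustration only.
SUPPORTED_NAMESPACES = {
    "http://example.org/ns/widget/1.0",
    "http://example.org/ns/widget/1.1",
}

# Under the assumed "last segment is the version" rule, the processor
# keeps this bookkeeping instead. It is still a list of supported tokens.
WIDGET_BASE = "http://example.org/ns/widget"
SUPPORTED_VERSIONS = {"1.0", "1.1"}


def supported_opaque(namespace):
    """Treat the whole namespace name as one opaque token."""
    return namespace in SUPPORTED_NAMESPACES


def supported_structured(namespace):
    """Pick the namespace name apart at the last slash."""
    base, _, version = namespace.rpartition("/")
    return base == WIDGET_BASE and version in SUPPORTED_VERSIONS


samples = [
    "http://example.org/ns/widget/1.0",
    "http://example.org/ns/widget/1.1",
    "http://example.org/ns/widget/2.0",   # unknown version
    "http://example.org/ns/other/1.0",    # unknown vocabulary
]
results = [(supported_opaque(ns), supported_structured(ns)) for ns in samples]
```

Both functions accept and reject exactly the same names here; the 
structured version only moves the list of known tokens to a different 
place.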

Incidentally, I think this ties in with a previous [thread on content 
negotiation], in which I expressed my feeling that some mechanism for 
describing the resource that a URL identifies (which is not the same as the 
representation returned!) would really help with the problems faced by KR 
and CN right now.  It would allow a linker (or referencer, in this case) to 
make more informed decisions about which URL to use.

It seems to me that such a mechanism would probably be most appropriate as 
an HTTP header, though I haven't really thought that through.

  -Jeremy

[wiki]
http://www.c2.com/cgi-bin/wiki

[DontPutGuidsInUrls]
http://www.c2.com/cgi-bin/wiki?DontPutGuidsInUrls

[opaque]
http://www.w3.org/DesignIssues/Axioms.html#opaque

[UseIt article]
http://www.useit.com/alertbox/990321.html

[ThreeChooseTwo]
http://www.c2.com/cgi-bin/wiki?ThreeChooseTwo

[xhtml]
http://www.w3.org/1999/xhtml

[thread on content negotiation]
http://lists.w3.org/Archives/Public/www-tag/2003Jan/0047.html

Received on Thursday, 16 January 2003 13:00:35 UTC