Re: fyi: should URIs convey protocol/service layering? from Roy T. Fielding on 2001-06-16 (uri@w3.org from June 2001)

From: Roy T. Fielding <fielding@ebuilt.com>
Date: Fri, 15 Jun 2001 17:56:39 -0700
To: Dan Connolly <connolly@w3.org>
Cc: uri@w3.org
Message-ID: <20010615175639.C920@waka.ebuilt.net>
There are a couple separate issues.  First, it is definitely preferable
that the identifier contain only that information necessary to identify the
naming authority and the name within that authority.  However, it is also
true that every accessible resource needs at least one identifier that 
is not generic in nature, providing sufficient information to act
as an address for accessibility.   This specific identifier might only be
published by redirection from a less direct identifier, but it needs to be
possible to publish it as a URI, which means there needs to be a scheme
to support it.

Second, the identifier does not identify which protocols are to be used
by the client to access a resource.  What it does do is identify which
protocols are necessary in order to communicate with the naming authority
in order to resolve the name into an authoritative resource that can be
accessed.  Thus, an HTTP client may use an HTTP proxy in order to make
requests using a wais URL, but at some point in a non-cached request chain,
a client will use WAIS over TCP (as defined by the wais URL scheme) to
contact the naming authority.

So, regarding the issue at hand, there are only three possibilities:

   1) the mechanism of contacting the naming authority is defined by
      the URL scheme, as is the case for http (HTTP via TCP) and https;

   2) the mechanism is encoded as part of the authority component, just
      as the TCP port number is defined in many schemes; or,

   3) the protocol elements that use URI are modified to allow references
      to be composed of multiple URI, as in

        ssl://example.com(http://example.com/foo)

Historically, (1) is the only option that has ever been implemented.
(3) cannot be deployed without an Internet flag day.  (2) is possible,
but will require changes to deployed parsers and RFC 2396.  (1) is therefore
my recommendation because it is easier to implement and safer to deploy.
This is because a proper URI manipulating application will already be
architected to handle different URI using a table of scheme-specific
handlers, and we simply get better code reuse characteristics (and fewer
unanticipated errors) if a new scheme is defined for each type of naming
authority.

On Fri, Jun 15, 2001 at 11:41:21AM -0500, Dan Connolly wrote:
> Jeff.Hodges@kingsmountain.com wrote:
> > 
> > This is a couple'o (key) messages from a thread over on the
> > bxxpwg@invisible.net list that may be of interest to folks here.
> 
> Thanks for bringing it up here; this is an issue folks
> have been noodling on for years:
> 
>   [[[
>   Short UDIs
> 
>   UDIs should be kept short and devoid of information that
>   indicates the mechanism by which the document is retrieved.
>   (in the theoretically clean implementation, the protocol
>   information should not be present).
>   ]]]
> 
>   --        DosDonts -- /DesignIssues
>   TimBL, ~1990
>   http://www.w3.org/DesignIssues/DosDonts.html

There is no question that this is the ideal, for the same reason that many
people seek URNs as the better means of identifying long-lived resources,
but it is not sufficient to build a working system unless there is some
other identifier or implied context that does provide that information.

> See also:
> 
>   Decoupling the URL Scheme from the Transport Protocol 
>   http://www.ansa.co.uk/ANSA/ISF/decoupling.html
> 	crud... 404 ... google search... ah:
>  
> http://www.ansa.co.uk/ANSATech/ANSAhtml/95-97-websites/ISF/decoupling.html
> 30 Nov 1995; <rtor@ansa.co.uk>

That is just mentioning the difference between transport protocol and
naming authority, which I hopefully described above a little clearer.

> But folks have mostly avoided the practical side of
> this issue by layering everything on top of http;
> a noteable exception is https:, where I wish
> we would have avoided putting the "secure" flag
> in the name.

But we could not have avoided that.  The security context must be established
before the client uses the URL in any other way, which meant that an http URL
cannot be used because of the risk of older browsers mistaking it for a normal
request without secure communication.  An alternative could have been to use
a compound URL, but like the "ssl" example above the result would be a lot
of redundant information.  The only thing "saved" by doing so is the cost of
adding a URL scheme to the namespace, which has proven to be of zero cost
for those applications that are implemented in accordance with the WWW
architecture, which expects the URI scheme namespace to be extensible.
In fact, the advantage of using https over http.ssl is that we can now
negotiate the security context rather than assume it is SSLv1.

In any case, http and https define two entirely different naming authorities,
even when their implementations reside on the same machine.  It therefore
makes sense that they be different schemes.

....Roy
Received on Friday, 15 June 2001 21:00:02 UTC