[schemeProtocols-49] New issue on relationship of URI schemes to protocols and operations

This note announces the opening of a new TAG issue, which will be 
known as schemeProtocols-49. 

Scope
--------

The TAG believes that there is confusion in the user community, and 
possibly ambiguity in the underlying RFCs and other architecture 
documents, on several related questions, including:

* What in general is the relationship of URI schemes to the protocols used 
to move information through the Web?  We know that it's not a 1:1 
relationship because URIs employing schemes such as ftp: can be used in 
the Request-URI of an HTTP protocol message.  (For convenience, I'm 
including the ":" with scheme names to make clearer when I'm referring to 
schemes and when to protocols.)

* Why then is the HTTP protocol specified along with the http: scheme in 
RFC 2616 [1]?

* User agents such as browsers typically use the URI scheme as input to a 
heuristic that selects a protocol and an operation (e.g. GET) to be used 
as a default action for manipulating a resource.  What in the architecture 
of the Web licenses the use of such heuristics?  How should they be 
applied, etc.?

* Is it appropriate to infer the set of operations (e.g. GET, POST) 
available on a resource by inspection of its URI, and in particular the 
scheme name?  Can these evolve over time, e.g. when a protocol 
specification is enhanced, and if so how do you know whether new 
operations are available on old resources?  Does this mean that a resource 
must take on a new name when the operations available on it diverge from 
those available on similarly named resources (e.g. your resource is named 
with http:, but you want to support new protocols and/or new operations 
different from those specified for the then-current HTTP protocol)?

* If the operations can be inferred from the scheme name, can the form of 
representations also be inferred?  Is it the scheme specification or the 
protocol specification that determines the form (e.g. octet stream) and 
typing mechanisms (e.g. media types) to be used for representations of the 
resource?

* Given the above questions, what MAY/MUST/SHOULD one specify when 
preparing the definition of a URI scheme?  For example, what must or may 
be said about the use of particular protocols, either as a basis for 
assigning names or as a basis for retrieving resource representations? 

This issue calls for the TAG to consider these and related questions, to 
help publicize good practice in the areas where the architecture is 
already clear and self-consistent, and to suggest architectural 
improvements in any areas where there may be problems or ambiguities.
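
To make the heuristic question above concrete, here is a minimal sketch in 
Python of the kind of scheme-driven default that a user agent might apply. 
It is an illustration only, not a description of any real browser: the 
DEFAULT_ACTION table, the default_action function, and the http_proxy 
parameter are all hypothetical names invented for this example.

```python
from urllib.parse import urlsplit

# Hypothetical table of defaults; real user agents differ, and nothing in
# the RFCs is claimed here to mandate these particular choices.
DEFAULT_ACTION = {
    "http": ("HTTP", "GET"),
    "https": ("HTTP over TLS", "GET"),
    "ftp": ("FTP", "RETR"),
}


def default_action(uri, http_proxy=None):
    """Guess (protocol, operation, target) for dereferencing a URI."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_ACTION:
        raise ValueError(f"no default action known for scheme {scheme!r}")
    if http_proxy is not None and scheme in ("http", "ftp"):
        # Via an HTTP proxy, the whole URI (ftp: included) is carried as
        # the Request-URI of an HTTP message: scheme and protocol differ.
        return ("HTTP", "GET", uri)
    protocol, operation = DEFAULT_ACTION[scheme]
    return (protocol, operation, parts.path or "/")
```

Under these assumptions the same ftp: URI yields an FTP retrieval when 
accessed directly and an HTTP GET when a proxy is configured, which is the 
sense in which the scheme-to-protocol relationship is not 1:1.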

Issue background and related work
---------------------------------

This subject was proposed as a TAG issue in a note from me in Feb. 2005 
[2].  The issue was accepted by the TAG during its telcon of Feb. 8, 2005 
(formal minutes not yet accepted by the TAG, but a draft is available at 
[3]).  Though not explicitly discussed during our telcon, RFC 2718 does 
state [4]:

"2.2.2 URL schemes associated with network protocols

      Most new URL schemes are associated with network resources that
      have one or several network protocols that can access them.  The
      'ftp', 'news', and 'http' schemes are of this nature.  For such
      schemes, the specification should completely describe how URLs are
      translated into protocol actions in sufficient detail to make the
      access of the network resource unambiguous.  If an implementation
      of the URL scheme requires some configuration, the configuration
      elements must be clearly identified.  (For example, the 'news'
      scheme, if implemented using NNTP, requires configuration of the
      NNTP server.)

2.2.3 Definition of non-protocol URL schemes

      In some cases, URL schemes do not have particular network
      protocols associated with them, because their use is limited to
      contexts where the access method is understood.  This is the case,
      for example, with the "cid" and "mid" URL schemes.  For these URL
      schemes, the specification should describe the notation of the
      scheme and a complete mapping of the locator from its source."
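
As an illustration of what "translated into protocol actions" can mean in 
the simplest case, the following Python sketch maps an http: URL onto the 
connection target and bytes of an HTTP/1.1 GET request.  It is a sketch 
under stated assumptions, not a complete or normative mapping: the function 
name http_get_request is hypothetical, and the choice of GET as the action 
is exactly the kind of default this issue asks about.

```python
from urllib.parse import urlsplit


def http_get_request(url):
    """Return (host, port, request_bytes) for a plain http: URL."""
    parts = urlsplit(url)
    if parts.scheme.lower() != "http":
        raise ValueError("this sketch only handles http: URLs")
    host = parts.hostname
    if host is None:
        raise ValueError("URL has no host component")
    port = parts.port or 80  # assume the usual default port for http:
    # Request target is the path plus any query; empty path becomes "/".
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    host_header = host if port == 80 else f"{host}:{port}"
    message = f"GET {target} HTTP/1.1\r\nHost: {host_header}\r\n\r\n"
    return (host, port, message.encode("ascii"))
```

For example, http_get_request("http://www.example.org/a/b?x=1") names 
www.example.org, port 80, and a request whose first line is 
"GET /a/b?x=1 HTTP/1.1".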

During discussion it was noted that an Internet Draft, "Guidelines and 
Registration Procedures for new URI Schemes" [5], is being prepared to 
further clarify some of these areas. 

The TAG will consider all of the above in deciding whether there is 
further work or clarification that would be useful.

Next Steps
----------

I have been tasked by the TAG with drafting the skeleton of a potential 
finding in these areas.  The initial cut is more likely to be useful for 
promoting discussion than as a near-final embodiment of a TAG position. In 
any case, such a draft is also more likely to appear in a few weeks than 
in a few days.  Input and discussion are most welcome in the meantime.

The issues list [6] will be updated to reflect this new issue as soon as 
some clerical issues regarding maintenance of the list are resolved. 
(Observant readers of this list will note that there is at the moment no 
issue XXXX-48; expect announcement of another issue from Norm shortly.)

Noah


[1] http://www.ietf.org/rfc/rfc2616.txt
[2] http://lists.w3.org/Archives/Public/www-tag/2005Feb/0013.html
[3] http://lists.w3.org/Archives/Public/www-tag/2005Mar/att-0038/08-tagmem-minutes.html
[4] http://www.faqs.org/rfcs/rfc2718.html
[5] http://larry.masinter.net/draft-hansen-2717bis-2718bis-uri-guidelines-03.html
[6] http://www.w3.org/2001/tag/issues.html

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

Received on Tuesday, 15 March 2005 02:50:50 UTC