This document is also available in these non-normative formats: XML.
Copyright © 2005 W3C® (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use, and software licensing rules apply.
This document discusses the deployment of protocols and URI schemes for use on the World Wide Web. Implications for correct configuration of user agents, servers and proxy gateways are discussed. Guidelines are also provided for deciding whether a new protocol or scheme is merited, and for maximizing interoperability of new protocols with those that are already widely deployed.
This document is an editors' copy that has no official standing.
This document has been produced by the W3C Technical Architecture Group (TAG). This finding addresses TAG schemeProtocols-49.
This version of the document is a very preliminary sketch of a possible finding. Essentially, it is a snapshot of the editor's work-in-progress, made available so that the TAG will have something to discuss informally at the 14 June 2005 Face to Face meeting.
This document builds on and complements information from [RFC 2717] and [RFC 2718]. At the time of this writing, an Internet Draft has been submitted that would revise and subsume both of those RFCs (see [RFC 2717bis]). Although such drafts are not suitable for normative reference, this finding is intended to be consistent with the directions signalled in those revisions. When and if a revision to the RFCs becomes accepted, the TAG intends to republish this finding with the appropriate references and with any necessary changes to content.
Editorial note: Is this the right way to handle the reference to 2718bis?
Additional TAG findings, both accepted and in draft state, may also be available. The TAG may incorporate this and other findings into future versions of the [AWWW].
The terms MUST, SHOULD, and SHOULD NOT are used in this document in accordance with [RFC 2119].
Editorial note: Should a finding like this actually make use of the formal RFC 2119 terminology?
Please send comments on this finding to the publicly archived TAG mailing list www-tag@w3.org (archive).
1 Preface
2 Terminology
3 URI Assignment and Protocols
4 Gateway Proxies
5 Selection of Protocols by User Agents
6 Protocol design: consistency of operations and formats (To Be Supplied)
7 References
Of the many ways in which the Web can be extended, the provision for new URI schemes may be the most fundamental. URI schemes are often created to signal the use of new protocols for transfer of resource-related information. Embodied in such protocols are operations, such as GET, POST and DELETE in the case of HTTP, which determine the sorts of interactions that are possible with a given resource. The operations in turn determine the format and typing of information exchanged on the Web.
Precisely because so many aspects of resource naming and interaction are subject to change through introduction of URI schemes, new schemes can undermine the interoperability of the Web. [AWWW] explains why unnecessary proliferation of URI schemes must be avoided: "While Web architecture allows the definition of new schemes, introducing a new scheme is costly. Many aspects of URI processing are scheme-dependent, and a large amount of deployed software already processes URIs of well-known schemes. Introducing a new URI scheme requires the development and deployment not only of client software to handle the scheme, but also of ancillary agents such as gateways, proxies, and caches. See [RFC 2718] for other considerations and costs related to URI scheme design...Because of these costs, if a URI scheme exists that meets the needs of an application, designers should use it rather than invent one."
Conversely, the introduction of new protocols, operations, and data formats may sometimes be essential if the Web is to continue as a universal information space integrating the broadest possible range of resources. Already a wide variety of information is being shared through peer-to-peer protocols, many of which are not well integrated on the Web, and there is reason to believe that the highest quality multimedia feeds may not be best distributed and controlled through HTTP. Immersive user interfaces may require interaction protocols that are more flexible than those in widespread use on the Web today, and so on. For such reasons, it is important to explore the tradeoffs involved in deploying new schemes, protocols and operations. This finding attempts to provide useful guidelines for the introduction and use of URI schemes and their associated protocols.
[RFC 2718] sets out guidelines and caveats for the creation of new URI schemes. This finding provides complementary information relating to the deployment of resources using such schemes, the design of protocols, the choice of operations to be supported, and the suggested behaviour of user agents. Indeed, this finding for the most part avoids restating principles already covered in [RFC 2717] and [RFC 2718]; readers are encouraged to become familiar with them before proceeding.
This finding is intended to cover a broad range of information formats and protocols including client/server, peer-to-peer, streaming multimedia, multicast, etc. For convenience, the following client/server-oriented terminology is used except in cases where the extension to other protocols is unlikely to be clear:
For schemes such as http and ftp, the association of a URI to a resource is defined in terms of the corresponding protocol. Thus, the resource identified by http://example.org/resource1 is by definition the one for which representations are returned (GET) or updated (PUT) when that URI is supplied as the HTTP Request-URI (see [RFC 2616]).
Unless otherwise stated, this finding deals only with such protocol-associated URI schemes.
Subtleties arise when such URIs are employed without deploying a server for the resource. For example, it is common to use XML namespace names based on the http scheme even when no server is providing representations for that namespace. Deploying such a server is desirable, but is not required by Web architecture. When there is no such server, the URI chosen SHOULD be consistent with eventual server deployment. So, in the case of HTTP, it is inappropriate to base a URI on a DNS name that is not registered, because the DNS name might later be assigned to an organization that would use it for a purpose inconsistent with serving representations of the resource. Similar considerations apply for other schemes and their associated protocols.
In the simplest case, the protocol associated with the URI directly connects the user agent to the resource provider.
The Web also allows for gateway proxies, which convert from one protocol to another. In such cases, the server offers the resource using one protocol, but the user agent accesses it through another. For example, HTTP can be used as a proxy for the FTP protocol.
The following considerations apply to the implementation of gateway proxies:
URI references provided by the user agent MUST be translated into appropriate addresses or other resource indicators employed by the server. Note that the server protocol may but need not be Web-based, and may but need not use URI-addressing. In the example above, the FTP protocol predates the Web and does not use URIs in messages exchanged between the gateway and the resource server.
Operations requested by the user agent MUST be translated into suitable equivalents in the server protocol. In the example above, HTTP GETs presumably result in the FTP operations that open a connection, set an appropriate transfer mode (most likely binary), change to the appropriate server directory, and retrieve the appropriate file. The result of that retrieval must then be assigned a suitable MIME type and returned as an octet stream in the HTTP Response. The Web itself has no fixed standards for the degree of fidelity that is appropriate for any given gateway or pair of protocols, but the specifications for particular protocols or URI schemes may impose such constraints. For example, a gateway using HTTP as a proxy protocol must never map an HTTP GET into an unsafe operation on the resource. Gateways are not possible in cases where the mismatch between the operations supported is too great.
Similarly, the gateway is responsible for mapping the format and typing of information exchanged. Again, the Web architecture imposes no fixed standards as to the level of fidelity that is appropriate, but particular scheme and protocol specifications may. In the example above, the gateway is responsible for using heuristics or other means to assign media types to representations of FTP-based resources.
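The three considerations above can be illustrated with a minimal Python sketch of an HTTP-to-FTP gateway's translation step. It is an illustration under stated assumptions, not a description of any deployed gateway: the function names `translate_request` and `guess_media_type`, the host ftp.example.org, and the choice to reject every method other than GET are all hypothetical. The sketch only plans the FTP steps and performs no network I/O.

```python
import mimetypes
import posixpath
from urllib.parse import unquote, urlsplit


def translate_request(method, request_uri):
    """Plan the FTP steps for an HTTP request (hypothetical helper).

    Only GET is mapped, so a safe HTTP operation is never turned into
    an unsafe operation on the FTP server.
    """
    if method != "GET":
        raise ValueError("this sketch maps only the safe GET operation")
    parts = urlsplit(request_uri)
    if parts.scheme != "ftp":
        raise ValueError("this sketch handles only ftp URIs")
    # Address translation: the URI becomes a host, a directory and a file name.
    directory, filename = posixpath.split(unquote(parts.path))
    return [
        ("connect", parts.hostname),  # pseudo-step: open the control connection
        ("TYPE", "I"),                # binary ("image") transfer mode
        ("CWD", directory or "/"),    # change to the server directory
        ("RETR", filename),           # retrieve the file
    ]


def guess_media_type(request_uri):
    """Heuristically assign a media type from the file extension."""
    media_type, _encoding = mimetypes.guess_type(urlsplit(request_uri).path)
    # Fall back to a generic octet stream when the heuristic finds nothing.
    return media_type or "application/octet-stream"
```

A gateway built along these lines would run the planned steps against the FTP server and return the retrieved octets in the HTTP response, labelled with the guessed media type. The extension-based guess is only one possible heuristic; the finding leaves the appropriate level of fidelity to the relevant scheme and protocol specifications.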
On the Web, URI names are typically used "on the wire" as the means by which protocols identify resources to be accessed. HTTP, for example, uses a URI as its Request-URI. When one such protocol is used as a proxy to another, the two "hops" may or may not use the same URI. When they are not the same, there are two URIs identifying the same resource. (Strictly speaking, the two URIs name the resource and the proxy of the resource respectively, but for many practical purposes the effect is similar to having two names for the resource itself.) [AWWW] explains the disadvantages of assigning more than one URI to a single resource. For those reasons, protocols intended for use with gateways SHOULD be designed to avoid the requirement to generate such duplicate URI names. HTTP, for example, provides for the use of non-http scheme URIs as Request-URIs; accordingly, the same URI can often be used on both "hops". Conversely, URI duplication may be unavoidable when the gateway protocol demands naming with a particular scheme.
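The difference between reusing a URI on both hops and generating a duplicate can be sketched in Python as follows. This is a hedged illustration only: the function names, the host names ftp.example.org and gateway.example.org, and the path layout of the rewritten URI are assumptions made for the example, not conventions defined by any specification.

```python
from urllib.parse import urlsplit


def request_line_via_proxy(resource_uri):
    """Build an HTTP/1.1 request line that carries the original URI unchanged.

    The ftp-scheme URI is used directly as the Request-URI, so both
    hops name the resource with the same URI.
    """
    return f"GET {resource_uri} HTTP/1.1"


def rewritten_gateway_uri(resource_uri, gateway_host):
    """Derive an http-scheme URI for a gateway that demands http naming.

    The layout /<scheme>/<authority><path> is a hypothetical convention.
    The result is a second URI for what is, in practice, the same resource.
    """
    parts = urlsplit(resource_uri)
    return f"http://{gateway_host}/{parts.scheme}/{parts.netloc}{parts.path}"
```

With the first function, the user agent and the gateway both refer to ftp://ftp.example.org/pub/readme.txt. With the second, the user agent sees only http://gateway.example.org/ftp/ftp.example.org/pub/readme.txt, which is the kind of duplicate name that the guidance above aims to avoid.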
Editorial note: Does this section use an appropriate mix of RFC 2119 "MUST"s and "SHOULD"s vs. more informal guidance?
This section discusses the means by which a user agent can select an appropriate protocol for accessing a resource.
The specification for a URI scheme determines the normative association of URIs from that scheme to resources. For protocol-based URIs, that association is typically defined in terms of the protocol (see 3 URI Assignment and Protocols). In such cases, a user agent can determine a protocol based on inspection of the URI, and in the common case where there is one protocol associated with a scheme, the scheme name directly determines the protocol. It is, for example, always acceptable for a user agent to attempt an HTTP connection to a resource named with the http scheme.
The means by which user agents determine that a gateway protocol is to be used are specific to each user agent. Using the example above, a user agent would require some configuration to indicate that ftp-scheme resources were in fact to be accessed using the HTTP protocol. This is similar to the other sorts of proxy configuration that are commonly required of Web browsers.
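The selection logic described in this section might look like the following Python sketch. The table `DEFAULT_PROTOCOLS`, the function `select_protocol`, and the shape of the gateway configuration (a mapping from scheme name to proxy address) are all assumptions made for illustration; real user agents each have their own configuration mechanisms.

```python
from urllib.parse import urlsplit

# Hypothetical table: the protocol normally associated with each scheme.
DEFAULT_PROTOCOLS = {"http": "HTTP", "ftp": "FTP"}


def select_protocol(uri, gateway_config=None):
    """Return (protocol, proxy_address) for the given URI.

    gateway_config is a hypothetical user-agent setting that maps a
    scheme name to the address of an HTTP gateway proxy for that scheme.
    """
    scheme = urlsplit(uri).scheme
    if gateway_config and scheme in gateway_config:
        # Configuration overrides the default: reach the resource through HTTP.
        return ("HTTP", gateway_config[scheme])
    if scheme in DEFAULT_PROTOCOLS:
        # Common case: the scheme name directly determines the protocol.
        return (DEFAULT_PROTOCOLS[scheme], None)
    raise ValueError(f"no protocol known for scheme {scheme!r}")
```

Without configuration, an ftp-scheme URI is accessed with FTP. Given a configuration such as `{"ftp": "proxy.example.org:8080"}`, the same URI is accessed with HTTP through the named gateway, while the URI itself is unchanged.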
This section has not been written. The paragraphs below are placeholders with reminders of possible topics to be covered.
To be supplied: explain that it's much easier to support new protocols in a user agent if the operations of that protocol are even generally similar to those of HTTP or other widely deployed protocols. So, a high-definition streaming video protocol may not support exactly an HTTP GET, but if it supports something in the same spirit then a browser can probably provide a fairly consistent navigation experience as one goes from a web page to a movie and back.
To be supplied: similarly, if a peer-to-peer protocol supports retrieval of media typed octet streams, then browsers can use existing renderers, caches, etc. This will link to the AWWW GPN on reusing formats.
To be supplied: operations on the wire vs. operations at the endpoint. In HTTP, GET is visible both as a browser operation and on the wire. In peer-to-peer, you might have a very compatible operation at the browser that turns into all sorts of strange traffic on the wire. That's still a good thing to go for: if you can simulate as much of the HTTP "endpoint API" as possible, then you get a lot of browser compatibility, even if the on-the-wire protocols are radically different.