Re: On the harm of adding new methods from Roy T. Fielding on 1998-06-11 (ietf-http-ext@w3.org from April to June 1998)

From: Roy T. Fielding <fielding@kiwi.ics.uci.edu>
Date: Wed, 10 Jun 1998 18:29:05 -0700
To: Larry Masinter <masinter@parc.xerox.com>
cc: ipp@pwg.org, ietf-http-ext@w3.org
Message-ID: <9806101829.aa17430@paris.ics.uci.edu>

I disagree with some of Larry's points.  The design of HTTP was intended
for method extension whenever there is a standardizable semantics
that can be shared between client and server (and, most importantly,
between those agents and any intermediaries that may be between them).

>There are two functions of a proxy: filtering and forwarding.
>A filter decides whether or not to accept a request, and a forwarder
>actually forwards the request and processes the result; forwarding
>implements caching, protocol translation, tunneling, while filtering
>is generally a binary "allowed" or "not allowed". There are some
>forwarders that do rewriting in lieu of filtering (allow but modify).

And also those that do transformation and forwarding, though that's
not an issue here.  Basically, there exist active and passive
intermediaries.

>Forwarders MUST actually understand methods, because -- unfortunately --
>the meaning of HTTP headers and responses differ based on the method
>of the request (e.g., Content-Length for HEAD vs GET). Many forwarding
>systems will not accept new methods gracefully. 

Actually, that is only true for HEAD and GET -- all header fields have
the same meaning for all other methods.  HEAD is the only method for
which Content-Length has a different meaning, and no new method can change
the message length calculation.  GET and HEAD are the only methods for which
conditional request header fields have the 304 meaning, whereas for all
other methods they have the 412 meaning.  I can't think of any other
exceptions at the moment -- if any have been added in the past year
or so, they need to be removed.
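
The two exceptions above can be sketched as follows.  This is a minimal
illustration, not text from any specification: the function names are
hypothetical, the body rule covers only the HTTP/1.1 cases of HEAD, 1xx,
204 and 304, and the conditional rule is shown only for an If-None-Match
header field that matches the current entity tag.

```python
def response_has_body(request_method: str, status: int) -> bool:
    """Return True if a response to this request can carry a message body.

    Only HEAD changes the answer: its response may include Content-Length,
    but no body follows. An unknown method is treated like any other.
    """
    if request_method == "HEAD":
        return False
    # Status codes that never carry a body, whatever the method.
    if 100 <= status < 200 or status in (204, 304):
        return False
    return True


def status_when_if_none_match_matches(request_method: str) -> int:
    """Status sent when If-None-Match matches the current entity tag.

    GET and HEAD get 304 (Not Modified); every other method, including
    any new one, gets 412 (Precondition Failed).
    """
    return 304 if request_method in ("GET", "HEAD") else 412
```

A hypothetical new method such as RENDER falls through to the general
case in both functions, so an intermediary needs no special knowledge of it.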

HTTP/1.1 is designed to enable forwarding without understanding the
method semantics, provided that such forwarding fits within the security
policy set at the intermediary.
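
As a rough sketch of that idea, an intermediary can treat the method as
an opaque token: it checks the token against local policy and otherwise
relays the request line unchanged.  The function name and the
denied_methods parameter are assumptions made for illustration only.

```python
from typing import FrozenSet, Optional


def relay_request_line(request_line: str,
                       denied_methods: FrozenSet[str] = frozenset()
                       ) -> Optional[str]:
    """Return the request line to send upstream, or None if policy denies it.

    The method is never interpreted; it is only compared against the
    (hypothetical) local deny list and then copied through verbatim.
    """
    method, target, version = request_line.split(" ", 2)
    if method in denied_methods:
        return None
    return f"{method} {target} {version}"
```

Here a request using an unrecognized method is forwarded as is, unless
the administrator has chosen to deny that method by name.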

>Any new METHOD in HTTP is a serious modification to the protocol, because
>the forwarding function must be aware of it. A new content-type, however,
>can be as easily recognized in the filter layer as a new method, but
>requires NO changes to the forwarding function. Many filters already filter
>on content-type anyway.

On the contrary, the forwarding function is based on the URI, not on
the method or media type.  The filtering function (or, more accurately,
the routing function) is based on everything in the request.  What you
are saying is that it is better for filtering to eliminate one of the
criteria by which an intermediary can do filtering, in this case the
one that is easiest to find and interpret quickly in an HTTP request.
I think that is a bad design when there are semantics that can be easily
distinguished via the method.
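
The split between the two functions can be sketched like this.  It is
only an illustration under stated assumptions: the rule format, the
first-match-wins policy, the default of allowing unmatched requests, and
all names (next_hop, is_allowed, RULES) are hypothetical.

```python
from urllib.parse import urlsplit


def next_hop(uri):
    """Forwarding: choose where to send the request from the URI alone."""
    parts = urlsplit(uri)
    return parts.hostname, parts.port or 80


def rule_matches(rule, method, uri, content_type):
    """A rule matches when every criterion it names agrees with the request."""
    path = urlsplit(uri).path
    return (rule.get("method") in (None, method)
            and path.startswith(rule.get("path_prefix", ""))
            and rule.get("content_type") in (None, content_type))


def is_allowed(rules, method, uri, content_type=None):
    """Filtering: may use anything in the request; first matching rule wins."""
    for rule in rules:
        if rule_matches(rule, method, uri, content_type):
            return rule["allow"]
    return True  # assumed default for this sketch


# Example policy: deny POST to anything under /printers/.
RULES = [
    {"method": "POST", "path_prefix": "/printers/", "allow": False},
]
```

Note that next_hop never looks at the method or media type, while the
filter can combine the method with the URI and any header field.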

>In the case of IPP, it is perfectly adequate to filter on content-type,
>since all IPP content is carried in application/IPP. The arguments for 
>adding a new method (that it is somehow 'easier' to filter on the first
>few bytes of the protocol) are specious because most filters that are
>looking at the protocol at all are looking at content-type. So the
>"firewall filtering" rationale just doesn't hold as a reason for adding
>new methods.

Well, it seems I disagree with everyone on this part.  There is nothing
special about Internet printing that requires independent firewall
semantics.  Nothing.  A printer is a network resource that must be
protected like all other network resources -- as a resource.  There is
no difference between sending a POST full of data to an HTTP server
acting as a printer gateway and sending a POST full of data to the
printer directly.  Treating the two as being different violates one
of the basic principles of the Web architecture.

That does not mean we should add a PRINT method to the protocol.
PRINT does not say anything semantically interesting.  Why should the
client care what mechanism is being used behind the curtains?  Should the
semantics need to change if the server is actually a pipeline like this?

     client ----  proxy  -----  fax  -----  fax  ---- printer

If you look at any modern operating system design, you will find such
differences abstracted away so that every application does not need
separate interface protocols for every device type.  Why should the
Internet be any different?

RENDER would be a more semantically meaningful choice, since what the
client is saying is that it wants the service to render the data as
specified and then discard it.  However, the reason for defining this
new method has nothing to do with firewall filtering.

The IPP design is poor because it conflates an intended action with
the transfer syntax of the data, thus undermining the normal means of
allowing selective access to a printer's resources using any of the
independently defined security mechanisms of HTTP.  While it may be
nice to think of Internet printing as a layered protocol on top of HTTP,
the result is something that is neither efficient nor capable of reusing
many of the advantages of HTTP.  Instead, printing should have been
designed as a service with a defined resource model; standard Web agents
could then manipulate that resource model using the same protocols
as everyone else's resources.  For those cases where data and control
information are to be sent in one action, an application-independent
transfer syntax should be used to group them (a la multipart/related).
Of course, there is nothing to prevent such an implementation from
coexisting with IPP, so I am not suggesting that IPP be changed at this time.

Attempting to isolate printer services from other services using any of
the options suggested (new URI scheme, separate port, new method, obscure
media type) is ultimately futile.  None of these provide anything useful
in the way of securing access to resources; the first two in particular
are a total waste of time since the "http" URI scheme is port-independent.
The only thing you accomplish is making the implementations more complicated.
Control of network resource access is already provided at the URI level
and in the underlying protocol layers upon which the HTTP communication
takes place.

 ...Roy T. Fielding
    Department of Information & Computer Science    (fielding@ics.uci.edu)
    University of California, Irvine, CA 92697-3425    fax:+1(949)824-1715
    http://www.ics.uci.edu/~fielding/
Received on Wednesday, 10 June 1998 21:36:59 UTC