Re: Round 3: moving HTTP 1.0 to informational from Roy T. Fielding on 1996-02-08 (ietf-http-wg@w3.org from January to March 1996)

From: Roy T. Fielding <fielding@avron.ICS.UCI.EDU>
Date: Thu, 08 Feb 1996 09:00:49 -0800
To: Paul Hoffman <paulh@imc.org>
Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Message-Id: <9602080900.aa02134@paris.ics.uci.edu>

In general, it is difficult to judge changes to the specification,
and deletions in particular, if the changed sections are not provided
as context in the same message.  We will have to do that this week,
so I'll put together a diff of what we have talked about along with
the editorial changes I already made while editing the HTTP/1.1 spec.

However, before that, I do disagree with some of the deletions.

> The following is what I hope is the last round on the HTTP/1.0 draft.
> Everything here *except for the Appendix C material* has already been
> around the WG with no objection. The material from Appendix C is my
> own rewrite that (hopefully) matches the material that Larry posted
> last week about HTTP and RFC 1521.

That is not quite accurate -- the changes to section 3.6 were not
discussed prior to this message.

> Just as a brief (but pointed) reminder, this draft is supposed to say
> approximately what is common to HTTP/1.0 software. Please limit your
> comments to what is true, not what we would have wanted to be true. If
> there is little new discussion, it can go to a final I-D in the middle of
> next week, which would be nice for all of us.

Yes, please.

> --Paul Hoffman
> --Internet Mail Consortium
> 
> *****The following paragraph is added after the first paragraph of section 1.1 (Purpose):*****
> 
> This specification reflects the approximate state of those features which
> are found in most HTTP/1.0 implementations. The specification is split into
> two sections. Those features of HTTP for which implementations are usually
> consistent are described in the main body of this document. Those features
> which have few implementations or inconsistent ones are listed in Appendix
> D.

I still object to "the approximate state of" -- it isn't needed and is not
accurate.  Also, this should be a replacement for the last two sentences of
the first paragraph of section 1.1, not a separate paragraph.

> *****Section 1.4 is added:*****
> 
> 1.4 HTTP and MIME
> 
> HTTP/1.0 uses many of the constructs defined for MIME, as defined in RFC
> 1521. Appendix C describes the ways in which the context of HTTP allows for
> different use of Internet Media Types than in typical transportation in
> email network, and gives the rationale for those differences.

Okay by me, except "than in typical transportation in email network" should
be "than is typically found in Internet mail".

> *****Section 3.6 and its subsections are changed to:*****
> 
> 3.6  Media Types
> 
> HTTP uses Internet Media Types [13] in the Content-Type header field
> (Section 10.5) in order to provide open and extensible data typing.
> 
>    media-type     = type "/" subtype *( ";" parameter )
>    type           = token
>    subtype        = token
> 
> Parameters may follow the type/subtype in the form of
> attribute/value pairs.
> 
>    parameter      = attribute "=" value
>    attribute      = token
>    value          = token | quoted-string
> 
> The type, subtype, and parameter attribute names are
> case-insensitive. Parameter values may or may not be
> case-sensitive, depending on the semantics of the parameter name.
> LWS must not be generated between the type and subtype, nor between
> an attribute and its value.

We should also add:

                               Recipients must ignore any media type
parameters whose names they do not recognize.
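
To make the grammar and the ignore-unknown-parameters rule concrete,
here is a rough Python sketch of a recipient-side parser.  The function
name, the regular expressions, and the known_params default are my own
illustration and are not part of the draft; it tolerates whitespace around
";" but not around "/" or "=".

```python
import re

# token: one or more characters that are not CTLs, space, or tspecials
_TOKEN = r'[^\x00-\x20\x7f()<>@,;:\\"/\[\]?={}]+'
_MEDIA_TYPE = re.compile("(" + _TOKEN + ")/(" + _TOKEN + ")")
_PARAM = re.compile(r"\s*;\s*(" + _TOKEN + ")=(" + _TOKEN + r'|"[^"]*")')


def parse_media_type(value, known_params=("charset",)):
    """Return (type, subtype, params), dropping unrecognized parameters.

    known_params is a hypothetical list of the parameter names this
    recipient understands.
    """
    text = value.strip()
    match = _MEDIA_TYPE.match(text)
    if match is None:
        raise ValueError("not a media type: %r" % value)
    # Type and subtype are case-insensitive, so normalize to lower case.
    mtype, subtype = match.group(1).lower(), match.group(2).lower()
    params = {}
    pos = match.end()
    while pos < len(text):
        pm = _PARAM.match(text, pos)
        if pm is None:
            raise ValueError("malformed parameter in %r" % value)
        name, val = pm.group(1).lower(), pm.group(2)
        if val.startswith('"'):
            val = val[1:-1]  # strip the surrounding quotes
        if name in known_params:
            params[name] = val  # value case is preserved
        pos = pm.end()
    return mtype, subtype, params
```

For example, "Text/HTML; Charset=ISO-8859-1; level=2" would come back as
("text", "html", {"charset": "ISO-8859-1"}), with "level" silently ignored.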

> Some older HTTP applications do not recognize media type parameters.
> HTTP/1.0 applications should only use media type parameters when they
> are necessary to define the content of a message.
> 
> Media-type values are registered with the Internet Assigned Number
> Authority (IANA). The media type registration process is outlined in
> RFC 1590 [13]. Use of non-registered media types is discouraged.
> 
> 3.6.1 Canonicalization and Text Defaults
> 
> Internet media types are registered with a canonical form.  In
> general, Entity-Bodies transferred via HTTP must be represented in
> the appropriate canonical form prior to the application of
> Content-Encoding, if any, and transmission.
> 
> Media types of "text/*" use CRLF as the text line break when in canonical
> form. However, HTTP allows the transport of text media not only in the
> canonical form with CRLF line breaks, but also with CR or LF alone, used
> consistently within the Entity-Body. This flexibility only applies to
> Entity-Bodies and not HTTP or multipart headers.

The above two paragraphs are not acceptable changes -- what was deleted
is important to how HTTP works.  The following is better:

   Internet media types are registered with a canonical form.  In general,
   entity bodies transferred via HTTP should be represented in the appropriate
   canonical form prior to transmission. If the body has been encoded
   via a Content-Encoding, the underlying data should be in canonical form
   prior to being encoded.

   Media subtypes of the "text" type use CRLF as the text line break when
   in canonical form.  However, HTTP allows the transport of text media 
   with plain CR or LF alone representing a line break when used consistently
   within the Entity-Body. HTTP applications must accept CRLF, plain CR, and
   plain LF as being representative of a line break in text media received
   via HTTP.  In addition, if the text media is represented in a character
   set which does not use octets 13 and 10 for CR and LF respectively, as
   is the case for some multi-byte character sets, HTTP allows the use
   of whatever octet sequences are defined by that character set to
   represent the equivalent of CR and LF for line breaks.  It is
   assumed that any recipient capable of using such a character set
   will know the appropriate octet sequence for representing line
   breaks within that character set.  This flexibility regarding line
   breaks applies only to text media in the Entity-Body; a bare CR or LF
   should not be substituted for CRLF within any of the HTTP control
   structures (such as header fields and multipart boundaries).
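
As a sketch of what "must accept CRLF, plain CR, and plain LF" means for
a recipient, something like the following Python would do.  It assumes a
character set in which octets 13 and 10 are CR and LF; the function name
is only illustrative.

```python
import re

# CRLF is listed first so that it is treated as one break, not two.
_LINE_BREAK = re.compile(rb"\r\n|\r|\n")


def split_text_lines(body):
    """Split a text entity body on CRLF, bare CR, or bare LF."""
    return _LINE_BREAK.split(body)
```

This applies only to the Entity-Body; header fields and multipart
boundaries still need a real CRLF.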

> Media types of "text/*" are defined to have a default charset parameter of
> "US-ASCII", and that other charset parameters should be labelled. In
> practice, HTTP servers frequently send text data without a charset
> parameter, and expect clients to guess the character set of the result.
> This has caused a great deal of confusion and lack of interoperability in
> HTTP 1.0 clients and servers.

This is incorrect and not representative of current practice OR recommended
practice.  The following is:

   The "charset" parameter is used with some media types to define the
   character set (Section 3.4) of the data.  When no explicit charset
   parameter is provided by the sender, media subtypes of the "text"
   type are defined to have a default charset value of "ISO-8859-1"
   when received via HTTP.

      Note: Some HTTP user agents provide a configuration option to
      allow the user to change the default interpretation of the media
      type character set when no charset parameter is given.  However,
      use of such options is not consistent and leads to poor
      interoperability across open systems.

   It is recommended that the character set of an entity be
   labelled as the lowest common denominator of the character codes
   used within that entity, with the exception that no label is
   preferred over the labels US-ASCII or ISO-8859-1.
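
In code, the default rule might look like the Python below.  This is a
minimal sketch: the function name is made up, and the naive split on ";"
assumes no quoted parameter value contains a semicolon.

```python
from typing import Optional


def effective_charset(content_type: str) -> Optional[str]:
    """Return the charset to assume for a Content-Type value."""
    media_type, *raw_params = content_type.split(";")
    for raw in raw_params:
        name, sep, value = raw.strip().partition("=")
        if sep and name.strip().lower() == "charset":
            return value.strip().strip('"')  # an explicit label wins
    if media_type.strip().lower().startswith("text/"):
        return "ISO-8859-1"  # default for unlabelled "text" subtypes
    return None  # no default is defined for other types
```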

> 3.6.2 Multipart Types
> 
> MIME provides for a number of "multipart" types -- encapsulations of
> several entities within a single message's Entity-Body. The multipart types
> registered by IANA [15] do not have any special meaning for HTTP, though

that should be "HTTP/1.0"

> user agents may need to understand each type in order to correctly
> interpret the purpose of each body-part. An HTTP user agent should follow
> the same or similar behavior as a MIME user agent does upon receipt of a
> multipart type. HTTP servers should not assume that all HTTP clients are
> prepared to handle multipart types.
> 
> All multipart types share a common syntax and must include a boundary
> parameter as part of the media type value. The message body is itself a
> protocol element and must therefore use only CRLF to represent line breaks
> between body-parts. Multipart body-parts may contain HTTP header fields
> which are significant to the meaning of that part.
> 
> *****Section 10.12 is changed to:*****
> 
> 10.12  MIME-Version
> 
> Some HTTP/1.0 applications send a MIME-Version field in the following
> format:
> 
>    MIME-Version   = "MIME-Version" ":" 1*DIGIT "." 1*DIGIT
> 
> However, this field has not been well-defined and should be ignored.

Argh, no, that is absolutely wrong.  There is nothing wrong with the
definition -- it was just used without regard for its purpose in MIME.
Actually, it should be moved to appendix D and listed as:

D.X  MIME-Version

   HTTP messages may include a single MIME-Version general-header
   field to indicate what version of the MIME protocol was used to
   construct the message. Use of the MIME-Version header field, as
   defined by RFC 1521 [5], should indicate that the message is
   MIME-conformant.  Unfortunately, some older HTTP/1.0 servers send
   it indiscriminately, and thus this field should be ignored.

> *****Section 12.5 is added:*****
> 
> 12.5  Attacks Based On File and Path Names
> 
> Implementations of the HTTP servers should be careful to restrict the

That should be "Implementors of HTTP origin servers should ..."

> documents returned by HTTP requests to be only those that were intended
> by the administrators. If an HTTP server translates HTTP URIs directly
> into file system calls, the server must take special care not to serve
> files that were not intended to be delivered to HTTP clients. For
> example, Unix, Microsoft Windows, and other operating systems use ".."

Ummm, unless we want to include the TM disclaimer, that should be
"example, some operating systems". [It is okay by me to include the disclaimer]

> as a path component to indicate a directory level above the current one.
> On such a system, an HTTP server must disallow any such construct in the
> Request-URI if it would otherwise allow access to a resource outside
> those intended to be accessible via the HTTP server. Similarly, files
> intended for reference only internally to the server (such as access
> control files, configuration files, and script code) must be protected
> from inappropriate retrieval, since they might contain sensitive
> information. Experience has shown that minor bugs in such HTTP server
> implementations have turned into security risks.
> 
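
For what it's worth, the ".." check could be as simple as the Python
sketch below.  It is stricter than the text requires, since it rejects
every ".." path component rather than only those that escape the served
tree, and the function name is hypothetical.

```python
from urllib.parse import unquote


def is_safe_request_path(request_path: str) -> bool:
    """Reject any path that contains a ".." component."""
    # Decode %-escapes first so that "%2e%2e" cannot slip through.
    decoded = unquote(request_path)
    # Treat backslashes as separators too, for systems that accept them.
    segments = decoded.replace("\\", "/").split("/")
    return ".." not in segments
```

A server would apply this before mapping the Request-URI onto the file
system, and would still need separate protection for access control files,
configuration files, and scripts.
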
> *****Appendix C is changed to:*****
> 
> C.  Relationship to MIME
> 
> HTTP/1.0 uses many of the constructs defined for Internet Mail (RFC 822
> [7]) and the Multipurpose Internet Mail Extensions (RFC 1521, MIME [5]) to
> allow entities to be transmitted in an open variety of representations and
> with extensible mechanisms. However, RFC 1521 discusses email, and HTTP has
> a few features that are different than those described in RFC 1521. These
> differences were carefully chosen to optimize performance over 8-bit
> networks, to give greatest freedom for creating new media-types, to make
> date comparisons easier, and to acknowledge the practice of some early HTTP
> servers and clients.
> 
> At the time this document was written, it is expected that RFC 1521 will be
> revised. The revisions may include some of the practices found in HTTP/1.0
> but not in RFC 1521.
> 
> This appendix describes specific areas where HTTP differs from RFC 1521.
> Proxies and gateways to strict MIME environments should be aware of these
> differences and provide the appropriate conversions where necessary.
> Proxies and gateways from MIME environments to HTTP also need to be aware
> of the differences because some conversions may be required.

Looks good so far.

> C.1 Canonical Form and Line Breaks
> 
> RFC 1521 requires that an email entity be converted to canonical form prior
> to being transferred, as described in Appendix G of RFC 1521 [5]. Section
> 3.6.1 of this document describes the forms allowed for "text/*" media types
> when transmitted over HTTP.

"... allowed for subtypes of the "text" media type ..."

> RFC 1521 requires that content that has the primary media type "text"
> represent line breaks as CRLF and forbids the use of CR or LF outside of
> line break sequences. HTTP allows CRLF, bare CR, and bare LF to indicate a
> line break within text content when a message is transmitted over HTTP.
> 
> Where it is possible, a proxy or gateway from HTTP to a strict RFC 1521
> environment protocol should translate all line breaks within the text media
> types described in section 3.6.1 of this document to the RFC 1521 canonical
> form of CRLF. Note, however, that this may be complicated by the presence
> of HTTP content encoding and by the fact that HTTP allows the use of some
> character sets which do not use octets 13 and 10 to represent CR and LF, as
> is the case for some multi-byte character sets. If HTTP-to-MIME
> canonicalization is performed, the value of a Content-Length header field
> of the HTTP data must be updated to reflect the new body length.
> 
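
A gateway-side sketch of that conversion, in Python, might look like the
following.  It assumes headers are held in a dict keyed by their usual
capitalization, a character set where octets 13 and 10 are CR and LF, and
that encoded bodies are left alone; none of those choices come from the
draft.

```python
import re

_ANY_BREAK = re.compile(rb"\r\n|\r|\n")


def canonicalize_for_mime(headers, body):
    """Rewrite text line breaks to CRLF and fix up Content-Length."""
    content_type = headers.get("Content-Type", "").strip().lower()
    # Only unencoded "text" subtypes are touched in this sketch.
    if not content_type.startswith("text/") or "Content-Encoding" in headers:
        return dict(headers), body
    new_body = _ANY_BREAK.sub(b"\r\n", body)
    new_headers = dict(headers)
    new_headers["Content-Length"] = str(len(new_body))  # body length changed
    return new_headers, new_body
```
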
> C.2  Conversion of Date Formats
> 
> HTTP/1.0 uses a small set of date formats to simplify the process of date
> comparison; these are described in section 3.3 of this document. RFC 1521
> allows a larger set of date formats. Proxies and gateways from other
> protocols to HTTP should ensure that any Date header field present in a
> message conforms to one of the HTTP/1.0 formats and rewrite the date if
> necessary.
> 
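
As an illustration, a gateway could rewrite a mail-style date into the
fixed-length GMT form with a few lines of Python.  I am assuming here
that the RFC 1123 style is the target format from section 3.3 (which is
not quoted in this message); the function name is made up.

```python
from datetime import timezone
from email.utils import format_datetime, parsedate_to_datetime


def to_http_date(value: str) -> str:
    """Convert an RFC 822 style date to 'Sun, 06 Nov 1994 08:49:37 GMT' form."""
    parsed = parsedate_to_datetime(value)
    if parsed.tzinfo is None:
        # No usable zone information: this sketch assumes GMT.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return format_datetime(parsed.astimezone(timezone.utc), usegmt=True)
```

Applied to the Date field of this message, "Thu, 08 Feb 1996 09:00:49 -0800"
becomes "Thu, 08 Feb 1996 17:00:49 GMT".
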
> C.3  Introduction of Content-Encoding
> 
> RFC 1521 does not include any concept equivalent to HTTP/1.0's
> Content-Encoding header field. Since this acts as a modifier on the media
> type, proxies and gateways from HTTP to MIME-compliant protocols must
> either change the value of the Content-Type header field or decode the
> Entity-Body before forwarding the message. (Some experimental applications
> of Content-Type for Internet mail have used a media-type parameter of
> ";conversions=<content-coding>" to perform an equivalent function as
> Content-Encoding. However, this parameter is not part of RFC 1521.)
> 
> C.4  No Content-Transfer-Encoding
> 
> HTTP/1.0 does not use the Content-Transfer-Encoding (CTE) field of RFC
> 1521. Proxies and gateways from MIME-compliant protocols to HTTP must
> remove any non-identity CTE ("quoted-printable" or "base64") encoding prior
> to delivering the response message to an HTTP client.
> 
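
A rough Python sketch of the MIME-to-HTTP direction follows.  The dict
of headers, the function name, dropping the CTE field even for identity
values, and recomputing Content-Length are all my own assumptions, not
requirements from the draft.

```python
import base64
import quopri


def remove_cte(headers, body):
    """Undo base64 or quoted-printable before handing a body to HTTP."""
    cte = headers.get("Content-Transfer-Encoding", "").strip().lower()
    if cte == "base64":
        body = base64.b64decode(body)
    elif cte == "quoted-printable":
        body = quopri.decodestring(body)
    # HTTP/1.0 does not use the CTE field, so it is not forwarded.
    new_headers = {k: v for k, v in headers.items()
                   if k != "Content-Transfer-Encoding"}
    new_headers["Content-Length"] = str(len(body))
    return new_headers, body
```
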
> Proxies and gateways from HTTP to MIME-compliant protocols are responsible
> for ensuring that the message is in the correct format and encoding for
> safe transport on that protocol. "Safe transport" is defined by the
> limitations of the protocol being used. At a minimum, the CTE field of
> 
> Content-Transfer-Encoding: binary
> 
> should be added by the HTTP-to-MIME proxy or gateway if the gateway is
> unwilling to apply a content transfer encoding.
> 
> An HTTP client may include a Content-Transfer-Encoding as an extension
> Entity-Header in a POST request when it knows the destination of that
> request is a proxy or gateway to a MIME-compliant protocol.

We can delete the above paragraph.  Although some software does it, there
is no way to define what CTE means within the HTTP encoding model without
including all of CTE in HTTP, so such messages are invalid anyway.

> C.5  HTTP Header Fields in Multipart Body-Parts
> 
> In RFC 1521, the header fields in multipart body-parts are generally
> ignored

... unless the field-name begins with "Content-".

> In HTTP/1.0, multipart body-parts may contain HTTP header fields
> which are significant to the meaning of that part.
> 
> *****Appendix D is added:*****
> 
> D.  Additional Features
> 
> This appendix documents features for which there was not strong consensus in
> the IETF HTTP Working Group, or for which there were not a sufficient
> number of interoperable implementations. In some cases, there was strong
> consensus that the feature was needed but disagreement about how it
> should be implemented. In other cases, there was no general agreement on
> the feature.  Implementors who add the features in the Appendix should be
> aware that software using these features is less likely to be
> interoperable than software using the features from the main part of
> this specification.

I still don't like this -- it isn't why we are doing it (it may eventually
be so, but consensus has not been an issue).  It should be

  This appendix documents protocol elements used by some existing HTTP
  implementations, but not consistently and correctly across most HTTP/1.0
  applications.  Implementors should be aware of these features, but cannot
  rely upon their presence in, or interoperability with, other HTTP/1.0
  applications.

> The specifications in this section are shorter than they were in earlier
> drafts of the HTTP 1.0 specification. Some implementations of the features
> in this appendix are based on fuller descriptions of the features.

Delete that -- it isn't appropriate for the RFC.

> D.1 Additional Request Methods
> 
> D.1.1 PUT
> 
> The PUT method requests that the enclosed entity be stored under the
> supplied Request-URI. If the Request-URI refers to an already existing
> resource, the enclosed entity should be considered as a modified version
> of the one residing on the origin server. If the Request-URI does not
> point to an existing resource, and that URI is capable of being defined
> as a new resource by the requesting user agent, the origin server can
> create the resource with that URI.
> 
> The fundamental difference between the POST and PUT requests is reflected
> in the different meaning of the Request-URI. The URI in a POST request
> identifies the resource that will handle the enclosed entity as data to be
> processed. That resource may be a data-accepting process, a gateway to some
> other protocol, or a separate entity that accepts annotations. In contrast,
> the URI in a PUT request identifies the entity enclosed with the request --
> the user agent knows what URI is intended and the server must not attempt
> to apply the request to some other resource.
> 
> D.1.2 DELETE
> 
> The DELETE method requests that the origin server delete the 
> resource identified by the Request-URI.
> 
> D.1.3 LINK
> 
> The LINK method establishes one or more Link relationships between 
> the existing resource identified by the Request-URI and other 
> existing resources.
> 
> D.1.4 UNLINK
> 
> The UNLINK method removes one or more Link relationships from the 
> existing resource identified by the Request-URI.
> 
> D.2  Additional Header Field Definitions
> 
> This section defines the syntax and semantics of all standard 
> HTTP/1.0 header fields. For Entity-Header fields, both sender and 
> recipient refer to either the client or the server, depending on 
> who sends and who receives the entity.

Oooh, bad cut-n-paste there -- just delete the above paragraph.

> D.2.1  Accept
> 
> The Accept [request] header field can be used to indicate a list of media 
              ^^^^^^^
> ranges which are acceptable as a response to the request. The 
> asterisk "*" character is used to group media types into ranges, 
> with "*/*" indicating all media types and "type/*" indicating all 
> subtypes of that type. The set of ranges given by the client should 
> represent what types are acceptable given the context of the 
> request.
> 
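
To show how the "*" grouping works, here is a small Python sketch of
media range matching.  It ignores any parameters on the ranges, and both
function names are hypothetical.

```python
def media_range_matches(media_range: str, media_type: str) -> bool:
    """True if media_type falls within media_range ("*/*", "type/*", exact)."""
    r_type, _, r_sub = media_range.strip().lower().partition("/")
    m_type, _, m_sub = media_type.strip().lower().partition("/")
    if r_type == "*":
        return r_sub == "*"  # only "*/*" is a valid wildcard type
    return r_type == m_type and r_sub in ("*", m_sub)


def is_acceptable(accept_value: str, media_type: str) -> bool:
    """Check a media type against a comma-separated Accept field value."""
    ranges = [item.split(";")[0] for item in accept_value.split(",")]
    return any(media_range_matches(r, media_type)
               for r in ranges if r.strip())
```

With an Accept value of "text/*, image/gif", a "text/html" response is
acceptable and an "image/jpeg" response is not.
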
> D.2.2  Accept-Charset
> 
> The Accept-Charset request header field can be used to indicate a 
> list of preferred character set encodings other than the default
> US-ASCII and ISO-8859-1. This field allows clients capable of 
> understanding more comprehensive or special-purpose character set 
> encodings to signal that capability to a server which is capable of 
> representing documents in those character set encodings.
> 
> D.2.3  Accept-Encoding
> 
> The Accept-Encoding request header field is similar to Accept, but 
> restricts the encoding-mechanism values which are acceptable in the 
> response.

encoding-mechanism -> content-coding

> D.2.4  Accept-Language
> 
> The Accept-Language request header field is similar to Accept, but 
> restricts the set of natural languages that are preferred as a 
> response to the request.
> 
> D.2.5  Content-Language
> 
> The Content-Language [entity header] field describes the natural
                        ^^^^^^^^^^^^^
> language(s) of the intended audience for the enclosed entity. Note that
> this may not be equivalent to all the languages used within the entity.
> 
> D.2.6  Link
> 
> The Link [entity] header [field]
            ^^^^^^          ^^^^^
> provides a means for describing a relationship between the entity and
> some other resource. An entity may include multiple Link values. Links
> at the metainformation level typically indicate relationships like
> hierarchical structure and navigation paths.
> 
> D.2.7  Retry-After
> 
> The Retry-After response header field can be used with a 503 
> (service unavailable) response to indicate how long the service is 
> expected to be unavailable to the requesting client. The value of 
> this field can be either an HTTP-date or an integer number of 
> seconds (in decimal) after the time of the response.
> 
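
The two value forms could be handled along these lines in Python.  This
sketch only parses the RFC 1123 style of HTTP-date, and the function name
and the response_time argument are my own.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def retry_after_time(value: str, response_time: datetime) -> datetime:
    """Return the earliest time at which the client should retry."""
    value = value.strip()
    if value.isascii() and value.isdigit():
        # Integer form: seconds after the time of the response.
        return response_time + timedelta(seconds=int(value))
    parsed = parsedate_to_datetime(value)  # HTTP-date form
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed
```
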
> D.2.8  Title
> 
> The Title [entity] header field indicates the title of the entity.
             ^^^^^^
> 
> D.2.9  URI
> 
> The URI-header [entity] field may contain some or all of the Uniform 
                  ^^^^^^
> Resource Identifiers (Section 3.2) by which the Request-URI 
> resource can be identified. There is no guarantee that the resource 
> can be accessed using the URI(s) specified. 

That's all,

 ...Roy T. Fielding
    Department of Information & Computer Science    (fielding@ics.uci.edu)
    University of California, Irvine, CA 92717-3425    fax:+1(714)824-4056
    http://www.ics.uci.edu/~fielding/
Received on Thursday, 8 February 1996 09:16:52 UTC