- From: Paul Hoffman <paulh@imc.org>
- Date: Thu, 25 Jan 1996 20:09:05 -0800
- To: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
The following is what I hope is the last round on the HTTP/1.0 draft. Everything here *except for the Appendix C material* has already been around the WG with no objection. The material from Appendix C is my own rewrite that (hopefully) matches the material that Larry posted last week about HTTP and RFC 1521.

Just as a brief (but pointed) reminder, this draft is supposed to say approximately what is common to HTTP/1.0 software. Please limit your comments to what is true, not what we would have wanted to be true. If there is little new discussion, it can go to a final I-D in the middle of next week, which would be nice for all of us.

--Paul Hoffman
--Internet Mail Consortium

*****The following paragraph is added after the first paragraph of section 1.1 (Purpose):*****

This specification reflects the approximate state of those features which are found in most HTTP/1.0 implementations. The specification is split into two sections. Those features of HTTP for which implementations are usually consistent are described in the main body of this document. Those features which have few implementations or inconsistent ones are listed in Appendix D.

*****Section 1.4 is added:*****

1.4 HTTP and MIME

HTTP/1.0 uses many of the constructs defined for MIME, as defined in RFC 1521. Appendix C describes the ways in which the context of HTTP allows for a different use of Internet Media Types than is typical when they are transported in email, and gives the rationale for those differences.

*****Section 3.6 and its subsections are changed to:*****

3.6 Media Types

HTTP uses Internet Media Types [13] in the Content-Type header field (Section 10.5) in order to provide open and extensible data typing.

    media-type = type "/" subtype *( ";" parameter )
    type       = token
    subtype    = token

Parameters may follow the type/subtype in the form of attribute/value pairs.
    parameter = attribute "=" value
    attribute = token
    value     = token | quoted-string

The type, subtype, and parameter attribute names are case-insensitive. Parameter values may or may not be case-sensitive, depending on the semantics of the parameter name. LWS must not be generated between the type and subtype, nor between an attribute and its value.

Some older HTTP applications do not recognize media type parameters. HTTP/1.0 applications should only use media type parameters when they are necessary to define the content of a message.

Media-type values are registered with the Internet Assigned Numbers Authority (IANA). The media type registration process is outlined in RFC 1590 [13]. Use of non-registered media types is discouraged.

3.6.1 Canonicalization and Text Defaults

Internet media types are registered with a canonical form. In general, Entity-Bodies transferred via HTTP must be represented in the appropriate canonical form prior to the application of Content-Encoding, if any, and transmission.

Media types of "text/*" use CRLF as the text line break when in canonical form. However, HTTP allows the transport of text media not only in the canonical form with CRLF line breaks, but also with CR or LF alone, used consistently within the Entity-Body. This flexibility only applies to Entity-Bodies and not HTTP or multipart headers.

Media types of "text/*" are defined to have a default charset parameter of "US-ASCII", and data in any other character set should be labelled with a charset parameter. In practice, HTTP servers frequently send text data without a charset parameter, and expect clients to guess the character set of the result. This has caused a great deal of confusion and lack of interoperability in HTTP 1.0 clients and servers.

3.6.2 Multipart Types

MIME provides for a number of "multipart" types -- encapsulations of several entities within a single message's Entity-Body.
The multipart types registered by IANA [15] do not have any special meaning for HTTP, though user agents may need to understand each type in order to correctly interpret the purpose of each body-part. An HTTP user agent should follow the same or similar behavior as a MIME user agent does upon receipt of a multipart type. HTTP servers should not assume that all HTTP clients are prepared to handle multipart types.

All multipart types share a common syntax and must include a boundary parameter as part of the media type value. The message body is itself a protocol element and must therefore use only CRLF to represent line breaks between body-parts. Multipart body-parts may contain HTTP header fields which are significant to the meaning of that part.

*****Section 10.12 is changed to:*****

10.12 MIME-Version

Some HTTP/1.0 applications send a MIME-Version field in the following format:

    MIME-Version = "MIME-Version" ":" 1*DIGIT "." 1*DIGIT

However, this field has not been well-defined and should be ignored.

*****Section 12.5 is added:*****

12.5 Attacks Based On File and Path Names

Implementations of HTTP servers should be careful to restrict the documents returned by HTTP requests to be only those that were intended by the administrators. If an HTTP server translates HTTP URIs directly into file system calls, the server must take special care not to serve files that were not intended to be delivered to HTTP clients. For example, Unix, Microsoft Windows, and other operating systems use ".." as a path component to indicate a directory level above the current one. On such a system, an HTTP server must disallow any such construct in the Request-URI if it would otherwise allow access to a resource outside those intended to be accessible via the HTTP server.
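As a rough illustration of the ".." rule above (not part of the draft text), a server that maps Request-URIs onto files might screen paths along the following lines. This is only a sketch: the function name `resolve_request_path` and the `document_root` parameter are hypothetical, and the sketch is stricter than the text requires because it rejects every ".." component instead of working out whether the result would leave the document tree.

```python
import posixpath
from urllib.parse import unquote


def resolve_request_path(document_root, request_uri):
    """Map a Request-URI to a path under document_root, or return None.

    Hypothetical helper for illustration; it assumes POSIX-style paths
    on the server side.
    """
    # Drop any query string, then decode %-escapes so that an encoded
    # ".." (for example "%2e%2e") is seen as what it really is.
    path = unquote(request_uri.split("?", 1)[0])

    # Backslashes and NUL bytes have no business in a plain file request
    # and could hide a ".." on systems that treat "\" as a separator.
    if "\\" in path or "\x00" in path:
        return None

    # Ignore empty and "." components; refuse any ".." component.
    parts = [p for p in path.split("/") if p not in ("", ".")]
    if ".." in parts:
        return None

    return posixpath.join(document_root, *parts)
```

With a document root of "/srv/www", a request for "/docs/index.html" resolves to "/srv/www/docs/index.html", while "/../etc/passwd" and its %-encoded variants are refused.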
Similarly, files intended for reference only internally to the server (such as access control files, configuration files, and script code) must be protected from inappropriate retrieval, since they might contain sensitive information. Experience has shown that minor bugs in such HTTP server implementations have turned into security risks.

*****Appendix C is changed to:*****

C. Relationship to MIME

HTTP/1.0 uses many of the constructs defined for Internet Mail (RFC 822 [7]) and the Multipurpose Internet Mail Extensions (RFC 1521, MIME [5]) to allow entities to be transmitted in an open variety of representations and with extensible mechanisms. However, RFC 1521 discusses email, and HTTP has a few features that are different from those described in RFC 1521. These differences were carefully chosen to optimize performance over 8-bit networks, to give greatest freedom for creating new media-types, to make date comparisons easier, and to acknowledge the practice of some early HTTP servers and clients.

At the time of writing, it is expected that RFC 1521 will be revised. The revisions may include some of the practices found in HTTP/1.0 but not in RFC 1521.

This appendix describes specific areas where HTTP differs from RFC 1521. Proxies and gateways to strict MIME environments should be aware of these differences and provide the appropriate conversions where necessary. Proxies and gateways from MIME environments to HTTP also need to be aware of the differences because some conversions may be required.

C.1 Canonical Form and Line Breaks

RFC 1521 requires that an email entity be converted to canonical form prior to being transferred, as described in Appendix G of RFC 1521 [5]. Section 3.6.1 of this document describes the forms allowed for "text/*" media types when transmitted over HTTP.

RFC 1521 requires that content that has the primary media type "text" represent line breaks as CRLF and forbids the use of CR or LF outside of line break sequences.
HTTP allows CRLF, bare CR, and bare LF to indicate a line break within text content when a message is transmitted over HTTP.

Where it is possible, a proxy or gateway from HTTP to a strict RFC 1521 environment protocol should translate all line breaks within the text media types described in section 3.6.1 of this document to the RFC 1521 canonical form of CRLF. Note, however, that this may be complicated by the presence of HTTP content encoding and by the fact that HTTP allows the use of some character sets which do not use octets 13 and 10 to represent CR and LF, as is the case for some multi-byte character sets. If HTTP-to-MIME canonicalization is performed, the value of a Content-Length header field of the HTTP data must be updated to reflect the new body length.

C.2 Conversion of Date Formats

HTTP/1.0 uses a small set of date formats to simplify the process of date comparison; these are described in section 3.3 of this document. RFC 1521 allows a larger set of date formats. Proxies and gateways from other protocols to HTTP should ensure that any Date header field present in a message conforms to one of the HTTP/1.0 formats and rewrite the date if necessary.

C.3 Introduction of Content-Encoding

RFC 1521 does not include any concept equivalent to HTTP/1.0's Content-Encoding header field. Since this acts as a modifier on the media type, proxies and gateways from HTTP to MIME-compliant protocols must either change the value of the Content-Type header field or decode the Entity-Body before forwarding the message. (Some experimental applications of Content-Type for Internet mail have used a media-type parameter of ";conversions=<content-coding>" to perform a function equivalent to Content-Encoding. However, this parameter is not part of RFC 1521.)

C.4 No Content-Transfer-Encoding

HTTP/1.0 does not use the Content-Transfer-Encoding (CTE) field of RFC 1521.
Proxies and gateways from MIME-compliant protocols to HTTP must remove any non-identity CTE ("quoted-printable" or "base64") encoding prior to delivering the response message to an HTTP client.

Proxies and gateways from HTTP to MIME-compliant protocols are responsible for ensuring that the message is in the correct format and encoding for safe transport on that protocol. "Safe transport" is defined by the limitations of the protocol being used. At a minimum, the CTE field of

    Content-Transfer-Encoding: binary

should be added by the HTTP-to-MIME proxy or gateway if the gateway is unwilling to apply a content transfer encoding.

An HTTP client may include a Content-Transfer-Encoding as an extension Entity-Header in a POST request when it knows the destination of that request is a proxy or gateway to a MIME-compliant protocol.

C.5 HTTP Header Fields in Multipart Body-Parts

In RFC 1521, the header fields in multipart body-parts are generally ignored. In HTTP/1.0, multipart body-parts may contain HTTP header fields which are significant to the meaning of that part.

*****Appendix D is added:*****

D. Additional Features

This appendix documents features for which there was not strong consensus in the IETF HTTP Working Group, or for which there were not a sufficient number of interoperable implementations. In some cases, there was strong consensus that the feature was needed but disagreement about how it should be implemented. In other cases, there was no general agreement on the feature. Implementors who add the features in this Appendix should be aware that software using these features is less likely to be interoperable than software using the features from the main part of this specification.

The specifications in this section are shorter than they were in earlier drafts of the HTTP 1.0 specification. Some implementations of the features in this appendix are based on fuller descriptions of the features.
D.1 Additional Request Methods

D.1.1 PUT

The PUT method requests that the enclosed entity be stored under the supplied Request-URI. If the Request-URI refers to an already existing resource, the enclosed entity should be considered as a modified version of the one residing on the origin server. If the Request-URI does not point to an existing resource, and that URI is capable of being defined as a new resource by the requesting user agent, the origin server can create the resource with that URI.

The fundamental difference between the POST and PUT requests is reflected in the different meaning of the Request-URI. The URI in a POST request identifies the resource that will handle the enclosed entity as data to be processed. That resource may be a data-accepting process, a gateway to some other protocol, or a separate entity that accepts annotations. In contrast, the URI in a PUT request identifies the entity enclosed with the request -- the user agent knows what URI is intended and the server must not attempt to apply the request to some other resource.

D.1.2 DELETE

The DELETE method requests that the origin server delete the resource identified by the Request-URI.

D.1.3 LINK

The LINK method establishes one or more Link relationships between the existing resource identified by the Request-URI and other existing resources.

D.1.4 UNLINK

The UNLINK method removes one or more Link relationships from the existing resource identified by the Request-URI.

D.2 Additional Header Field Definitions

This section defines the syntax and semantics of all standard HTTP/1.0 header fields. For Entity-Header fields, both sender and recipient refer to either the client or the server, depending on who sends and who receives the entity.

D.2.1 Accept

The Accept header field can be used to indicate a list of media ranges which are acceptable as a response to the request.
The asterisk "*" character is used to group media types into ranges, with "*/*" indicating all media types and "type/*" indicating all subtypes of that type. The set of ranges given by the client should represent what types are acceptable given the context of the request.

D.2.2 Accept-Charset

The Accept-Charset request header field can be used to indicate a list of preferred character set encodings other than the default US-ASCII and ISO-8859-1. This field allows clients capable of understanding more comprehensive or special-purpose character set encodings to signal that capability to a server which is capable of representing documents in those character set encodings.

D.2.3 Accept-Encoding

The Accept-Encoding request header field is similar to Accept, but restricts the encoding-mechanism values which are acceptable in the response.

D.2.4 Accept-Language

The Accept-Language request header field is similar to Accept, but restricts the set of natural languages that are preferred as a response to the request.

D.2.5 Content-Language

The Content-Language field describes the natural language(s) of the intended audience for the enclosed entity. Note that this may not be equivalent to all the languages used within the entity.

D.2.6 Link

The Link header provides a means for describing a relationship between the entity and some other resource. An entity may include multiple Link values. Links at the metainformation level typically indicate relationships like hierarchical structure and navigation paths.

D.2.7 Retry-After

The Retry-After response header field can be used with a 503 (service unavailable) response to indicate how long the service is expected to be unavailable to the requesting client. The value of this field can be either an HTTP-date or an integer number of seconds (in decimal) after the time of the response.

D.2.8 Title

The Title header field indicates the title of the entity.
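To make the two value forms of Retry-After (D.2.7) concrete, here is a minimal sketch of how a client might turn the field value into a point in time. It is an illustration only, not part of the draft: the function name `retry_after_deadline` is made up, and the sketch assumes the date form arrives in the RFC 822/1123 style that Python's standard `email.utils.parsedate_to_datetime` accepts. The other HTTP-date formats of section 3.3 are not handled here.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def retry_after_deadline(value, response_time):
    """Return the earliest time at which the client should retry.

    value         -- the Retry-After field value (hypothetical input)
    response_time -- timezone-aware time at which the response was received
    """
    value = value.strip()

    # Integer form: a decimal number of seconds after the response time.
    if value.isdigit():
        return response_time + timedelta(seconds=int(value))

    # Date form: an absolute HTTP-date, assumed here to be RFC 1123 style.
    when = parsedate_to_datetime(value)
    if when.tzinfo is None:
        # Treat a date without zone information as GMT (an assumption).
        when = when.replace(tzinfo=timezone.utc)
    return when
```

A value of "120" therefore means two minutes after the response was received, while "Fri, 26 Jan 1996 08:00:00 GMT" names the retry time directly.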
D.2.9 URI

The URI-header field may contain some or all of the Uniform Resource Identifiers (Section 3.2) by which the Request-URI resource can be identified. There is no guarantee that the resource can be accessed using the URI(s) specified.
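For readers who want to see the media-type grammar of section 3.6 and the media ranges of D.2.1 in action, the following sketch parses a media-type value and tests it against a range. It is an illustration under simplifying assumptions, not part of the draft: the names `parse_media_type` and `matches_range` are invented, and the parser splits naively on ";" so it does not cope with a quoted-string value that itself contains ";".

```python
def parse_media_type(value):
    """Split 'type/subtype;attr=value' into (type, subtype, params)."""
    head, *raw_params = value.split(";")
    type_, _, subtype = head.strip().partition("/")

    params = {}
    for raw in raw_params:
        raw = raw.strip()
        if not raw:
            continue  # tolerate a stray trailing ";"
        attr, _, val = raw.partition("=")
        # Attribute names are case-insensitive; values are kept as sent,
        # minus the surrounding quotes of a quoted-string.
        params[attr.strip().lower()] = val.strip().strip('"')

    # Type and subtype are case-insensitive as well.
    return type_.lower(), subtype.lower(), params


def matches_range(media_type, media_range):
    """True if media_type falls within media_range ('*/*', 'type/*', ...)."""
    t, s, _ = parse_media_type(media_type)
    rt, rs, _ = parse_media_type(media_range)
    return rt in ("*", t) and rs in ("*", s)
```

For example, "text/html" falls within both "text/*" and "*/*", whereas "image/gif" does not fall within "text/*".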
Received on Thursday, 25 January 1996 20:10:32 UTC